GBROWSE/CHADO/PostgreSQL problem: source feature alias reports more features than it should

lpritc@scri.ac.uk
Hi,

Using an unmodified installation of GBROWSE/CHADO (GMOD v1.1) and
Bio::DB::Das::Chado I'm finding some odd behaviour when using aliases of
source features in the Landmark or Region query field.

I have source features defined as below in GFF3 and uploaded using
gmod_bulk_load_gff3.pl:

##gff-version 3
supercont1.1    PI_T30-4_FINAL_CALLGENES_3    supercontig    1    6928287    .    .    .    ID=supercont1.1;Name=supercont1.1;Note=seq:7000000037415152 supercont1.1 of Phytophthora infestans;Alias=supercont1.1,supercontig1.1
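
To make the alias definition explicit, here is a minimal standalone Python sketch (not part of the loader or the adaptor) that splits the ninth GFF3 column of the record above into its tags. The function name parse_gff3_attributes is made up for illustration, and the record is assumed to be a single tab-separated line:

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9):
    """Split a GFF3 attributes column into a dict of tag -> list of values."""
    attributes = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue
        tag, _, value = pair.partition("=")
        # GFF3 separates multiple values for one tag with commas and
        # percent-encodes reserved characters inside values.
        attributes[tag] = [unquote(v) for v in value.split(",")]
    return attributes


# The source-feature record from this message, as one tab-separated line.
record = "\t".join([
    "supercont1.1", "PI_T30-4_FINAL_CALLGENES_3", "supercontig",
    "1", "6928287", ".", ".", ".",
    "ID=supercont1.1;Name=supercont1.1;"
    "Note=seq:7000000037415152 supercont1.1 of Phytophthora infestans;"
    "Alias=supercont1.1,supercontig1.1",
])

fields = record.split("\t")
attrs = parse_gff3_attributes(fields[8])
print(attrs["Alias"])
```

So each source feature carries two aliases, one of which is identical to its own ID.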

For each supercontig named supercont1.n we have the alias supercontig1.n to
cater for accidental over-typing of the source feature.  Queries on feature
names resolve in GBROWSE to supercont1.n:start..end locations, as in
fig1.png (attached).  Changing this location to the alias
supercontig1.n:start..end produces output with features from several other
source features (e.g. fig2.png, also attached).

The same kind of thing happens when just querying on the
supercontig/supercont alias (see attached fig3.png and fig4.png) - though
the large number of features returned tends to result in a timeout (see
fig4).

It looks like it might be that the adaptor is not registering the alias
match to the source feature as a source feature, and so not filtering the
query appropriately, but I can't see where this is happening in the adaptor
code - can someone (Scott?) please point me to the appropriate place, or
suggest a fix?

Thanks,

L.


--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:[hidden email]       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405



______________________________________________________
SCRI, Invergowrie, Dundee, DD2 5DA.  
The Scottish Crop Research Institute is a charitable company limited by guarantee.
Registered in Scotland No: SC 29367.
Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.


DISCLAIMER:

This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries.  This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed.  It may not be disclosed or used by any other than that addressee.
If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify [hidden email] quoting the name of the sender and delete the email from your system.

Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any).
______________________________________________________
------------------------------------------------------------------------------


_______________________________________________
Gmod-gbrowse mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

fig1.png (144K) Download Attachment
fig2.png (158K) Download Attachment
fig3.png (110K) Download Attachment
fig4.png (107K) Download Attachment

Re: GBROWSE/CHADO/PostgreSQL problem: source feature alias reports more features than it should

lpritc@scri.ac.uk
Hi,

On 28/05/2010 Friday, May 28, 09:52, "Leighton Pritchard"
<[hidden email]> wrote:

> cater for accidental over-typing of the source feature.  Queries on feature
> names resolve in GBROWSE to supercont1.n:start..end locations, as in fig1.png
> (attached).  Changing this location to the alias supercontig1.n:start..end
> produces output with features from several other source features (e.g.
> fig2.png, also attached).
>
> The same kind of thing happens when just querying on the supercontig/supercont
> alias (see attached fig3.png and fig4.png) - though the large number of
> features returned tends to result in a timeout (see fig4).
>
> It looks like it might be that the adaptor is not registering the alias match
> to the source feature as a source feature, and so not filtering the query
> appropriately, but I can't see where this is happening in the adaptor code -
> can someone (Scott?) please point me to the appropriate place, or suggest a
> fix?

I've made some progress.  The problem code appears to be in Segment.pm,
around line 1300:

    $srcfeature_id = $self->{srcfeature_id} if ref $self;
    if (!$srcfeature_id && defined($seq_id)) {
      #if the seq_id arg was passed in, we should only look on that feature
      my $srcfeature_query = "SELECT feature_id FROM feature where lower(uniquename) = ? ";
      $srcfeature_query .= "and organism_id = ".$factory->organism_id
          if $factory->organism_id;
      my $srcf_query_handle= $factory->dbh->prepare($srcfeature_query);
      $srcf_query_handle->execute(lc($seq_id));
      ($srcfeature_id) = $srcf_query_handle->fetchrow_array;

$srcfeature_query was expecting an exact match in the feature.uniquename
field, but this would not include aliases for the source sequence.
Modifying the query to point to all_feature_names (or, in this case, to the
materialised view derived from it in our local implementation) enables
queries that use an alias for the source feature:

    my $srcfeature_query = "SELECT feature_id FROM mv_all_feature_names where lower(name) = ? ";

The obvious implication of this is that if the uniquename for the source
feature isn't really unique (for that organism), then we lose the additional
protection of only querying on the feature.uniquename column.  Also,
all_feature_names (or any derivative of it) will be larger, and will take
longer to query.  Is there a better solution than this that springs to mind?
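
One possible compromise, sketched here in standalone Python against an in-memory SQLite database and not in the adaptor's Perl, is to try the exact uniquename match first and consult the alias table only when that fails, refusing any alias that maps to more than one feature. The table feature_alias is a hypothetical stand-in for all_feature_names (or mv_all_feature_names), the function name resolve_srcfeature_id is invented for illustration, and the schema is cut down to the columns the lookup needs:

```python
import sqlite3


def resolve_srcfeature_id(dbh, seq_id, organism_id=None):
    """Return the feature_id for seq_id, or None if absent or ambiguous."""
    params = [seq_id.lower()]
    organism_clause = ""
    if organism_id is not None:
        organism_clause = " AND f.organism_id = ?"
        params.append(organism_id)

    # Step 1: exact match on uniquename (the cheap, well-constrained lookup).
    row = dbh.execute(
        "SELECT f.feature_id FROM feature f "
        "WHERE lower(f.uniquename) = ?" + organism_clause,
        params,
    ).fetchone()
    if row:
        return row[0]

    # Step 2: fall back to aliases, accepting only an unambiguous hit.
    rows = dbh.execute(
        "SELECT DISTINCT f.feature_id FROM feature_alias a "
        "JOIN feature f ON f.feature_id = a.feature_id "
        "WHERE lower(a.name) = ?" + organism_clause,
        params,
    ).fetchall()
    return rows[0][0] if len(rows) == 1 else None


# Toy data: two supercontigs, one alias each, plus one alias shared by both.
dbh = sqlite3.connect(":memory:")
dbh.executescript("""
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY,
                      uniquename TEXT, organism_id INTEGER);
CREATE TABLE feature_alias (feature_id INTEGER, name TEXT);
INSERT INTO feature VALUES (1, 'supercont1.1', 1), (2, 'supercont1.2', 1);
INSERT INTO feature_alias VALUES
    (1, 'supercontig1.1'), (2, 'supercontig1.2'),
    (1, 'shared'), (2, 'shared');
""")

print(resolve_srcfeature_id(dbh, "supercont1.1", 1))
print(resolve_srcfeature_id(dbh, "Supercontig1.2", 1))
print(resolve_srcfeature_id(dbh, "shared", 1))
```

With this ordering the common case (a real uniquename) never touches the larger alias table, and an alias shared between source features resolves to nothing, so it cannot silently pick the wrong one.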

On a personal note, this wee hack conflicts with some of my full-text search
code - it makes the query take much longer with the alias than with the
unique name for the source feature, and there's something to be said for
specifying a search by specific name...  I'll be revisiting that.

L.

--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:[hidden email]       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405

