[Gmod-tripal-devel] GBrowse double features

[Gmod-tripal-devel] GBrowse double features

Scott Cain
Hi all,

No, it's not about going to a movie theater and seeing movies about genome browsers :-)

I've been working with Sofia Robb in her efforts to get a Tripal/WebApollo/GBrowse instance running.  While she's managed most things quite well, she and I have run into a problem with how GBrowse interacts with Chado after analysis results have been loaded through Tripal.  When she loads a GBrowse page, she gets a gene with two "mRNAs" where the gene really has only one.  I believe this is because when a BLAST analysis is loaded, an entry is put in the analysisfeature table and associated with the gene feature itself, rather than with a match feature that is associated with both the gene and the thing the BLAST hit.  Is that what happens?  The reason this causes a problem is this query that GBrowse uses to get features:

select distinct name,fl.fmin,fl.fmax,fl.strand,fl.phase,
       fl.locgroup,fl.srcfeature_id,f.type_id,
       f.uniquename,f.feature_id, 
       af.significance as score,
       fd.dbxref_id,f.is_obsolete
from (feature f join featureloc fl 
              ON (f.feature_id = fl.feature_id))
      left join feature_dbxref fd 
              ON (f.feature_id = fd.feature_id 
                 AND fd.dbxref_id in 
             (select dbxref_id from dbxref where db_id=2))
      left join analysisfeature af 
              ON (f.feature_id = af.feature_id)
where f.feature_id = 430504 
  and fl.rank=0  
  and fl.srcfeature_id = 31904
order by f.type_id,fl.fmin

The left join with analysisfeature produces two rows in the result even though there is only one gene: one row with the score (which actually came from the original import; it's the MAKER score for the gene), and one row with no score, which corresponds to the BLAST result import.
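To make the duplication concrete, here is a minimal sketch using Python's built-in sqlite3 module. The two tables are heavily cut-down stand-ins for the Chado feature and analysisfeature tables, and the feature name and score value are made up for illustration; only the join shape is taken from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for the Chado tables (assumption: only the
    -- columns needed to show the join behaviour are included).
    CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE analysisfeature (
        analysisfeature_id INTEGER PRIMARY KEY,
        feature_id INTEGER,
        analysis_id INTEGER,
        significance REAL
    );
    INSERT INTO feature VALUES (430504, 'gene1');
    -- analysis 1: the original import, carries a score (made-up value)
    INSERT INTO analysisfeature VALUES (1, 430504, 1, 0.5);
    -- analysis 2: BLAST results attached to the same feature, no score
    INSERT INTO analysisfeature VALUES (2, 430504, 2, NULL);
""")

# Same join shape as the GBrowse query: one feature, left-joined to
# every analysisfeature row that points at it.
rows = conn.execute("""
    SELECT DISTINCT f.name, f.feature_id, af.significance AS score
    FROM feature f
    LEFT JOIN analysisfeature af ON (f.feature_id = af.feature_id)
    WHERE f.feature_id = 430504
""").fetchall()

for row in rows:
    print(row)
```

DISTINCT does not collapse the two rows because they differ in the score column, so a single feature comes back twice.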

It's possible that we did something wrong when importing data that caused this, and I just wanted to make sure we understand what's happening, and what should be happening if we're doing something wrong.

Thanks much,
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

_______________________________________________
Gmod-tripal-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-tripal-devel

Re: [Gmod-tripal-devel] GBrowse double features

Stephen Ficklin-2
Hi Scott,

Yes, I see the problem.  It sounds as though the analysisfeature table was meant to hold information about the primary analysis for a feature itself (i.e. details about its creation), but we are using it to attach results from secondary analyses where the feature is used downstream.  The BLAST importer doesn't actually create match features.  Basically, the XML of the BLAST results is stored in the analysisfeatureprop table and then quickly parsed for display when someone views the feature page.  The InterPro module behaves the same way, so if Sofia were to load InterPro data she would probably have three mRNAs :-(
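One way to check how many features are affected is to count analysisfeature rows per feature. The sketch below runs that check against a toy in-memory table with Python's sqlite3; the helper name and the sample rows are hypothetical, and on a real Chado instance the same SELECT would be run against PostgreSQL instead.

```python
import sqlite3

def features_with_multiple_analyses(conn):
    """Return (feature_id, row_count) for every feature that has more than
    one analysisfeature row, i.e. every feature the left join would repeat."""
    return conn.execute("""
        SELECT feature_id, COUNT(*) AS n
        FROM analysisfeature
        GROUP BY feature_id
        HAVING COUNT(*) > 1
        ORDER BY feature_id
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Cut-down stand-in for the Chado analysisfeature table (assumption).
    CREATE TABLE analysisfeature (
        analysisfeature_id INTEGER PRIMARY KEY,
        feature_id INTEGER,
        analysis_id INTEGER,
        significance REAL
    );
    -- feature 1: original import + BLAST + InterPro (made-up rows)
    INSERT INTO analysisfeature VALUES (1, 1, 10, 0.5);
    INSERT INTO analysisfeature VALUES (2, 1, 20, NULL);
    INSERT INTO analysisfeature VALUES (3, 1, 30, NULL);
    -- feature 2: original import only
    INSERT INTO analysisfeature VALUES (4, 2, 10, 0.7);
""")

affected = features_with_multiple_analyses(conn)
print(affected)
```

In this toy data, feature 1 has three rows, which corresponds to the "three mRNAs" case above, while feature 2 is unaffected.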

We've never really hit this problem at our end because we use an external database for GBrowse and re-import the GFF into that.  But I do think Tripal shouldn't screw up GBrowse.

We will be creating web services for sharing data in Chado with other Tripal sites (and for other uses), so it's important that we get the data stored in the way that is expected.  We can redesign the BLAST module to import differently.  Maybe storing this data in the analysis tables isn't the way to go.  Or perhaps an extra where clause can be added to the GBrowse query?  I remember you did something similar with GBrowse before because of the way Tripal imported something (I can't remember what it was now).  What do you think would be the best long-term solution?
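As a rough sketch of the "extra clause" idea, one hypothetical option is to add a condition to the ON clause of the analysisfeature join so that rows without a significance are never joined. This is not what GBrowse currently does, and it assumes the secondary-analysis rows always have a NULL significance, as in the case Scott describes; if an importer did store a significance, filtering on analysis_id would be needed instead. The tables and values below are simplified, made-up stand-ins, run through Python's sqlite3.

```python
import sqlite3

QUERY = """
    SELECT DISTINCT f.name, f.feature_id, af.significance AS score
    FROM feature f
    LEFT JOIN analysisfeature af
           ON (f.feature_id = af.feature_id
               AND af.significance IS NOT NULL)  -- hypothetical extra condition
    WHERE f.feature_id = ?
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Cut-down stand-ins for the Chado tables (assumption).
    CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE analysisfeature (
        analysisfeature_id INTEGER PRIMARY KEY,
        feature_id INTEGER,
        analysis_id INTEGER,
        significance REAL
    );
    INSERT INTO feature VALUES (1, 'gene1');  -- scored import + BLAST
    INSERT INTO feature VALUES (2, 'gene2');  -- no analysis rows at all
    INSERT INTO feature VALUES (3, 'gene3');  -- BLAST only
    INSERT INTO analysisfeature VALUES (1, 1, 10, 0.5);
    INSERT INTO analysisfeature VALUES (2, 1, 20, NULL);
    INSERT INTO analysisfeature VALUES (3, 3, 20, NULL);
""")

def rows_for(feature_id):
    """Run the modified query for a single feature."""
    return conn.execute(QUERY, (feature_id,)).fetchall()

for fid in (1, 2, 3):
    print(fid, rows_for(fid))
```

The condition sits in the ON clause rather than in WHERE on purpose: a `WHERE af.significance IS NOT NULL` would also drop features that have no scored analysisfeature row at all, whereas here they still come back once with an empty score.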

Stephen

On 10/21/2014 2:23 PM, Scott Cain wrote:
> [original message quoted in full; snipped]