[Gmod-tripal-devel] GBrowse/GFF loader incompatibility

[Gmod-tripal-devel] GBrowse/GFF loader incompatibility

Scott Cain
Hi Stephen,

I just wanted to let you know that there is some incompatibility
between the Tripal GFF3 loader and the current GBrowse Chado adaptor.
I haven't had a chance to debug it yet, so I can't even tell you what
is going wrong, but I'd like to suggest holding off on a release until
I get a chance to figure it out.  What is happening is that when I
look at the P. ultimum data in GBrowse I see 3 to 5 transcripts for
each gene, even though there really is only one (and there is only
one in the database).  It's really quite strange, but given the
strangeness, I'm hoping it will be easy to debug.

Thanks,
Scott


--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

_______________________________________________
Gmod-tripal-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-tripal-devel

Re: [Gmod-tripal-devel] GBrowse/GFF loader incompatibility

Stephen Ficklin-2
Okay, thanks, Scott, for doing that test!  I'll fix anything you find
before the v1.0 release.

Stephen

On 8/27/2012 6:38 PM, Scott Cain wrote:

> Hi Stephen,
>
> I just wanted to let you know that there is some incompatibility
> between the Tripal GFF3 loader and the current GBrowse Chado adaptor.
> I haven't had a chance to debug it yet, so I can't even tell you what
> is going wrong, but I'd like to suggest holding off on a release until
> I get a chance to figure it out.  What is happening is that when I
> look at the P. ultimum data in GBrowse I see 3 to 5 transcripts for
> each gene, even though there really is only one (and there is only
> one in the database).  It's really quite strange, but given the
> strangeness, I'm hoping it will be easy to debug.
>
> Thanks,
> Scott
>
>



Re: [Gmod-tripal-devel] GBrowse/GFF loader incompatibility

Scott Cain
Hi Stephen,

Oy.  I hope this isn't going to be too big of a problem.  I went back to the summer school final AMI and tried to figure out why I was getting multiple mRNAs in GBrowse when there was only one in the database, and I found the problem.  The query to find subfeatures in the GBrowse Chado adaptor includes a left join on analysisfeature, to get any score that is associated with the feature.  In the database that we used with Tripal, there are multiple entries in the analysisfeature table for each mRNA, which causes the subfeature query to return multiple rows for each mRNA, so GBrowse thinks there are multiple child features for each gene.
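
To illustrate the row multiplication described above, here is a minimal sketch using SQLite from Python. It is not the actual GBrowse Chado adaptor query: the tables are cut-down stand-ins for the Chado tables of the same names, and the IDs and feature names are made up for the example.

```python
import sqlite3

# Toy stand-ins for three Chado tables; real Chado has many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature_relationship (subject_id INTEGER, object_id INTEGER);
CREATE TABLE analysisfeature (feature_id INTEGER, analysis_id INTEGER, significance REAL);
INSERT INTO feature VALUES (1, 'gene1'), (2, 'mRNA1');
INSERT INTO feature_relationship VALUES (2, 1);          -- mRNA1 is a child of gene1
INSERT INTO analysisfeature VALUES (2, 10, NULL),        -- GFF load (MAKER)
                                   (2, 11, NULL),        -- BLAST
                                   (2, 12, NULL);        -- InterProScan
""")

# Hypothetical subfeature query: children of gene1, plus any score.
rows = conn.execute("""
    SELECT child.name, af.significance
    FROM feature_relationship fr
    JOIN feature child ON child.feature_id = fr.subject_id
    LEFT JOIN analysisfeature af ON af.feature_id = child.feature_id
    WHERE fr.object_id = 1
""").fetchall()

# One mRNA in the database, but one result row per analysisfeature entry.
print(len(rows), rows)
```

The single mRNA comes back once per analysisfeature row, which is how one transcript can show up as several in the browser.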

I contend that there should only be a one-to-one relationship between feature and analysisfeature (though I realize this isn't stated explicitly anywhere). It makes sense for the above-mentioned mRNAs to have an entry in the analysisfeature table.  Tripal associates every GFF upload with an analysis, which, while contrary to typical usage, is not insane, and in any event, these mRNAs are the result of a computational analysis (MAKER) and might have a score associated with them (they don't, but the exons do).  But what I think is a problem is that subsequent analyses (like BLAST and InterProScan) have their scores associated with these mRNAs as well.  They should not (at least, not directly): those scores should be associated with separate match/match_part features which have the mRNA as a srcfeature.  This is fairly important: for example, with a BLAST result, the match that has the score associated with it is typically not for the entire extent of the mRNA and we need the match/match_part feature to define exactly where the match is; the same is true for other analyses as well.
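
As a rough sketch of the layout argued for above (an illustration under simplifying assumptions, not a description of what any loader currently writes), the BLAST score can hang off a separate match feature that is located on the mRNA through featureloc. The tables are reduced to a few columns, `type` is stored as plain text rather than a cvterm reference, and the coordinates and score are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT, type TEXT);
CREATE TABLE featureloc (feature_id INTEGER, srcfeature_id INTEGER,
                         fmin INTEGER, fmax INTEGER);
CREATE TABLE analysisfeature (feature_id INTEGER, analysis_id INTEGER,
                              significance REAL);
INSERT INTO feature VALUES (2, 'mRNA1', 'mRNA'), (3, 'blast_hit1', 'match');
INSERT INTO featureloc VALUES (3, 2, 120, 480);   -- hit covers part of mRNA1
INSERT INTO analysisfeature VALUES (2, 10, NULL), -- mRNA1: only its originating analysis
                                   (3, 11, 1e-30);-- BLAST score lives on the match
""")

# Find scored matches located on mRNA1 (srcfeature_id = 2).
hits = conn.execute("""
    SELECT m.name, fl.fmin, fl.fmax, af.significance
    FROM featureloc fl
    JOIN feature m ON m.feature_id = fl.feature_id
    JOIN analysisfeature af ON af.feature_id = m.feature_id
    WHERE fl.srcfeature_id = 2 AND m.type = 'match'
""").fetchall()

# The mRNA itself keeps exactly one analysisfeature row.
mrna_rows = conn.execute(
    "SELECT COUNT(*) FROM analysisfeature WHERE feature_id = 2"
).fetchone()[0]

print(hits, mrna_rows)
```

With this arrangement the match carries both the score and the exact extent of the hit, while the mRNA stays one-to-one with analysisfeature.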

Please let me know if this doesn't make sense, or you think I've got the Chado usage wrong, or if there is anything I can do to help make this better.

Thanks much,
Scott

PS: In the specific example of the one gene I was looking at, I could fix this in the GBrowse Chado adaptor by adding a "distinct" to the query, but that only worked because there were no scores for any of the analyses, which points to another potential problem, and regardless, "distinct" isn't a sure-fire fix.
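
A small sketch of why "distinct" only helps in the no-score case, again using toy SQLite tables rather than the real adaptor query. The helper name `distinct_children` and the score values are made up for the example.

```python
import sqlite3

def distinct_children(scores):
    """Return DISTINCT (child name, score) rows for gene1, given one
    analysisfeature score per analysis attached to its single mRNA."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE feature_relationship (subject_id INTEGER, object_id INTEGER);
    CREATE TABLE analysisfeature (feature_id INTEGER, analysis_id INTEGER,
                                  significance REAL);
    INSERT INTO feature VALUES (1, 'gene1'), (2, 'mRNA1');
    INSERT INTO feature_relationship VALUES (2, 1);
    """)
    # One analysisfeature row per score, with analysis_id 10, 11, 12, ...
    conn.executemany(
        "INSERT INTO analysisfeature VALUES (2, ?, ?)",
        list(enumerate(scores, start=10)),
    )
    return conn.execute("""
        SELECT DISTINCT child.name, af.significance
        FROM feature_relationship fr
        JOIN feature child ON child.feature_id = fr.subject_id
        LEFT JOIN analysisfeature af ON af.feature_id = child.feature_id
        WHERE fr.object_id = 1
    """).fetchall()

no_scores = distinct_children([None, None, None])      # all rows identical
with_scores = distinct_children([None, 1e-30, 5e-12])  # rows differ by score

print(len(no_scores), len(with_scores))
```

When every score is NULL the duplicate rows are identical and collapse to one; as soon as the analyses carry different scores the rows differ and the duplicates come back.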


On Mon, Aug 27, 2012 at 9:00 PM, Stephen Ficklin <[hidden email]> wrote:
Okay, thanks, Scott, for doing that test!  I'll fix anything you find before the v1.0 release.

Stephen



Re: [Gmod-tripal-devel] GBrowse/GFF loader incompatibility

Stephen Ficklin-2
Hi Scott,

Thanks for looking at this!  Sorry for my slow reply. I've been quite busy.

Unfortunately, the idea that a feature can be associated with multiple analyses in which it was used (or also derived) is all over the place with Tripal.  It's coded as part of modules and is used to help associate all kinds of results (BLAST, KEGG, InterPro, etc.) with a single feature.  When we store analysis results, it's not just the numeric fields in the analysisfeature table that we store.

For BLAST and InterPro analyses we do not create new 'match' and 'match_part' features.  Instead, we store XML results to be parsed and viewed online on a gene page.  (Although we should create the 'match' and 'match_part' features!)  We need to be able to associate these results with both the feature they belong to and the analysis that generated them.

I'm not sure how to handle this problem :-(   Rather than do a left join, could you do a second query that gets the analysis with the lowest analysis_id, as that should be the analysis that the feature originally belongs to?  Just an idea....
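
A minimal sketch of that second-query idea, using a toy SQLite table in place of the real Chado analysisfeature table. The function name `original_analysis` and the sample rows are hypothetical, and the approach rests on the assumption stated above that the lowest analysis_id is the analysis the feature came from, which nothing in the schema guarantees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE analysisfeature (feature_id INTEGER, analysis_id INTEGER,
                              significance REAL);
INSERT INTO analysisfeature VALUES (2, 11, 1e-30),  -- BLAST
                                   (2, 10, NULL),   -- GFF load (assumed first)
                                   (2, 12, 5e-12);  -- InterProScan
""")

def original_analysis(conn, feature_id):
    """Return (analysis_id, significance) for the lowest analysis_id
    linked to the feature, or None if the feature has no analyses."""
    return conn.execute(
        """SELECT analysis_id, significance
           FROM analysisfeature
           WHERE feature_id = ?
           ORDER BY analysis_id
           LIMIT 1""",
        (feature_id,),
    ).fetchone()

print(original_analysis(conn, 2))   # the feature with three analyses
print(original_analysis(conn, 99))  # a feature with none
```

Because this runs as a separate lookup per feature, the subfeature query itself no longer needs the join and returns one row per child.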

Stephen


On 9/20/2012 12:07 PM, Scott Cain wrote:
Hi Stephen,

Oy.  I hope this isn't going to be too big of a problem.  I went back to the summer school final AMI and tried to figure out why I was getting multiple mRNAs in GBrowse when there was only one in the database, and I found the problem.  The query to find subfeatures in the GBrowse Chado adaptor includes a left join on analysisfeature, to get any score that is associated with the feature.  In the database that we used with Tripal, there are multiple entries in the analysisfeature table for each mRNA, which causes the subfeature query to return multiple rows for each mRNA, so GBrowse thinks there are multiple child features for each gene.

I contend that there should only be a one-to-one relationship between feature and analysisfeature (though I realize this isn't stated explicitly anywhere). It makes sense for the above-mentioned mRNAs to have an entry in the analysisfeature table.  Tripal associates every GFF upload with an analysis, which, while contrary to typical usage, is not insane, and in any event, these mRNAs are the result of a computational analysis (MAKER) and might have a score associated with them (they don't, but the exons do).  But what I think is a problem is that subsequent analyses (like BLAST and InterProScan) have their scores associated with these mRNAs as well.  They should not (at least, not directly): those scores should be associated with separate match/match_part features which have the mRNA as a srcfeature.  This is fairly important: for example, with a BLAST result, the match that has the score associated with it is typically not for the entire extent of the mRNA and we need the match/match_part feature to define exactly where the match is; the same is true for other analyses as well.

Please let me know if this doesn't make sense, or you think I've got the Chado usage wrong, or if there is anything I can do to help make this better.

Thanks much,
Scott

PS: In the specific example of the one gene I was looking at, I could fix this in the GBrowse Chado adaptor by adding a "distinct" to the query, but that only worked because there were no scores for any of the analyses, which points to another potential problem, and regardless, "distinct" isn't a sure-fire fix.



Re: [Gmod-tripal-devel] GBrowse/GFF loader incompatibility

Scott Cain
Hi Stephen,

Don't worry about taking a while to get back; I think if the roles
were reversed, it would have taken me even longer :-)

It is somewhat unfortunate that analysis results don't get stored in a
more standard manner, but I understand why you went that way.  I'm not
sure what it means long term.  For example, it would probably be
desirable for Tripal to handle analysis results that have been
converted to GFF and loaded, either through Tripal or from the command
line, in a way that is similar to what it does now.

For the short term, the solution is fairly straightforward: GBrowse
won't be able to see the analysis results (since there are no features
to tie them to), and I can add an option so that GBrowse will know
it's dealing with a Tripal Chado database, and so won't go looking
for scores.
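
A rough sketch of what such an option could look like, written in Python against toy SQLite tables. The real adaptor is Perl and its query and option names are not shown in this thread, so `subfeature_sql` and its `include_scores` flag are hypothetical names used only to show the shape of the idea.

```python
import sqlite3

def subfeature_sql(include_scores):
    """Build the child-feature query, joining analysisfeature only
    when scores are wanted (hypothetical flag)."""
    select = "SELECT child.name"
    joins = ("FROM feature_relationship fr "
             "JOIN feature child ON child.feature_id = fr.subject_id")
    if include_scores:
        select += ", af.significance"
        joins += (" LEFT JOIN analysisfeature af"
                  " ON af.feature_id = child.feature_id")
    return f"{select} {joins} WHERE fr.object_id = ?"

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature_relationship (subject_id INTEGER, object_id INTEGER);
CREATE TABLE analysisfeature (feature_id INTEGER, analysis_id INTEGER,
                              significance REAL);
INSERT INTO feature VALUES (1, 'gene1'), (2, 'mRNA1');
INSERT INTO feature_relationship VALUES (2, 1);
INSERT INTO analysisfeature VALUES (2, 10, NULL), (2, 11, NULL), (2, 12, NULL);
""")

with_scores = conn.execute(subfeature_sql(True), (1,)).fetchall()
without_scores = conn.execute(subfeature_sql(False), (1,)).fetchall()

print(len(with_scores), len(without_scores))
```

With the join skipped, the one mRNA comes back as one row regardless of how many analyses are attached to it.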

Thanks,
Scott


On Wed, Sep 26, 2012 at 4:34 PM, Stephen Ficklin <[hidden email]> wrote:

> Hi Scott,
>
> Thanks for looking at this!  Sorry for my slow reply. I've been quite busy.
>
> Unfortunately, the idea that a feature can be associated with multiple
> analyses in which it was used (or also derived) is all over the place with
> Tripal.  It's coded as part of modules and is used to help associate all
> kinds of results (BLAST, KEGG, InterPro, etc.) with a single feature.  When
> we store analysis results, it's not just the numeric fields in the
> analysisfeature table that we store.
>
> For BLAST and InterPro analyses we do not create new 'match' and
> 'match_part' features.  Instead, we store XML results to be parsed and
> viewed online on a gene page.  (Although we should create the 'match' and
> 'match_part' features!)  We need to be able to associate these results with
> both the feature they belong to and the analysis that generated them.
>
> I'm not sure how to handle this problem :-(   Rather than do a left join,
> could you do a second query that gets the analysis with the lowest
> analysis_id, as that should be the analysis that the feature originally
> belongs to?  Just an idea....
>
> Stephen


