more standard manner, but I understand why you went that way. I'm not
sure what it means long term. For example, if Tripal could deal with
> Hi Scott,
>
> Thanks for looking at this! Sorry for my slow reply. I've been quite busy.
>
> Unfortunately, the idea that a feature can be associated with multiple
> analyses in which it was used (or also derived) is all over the place with
> Tripal. It's coded as part of modules and is used to help associate all
> kinds of results (blast, kegg, interpro, etc) with a single feature. When
> we store analysis results it's not just the numeric fields in the
> analysisfeature table that we store.
>
> For Blast and InterPro analyses we do not create new 'match' and
> 'match_part' features; rather, we store the XML results to be parsed and
> viewed online on a gene page. (Although we should create the 'match' and
> 'match_part' features!) We need to be able to associate these results with
> both the feature they belong to and the analysis that generated them.
>
> I'm not sure how to handle this problem :-( Rather than doing a left join,
> could you do a second query that gets the analysis with the lowest
> analysis_id? That should be the analysis that the feature originally
> belongs to. Just an idea....
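A minimal sketch of the "second query" idea above, using Python's sqlite3 module and a cut-down stand-in for the analysisfeature table (not the full Chado schema). The function name original_analysis is hypothetical, and the rule that the lowest analysis_id marks the originating analysis is only the heuristic suggested in this message, not something Chado guarantees.

```python
import sqlite3

# Cut-down stand-in for Chado's analysisfeature table (assumption: only
# the columns needed for this illustration are included).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE analysisfeature (
    analysisfeature_id INTEGER PRIMARY KEY,
    feature_id INTEGER,
    analysis_id INTEGER,
    significance REAL
);
-- one mRNA (feature_id 2) linked to three analyses
INSERT INTO analysisfeature (feature_id, analysis_id, significance)
VALUES (2, 3, 0.001), (2, 1, NULL), (2, 2, 1e-20);
""")

def original_analysis(conn, feature_id):
    """Return the lowest analysis_id linked to a feature, or None.

    Hypothetical helper: treats the lowest analysis_id as the analysis
    the feature originally came from, per the heuristic in the thread.
    """
    row = conn.execute(
        "SELECT MIN(analysis_id) FROM analysisfeature WHERE feature_id = ?",
        (feature_id,),
    ).fetchone()
    return row[0]

print(original_analysis(conn, 2))
```

Running this as a separate lookup keeps the subfeature query free of the join, at the cost of one extra query per feature.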
>
> Stephen
>
>
>
> On 9/20/2012 12:07 PM, Scott Cain wrote:
>
> Hi Stephen,
>
> Oy. I hope this isn't going to be too big of a problem. I went back to
> the summer school final AMI and tried to figure out why I was getting
> multiple mRNAs in GBrowse when there was only one in the database, and I
> found the problem. The query to find subfeatures in the GBrowse Chado adaptor
> includes a left join on analysisfeature, to get any score that is associated
> with the feature. In the database that we used with Tripal, there are
> multiple entries in the analysisfeature table for each mRNA, which causes
> the subfeature query to return multiple rows for each mRNA, so GBrowse
> thinks there are multiple child features for each gene.
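The row multiplication described above can be reproduced with a small sketch. It uses Python's sqlite3 module and heavily simplified stand-ins for the Chado feature, feature_relationship and analysisfeature tables. The query is an illustration of the join pattern only, not the actual query in the GBrowse Chado adaptor.

```python
import sqlite3

# Simplified stand-ins for three Chado tables (assumption: most columns
# omitted; this is not the real schema or the adaptor's real query).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature_relationship (subject_id INTEGER, object_id INTEGER);
CREATE TABLE analysisfeature (
    analysisfeature_id INTEGER PRIMARY KEY,
    feature_id INTEGER,
    analysis_id INTEGER,
    significance REAL
);
INSERT INTO feature VALUES (1, 'gene1'), (2, 'mRNA1');
INSERT INTO feature_relationship VALUES (2, 1);  -- mRNA1 is part of gene1
-- three analyses all attached directly to the same mRNA
INSERT INTO analysisfeature (feature_id, analysis_id, significance)
VALUES (2, 1, NULL), (2, 2, NULL), (2, 3, NULL);
""")

# Child features of gene1, left-joined to analysisfeature for a score.
subfeature_rows = conn.execute("""
    SELECT child.name, af.significance
    FROM feature_relationship fr
    JOIN feature child ON child.feature_id = fr.subject_id
    LEFT JOIN analysisfeature af ON af.feature_id = child.feature_id
    WHERE fr.object_id = 1
""").fetchall()

# One mRNA in the database, but one row per analysisfeature entry.
print(len(subfeature_rows), subfeature_rows)
```

With one analysisfeature row per feature, the same query would return a single row for the mRNA.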
>
> I contend that there should only be a one to one relationship between
> feature and analysisfeature (though I realize this isn't stated explicitly
> anywhere). It makes sense that there be an entry in the analysisfeature
> table for the above-mentioned mRNAs. Tripal associates every GFF upload
> with an analysis, which, while contrary to typical usage, is not insane, and
> in any event, these mRNAs are the result of a computational analysis (MAKER)
> and might have a score associated with them (they don't, but the exons do).
> But what I think is a problem is that subsequent analyses (like BLAST and
> InterPro scan) have their scores associated with these mRNAs as well. They
> should not (at least, not directly): those scores should be associated with
> separate match/match_part features which have the mRNA as a srcfeature.
> This is fairly important: for example, with a BLAST result, the match that
> has the score associated with it is typically not for the entire extent of
> the mRNA and we need the match/match_part feature to define exactly where
> the match is; the same is true for other analyses as well.
>
> Please let me know if this doesn't make sense, or you think I've got the
> Chado usage wrong, or if there is anything I can do to help make this
> better.
>
> Thanks much,
> Scott
>
> PS: In the specific example of the one gene I was looking at, I could fix
> this in the GBrowse Chado adaptor by adding a "distinct" to the query, but
> that only worked because there were no scores for any of the analyses, which
> points to another potential problem, and regardless, "distinct" isn't a
> sure-fire fix.
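Why "distinct" only helps when no scores are present can be shown with a small sketch using Python's sqlite3 module. The helper count_rows and the single cut-down table are hypothetical; the real adaptor query involves joins that are left out here.

```python
import sqlite3

def count_rows(scores, distinct):
    """Count rows returned for one feature with the given analysis scores.

    Hypothetical helper: each score becomes one analysisfeature row for
    feature_id 1, and the query optionally uses SELECT DISTINCT.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE analysisfeature "
        "(feature_id INTEGER, analysis_id INTEGER, significance REAL)"
    )
    conn.executemany(
        "INSERT INTO analysisfeature VALUES (1, ?, ?)",
        enumerate(scores, start=1),
    )
    keyword = "DISTINCT" if distinct else ""
    rows = conn.execute(
        f"SELECT {keyword} feature_id, significance FROM analysisfeature"
    ).fetchall()
    conn.close()
    return len(rows)

# All scores NULL: DISTINCT collapses the duplicate rows.
print(count_rows([None, None, None], distinct=True))
# Scores differ between analyses: DISTINCT no longer collapses anything.
print(count_rows([None, 1e-20, 0.003], distinct=True))
```

As soon as any analysis stores a score on the feature, the rows differ and the duplicates come back.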
>
>
> On Mon, Aug 27, 2012 at 9:00 PM, Stephen Ficklin <[hidden email]> wrote:
>>
>> Okay, Thanks Scott for doing that test! I'll fix anything you find
>> before the v1.0 release.
>>
>> Stephen
>>
>>
>> On 8/27/2012 6:38 PM, Scott Cain wrote:
>>>
>>> Hi Stephen,
>>>
>>> I just wanted to let you know that there is some incompatibility
>>> between the Tripal GFF3 loader and the current GBrowse Chado adaptor.
>>> I haven't had a chance to debug it yet, so I can't even tell you what
>>> is going wrong, but I'd like to suggest holding off on a release until
>>> I get a chance to figure it out. What is happening is that when I
>>> look at the P. ultimum data in GBrowse I see 3 to 5 transcripts for
>>> each gene, even though there really is only one (and there is only
>>> one in the database). It's really quite strange, but given the
>>> strangeness, I'm hoping it will be easy to debug.
>>>
>>> Thanks,
>>> Scott
>>>
>>>
>>
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                      scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)     216-392-3087
> Ontario Institute for Cancer Research
>
>