[Gmod-tripal-devel] Problems importing blast results via GFF

Michael Dondrup-3
Hi everyone, and excuse me for bothering you again.

For the reasons explained in my earlier request, I am trying to import blastp vs. SwissProt results as features via the GFF importer.
My GFF file currently looks like this (it is mostly output from the BioPerl script bp_search2gff.pl, with target_organism added manually):

##gff-version 3
augustus_masked-LSalAtl2s251-processed-gene-10.6-mRNA-1 BLASTp.swissprot        match   82      2676    1309    .       1       ID=sp|Q8CJG0.3|AGO2_MOUSE
augustus_masked-LSalAtl2s251-processed-gene-10.6-mRNA-1 BLASTp.swissprot        match_part      82      2676    0.0     .       0       Parent=sp|Q8CJG0.3|AGO2_MOUSE;Target=sp|Q8CJG0.3|AGO2_MOUSE 3 860;target_organism=mus:musculus

What is going wrong here? The landmark is present in the database.

The import job fails with the following errors:

WD T_gff3_loader: Cannot find organism for target mus:musculus.         [warning]
WD tripal_core: chado_prepare: 'sel_feature_orun' statement already     [error]
WD tripal_core: chado_execute_prepared: wrong argument type supplied for[error]
'ins_featureloc_all' statement, field 2. Expected int but recieved ''
syslog() expects parameter 1 to be long, string given syslog.module:85  [warning]
WD tripal_core: tripal_core_chado_insert: Cannot insert record into     [error]

## This is my debug output for the featureloc record; I guess the srcfeature_id is missing:

'featureloc': Array
(
    [feature_id] => 184995
    [srcfeature_id] =>
    [fmin] => 2
    [is_fmin_partial] => FALSE
    [fmax] => 4
    [is_fmax_partial] => FALSE
    [strand] => 0
    [residue_info] =>
    [locgroup] => 0
    [rank] => 1
)
WD T_gff3_loader: Failed to insert featureloc                           [warning]
Drush command terminated abnormally due to an unrecoverable error.      [error]


cheers
Michael

_______________________________________________
Gmod-tripal-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-tripal-devel

Re: [Gmod-tripal-devel] Problems importing blast results via GFF

Stephen Ficklin-2
Hi Michael,

Is the 'mus musculus' organism present in your organism table? Are the genus and species spelled correctly, with the proper case, in your GFF 'target_organism' attribute? If the target does not exist, you can check the 'Create Target' checkbox, which will automatically add the Target feature. Have you already tried that and still gotten this error?

Stephen

On 8/22/2013 7:23 AM, Michael Dondrup wrote:

Re: [Gmod-tripal-devel] Problems importing blast results via GFF

Michael Dondrup-3
Hi Stephen,
No, Mus musculus is not present; however, this job was submitted with 'Create Target' on. I tried the same with target_organism set to
Drosophila:Melanogaster, which is present, and that worked. Is the loader supposed to create the organism if it is not present? If so, I might consider
not setting target_organism at all, because a BLAST run against NR would more or less fill my database with the full taxonomy.


Michael

On Aug 22, 2013, at 2:53 PM, Stephen Ficklin wrote:


Re: [Gmod-tripal-devel] Problems importing blast results via GFF

Stephen Ficklin-2
Hi Michael,

Yes, it would fill your organism table with a bunch of organisms. If you
do not specify the organism with the 'target_organism' attribute then
the GFF3 loader assumes the target organism is the same as for the
landmark feature and will create the target feature with the same
organism.  That may also not be proper. Just something to consider.

If you load these BLAST hits as features aligned to your gene (or mRNA)
sequences using the same organism, then here is an idea:

When you view the page for the mRNA you will see a list of all of these
"match" features on the 'Alignments' link under the 'Resources'
sidebar.   If you would like to have the blast hits link out to
SwissProt then you may want to add the Dbxref attribute to the end of
each match:

augustus_masked-LSalAtl2s251-processed-gene-10.6-mRNA-1 BLASTp.swissprot        match   82      2676    1309    .       1       ID=sp|Q8CJG0.3|AGO2_MOUSE;Dbxref=SwissProt:Q8CJG0


Be sure to replace 'SwissProt' with the actual name that you have for
SwissProt in the Chado 'db' table.
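
As a rough illustration of adding that attribute in bulk, here is a minimal
Python sketch. It assumes tab-separated GFF3 columns, IDs of the form
sp|ACCESSION.VERSION|NAME as in the example above, and that dropping the
version suffix gives the accession you want. The function name add_dbxref and
the default db name are made up for this example; they are not part of Tripal
or BioPerl.

```python
def add_dbxref(gff_line: str, db_name: str = "SwissProt") -> str:
    """Append a Dbxref attribute to a GFF3 'match' line (hypothetical helper).

    Assumes 9 tab-separated columns and an ID like 'sp|Q8CJG0.3|AGO2_MOUSE'.
    Lines that do not fit these assumptions are returned unchanged.
    """
    line = gff_line.rstrip("\n")
    cols = line.split("\t")
    if len(cols) != 9 or cols[2] != "match":
        return line

    # Column 9 holds 'key=value' pairs separated by semicolons.
    attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
    id_parts = attrs.get("ID", "").split("|")
    if "Dbxref" in attrs or len(id_parts) < 2:
        return line

    # 'Q8CJG0.3' -> 'Q8CJG0' (assumption: the db stores unversioned accessions)
    accession = id_parts[1].split(".")[0]
    cols[8] = cols[8].rstrip(";") + f";Dbxref={db_name}:{accession}"
    return "\t".join(cols)
```

The db_name argument has to match the name in your Chado 'db' table, as noted
above.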

You can then edit the tripal_feature_alignments.tpl.php alignment file
for the matches (context #3 in documentation in that file). You can use
the Tripal API (no SQL required) to see if there is a Dbxref for the
feature, and if so, then you can use entries from the DB.urlprefix +
Dbxref.accession to construct a URL to link out to SwissProt.
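
The URL construction itself is just string concatenation. A minimal sketch in
Python (this is not the Tripal API, which is PHP; the function name and the
example.org prefix are placeholders, and the real prefix is whatever is stored
in db.urlprefix for your SwissProt entry):

```python
from typing import Optional


def dbxref_url(urlprefix: Optional[str], accession: Optional[str]) -> Optional[str]:
    """Return urlprefix + accession, or None if either part is missing.

    Hypothetical helper: the two arguments stand in for the db.urlprefix
    and dbxref.accession values looked up for a match feature.
    """
    if not urlprefix or not accession:
        return None  # no Dbxref or no prefix configured: render no link
    return urlprefix + accession


# Placeholder prefix, not the real SwissProt URL.
link = dbxref_url("https://example.org/entry/", "Q8CJG0")
```

Returning None when either value is missing lets the template fall back to
plain text for matches that were loaded without a Dbxref.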

What's nice about this is that you will probably never sync your match
feature types, so they will never have pages. Users won't ever see
that the matched features belong to the landmark feature organism rather
than the true organism, and you can still link out to the feature at
SwissProt for users to get more information.

Stephen

On 8/22/2013 9:00 AM, Michael Dondrup wrote:


Re: [Gmod-tripal-devel] Problems importing blast results via GFF

Michael Dondrup-3
Hi Stephen,

Thank you for these suggestions. I am writing my own converter anyway because I feel I need more control over
how the features are composed, and I wish to add additional properties. For example, I want to add the
alignment strings as additional attributes so they can be displayed, and I will definitely add the Dbxref attribute.

I noticed a few things that make importing blastp hits a bit more complicated than, e.g., tblastx might be.
These observations might be of some use in case the Analysis-Blast importer is extended in the future, or they may just be trivial; I don't quite know.

- Using landmarks other than the contig/chromosome doesn't seem to work in GBrowse. I have to check whether that is intentional or whether it can be changed.
So, for now, the landmark must be the chromosome, with the coordinates converted to 'global' coordinates.

- Coordinates in blastp are given in amino acids (AA) relative to the start of the protein sequence; to use them in a DNA coordinate system, they have to be converted.
To make this work, the reference must be the coordinates of the CDS, not the gene or transcript, if I am not mistaken.

I calculated the new start and end coordinates with respect to the CDS as
start := 3*start-2 ; end := 3*end
In my blast file I can parse them from the query description. In Tripal, that might need one or two more database lookups.
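
The conversion above can be sketched in a few lines of Python. This assumes
1-based, inclusive coordinates on both sides (as in BLAST output and GFF3) and
only maps protein positions onto the CDS; getting from the CDS to 'global'
chromosome coordinates (exon structure, strand) is not covered. The function
name is made up for this example.

```python
from typing import Tuple


def aa_to_cds_coords(aa_start: int, aa_end: int) -> Tuple[int, int]:
    """Map a 1-based, inclusive amino-acid range onto CDS nucleotides.

    Residue n is encoded by nucleotides 3n-2 .. 3n, so the range starts at
    the first base of the first codon and ends at the last base of the
    last codon.
    """
    if aa_start < 1 or aa_end < aa_start:
        raise ValueError("expected 1 <= aa_start <= aa_end")
    return 3 * aa_start - 2, 3 * aa_end


# Residues 3..860 cover CDS nucleotides 7..2580.
cds_start, cds_end = aa_to_cds_coords(3, 860)
```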

Hope this makes sense.

Michael


On Aug 22, 2013, at 3:16 PM, Stephen Ficklin wrote:


Re: [Gmod-tripal-devel] Problems importing blast results via GFF

Scott Cain
Hi Michael,

What do you want to do with GBrowse?  It does have a "recursivMapping" option, which will map things for you from intermediate coordinate systems (like contigs or transcripts) down to a chromosome.  Of course, that comes with a performance penalty.  I don't think this option gets used much, so if you do use it, please let me know how it goes.

Scott



On Fri, Aug 23, 2013 at 5:39 AM, Michael Dondrup <[hidden email]> wrote:
Hi Stephen,

thank you for these suggestions. I am writing my own converter anyway because I feel I need more control over
how the features are composed and wish to add additional properties. For example I want to add the
alignment strings as additional attributes so they can be displayed, and I will definitely add the Dbxref attribute.

I noticed a few things that make importing blastp hits a bit more complicated than eg. tblastx might be,
these observations might be of some use in case the Analysis-Blast importer is extended in the future, or just trivial, I don't quite know.

- Using landmarks other than the contig/chromosome doesn't seem to work in gbrowse, I have to check if that is intentional or if it can be changed.
So, the landmark must be the chromosome with the coordinates converted to 'global' coordinates.

- Coordinates in blastp are given in AA with respect to the protein sequence start; to use them in a DNA coordinate system, they have to be converted.
To make this work, the reference must be the coordinates of the CDS, not the gene or transcript, if I am not mistaken.

I calculated the new start and end coordinates wrt the CDS as
start := 3*start-2 ; end := 3*end
In my blast file I can parse them from the query description. In Tripal, that might need one or two more database lookups.

Hope this makes sense.

Michael


On Aug 22, 2013, at 3:16 PM, Stephen Ficklin wrote:

> Hi Michael,
>
> Yes, it would fill your organism table with a bunch of organisms. If you
> do not specify the organism with the 'target_organism' attribute then
> the GFF3 loader assumes the target organism is the same as for the
> landmark feature and will create the target feature with the same
> organism.  That may also not be proper. Just something to consider.
>
> If you load these blast hits as features aligned to your genes (or mRNA)
> sequences using the same organism. Then here is an idea....
>
> When you view the page for the mRNA you will see a list of all of these
> "match" features on the 'Alignments' link under the 'Resources'
> sidebar.   If you would like to have the blast hits link out to
> SwissProt then you may want to add the Dbxref attribute to the end of
> each match:
>
> augustus_masked-LSalAtl2s251-processed-gene-10.6-mRNA-1 BLASTp.swissprot        match   82      2676    1309    .       1       ID=sp|Q8CJG0.3|AGO2_MOUSE:Dbxref=SwissProt:Q8CJG0
>
>
> Be sure to substitute 'SwissProt' for the actual name in the Chado 'db'
> table that you have for SwissProt.
>
> You can then edit the tripal_feature_alignments.tpl.php alignment file
> for the matches (context #3 in documentation in that file). You can use
> the Tripal API (no SQL required) to see if there is a Dbxref for the
> feature, and if so, then you can use entries from the DB.urlprefix +
> Dbxref.accession to construct a URL to link out to SwissProt.
>
> What's nice about this, is that you will probably never sync your match
> feature types, so they will never have pages.  And users won't ever see
> that the matched features belong to the landmark feature organism rather
> than the true organism, and you can still link out to the feature at
> SwissProt for users to get more information.
>
> Stephen
>
> On 8/22/2013 9:00 AM, Michael Dondrup wrote:
>> Hi Stephen,
>> no, the mus musculus is not present, however this job was submitted with create Target on. I have tried the same with target_organism set to
>> Drosophila:Melanogaster which is present and that worked.  Is it supposed to create the organism if it is not present? Then I might consider
>> not setting target_organism at all, because a blast run against NR would fill my database with the full taxonomy more or less.
>>
>>
>> Michael
>>
>> On Aug 22, 2013, at 2:53 PM, Stephen Ficklin wrote:
>>
>>> Hi Michael,
>>>
>>> Is the 'mus musculus' organism present in your organism table?  Are the genus and species spelled correctly with proper case in your GFF 'target_organism' attribute?  If it does not exist, you can click the 'Create Target' checkbox which will automatically add the Target feature.   Have you already tried that and still get the error below?
>>>
>>> Stephen
>>>

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


Re: [Gmod-tripal-devel] Problems importing blast results via GFF

Michael Dondrup-3
Hi Scott,
I want to display the blastp results for each gene as a track. The hits are assigned to the corresponding coding sequence,
with coordinates in amino acids relative to the query sequence (which is a protein sequence as well). I could calculate
absolute coordinates as described below.
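
A minimal Python sketch of the conversion quoted below (start := 3*start-2 ; end := 3*end), assuming 1-based inclusive coordinates and a hit that lies in frame on the CDS. The function name aa_to_cds is only illustrative and is not part of Tripal or BioPerl. Mapping the result onto the chromosome would additionally need the CDS exon positions, which this sketch does not cover.

```python
def aa_to_cds(aa_start, aa_end):
    """Convert a 1-based, inclusive amino-acid range to CDS nucleotide coordinates."""
    if aa_start < 1 or aa_end < aa_start:
        raise ValueError("expected 1 <= aa_start <= aa_end")
    # First base of the first codon, last base of the last codon.
    return 3 * aa_start - 2, 3 * aa_end


print(aa_to_cds(28, 892))  # (82, 2676)
```

Amino acid 1 maps to nucleotides 1 to 3, so a hit covering residues 28 to 892 spans nucleotides 82 to 2676 of the CDS.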

Michael

On Aug 23, 2013, at 4:07 PM, Scott Cain wrote:

> Hi Michael,
>
> What do you want to do with GBrowse?  It does have a "recursivMapping" option, which will map things for you from intermediate coordinate systems (like contigs or transcripts) down to a chromosome.  Of course, that comes with a performance penalty.  I don't think this option gets used much, so if you do use it, please let me know how it goes.
>
> Scott
>
>
>
> On Fri, Aug 23, 2013 at 5:39 AM, Michael Dondrup <[hidden email]> wrote:
> Hi Stephen,
>
> thank you for these suggestions. I am writing my own converter anyway because I feel I need more control over
> how the features are composed and wish to add additional properties. For example I want to add the
> alignment strings as additional attributes so they can be displayed, and I will definitely add the Dbxref attribute.
>
> I noticed a few things that make importing blastp hits a bit more complicated than, e.g., tblastx might be.
> These observations might be of some use in case the Analysis-Blast importer is extended in the future, or they may just be trivial, I don't quite know.
>
> - Using landmarks other than the contig/chromosome doesn't seem to work in GBrowse; I have to check whether that is intentional or whether it can be changed.
> So, the landmark must be the chromosome, with the coordinates converted to 'global' coordinates.
>
> - Coordinates in blastp are given in AA with respect to the protein sequence start; to use them in a DNA coordinate system, they have to be converted.
> To make this work, the reference must be the coordinates of the CDS, not the gene or transcript, if I am not mistaken.
>
> I calculated the new start and end coordinates wrt the CDS as
> start := 3*start-2 ; end := 3*end
> In my blast file I can parse them from the query description. In Tripal, that might need one or two more database lookups.
>
> Hope this makes sense.
>
> Michael
>
>
> On Aug 22, 2013, at 3:16 PM, Stephen Ficklin wrote:
>
> > Hi Michael,
> >
> > Yes, it would fill your organism table with a bunch of organisms. If you
> > do not specify the organism with the 'target_organism' attribute then
> > the GFF3 loader assumes the target organism is the same as for the
> > landmark feature and will create the target feature with the same
> > organism.  That may also not be proper. Just something to consider.
> >
> > If you load these blast hits as features aligned to your gene (or mRNA)
> > sequences using the same organism, then here is an idea:
> >
> > When you view the page for the mRNA you will see a list of all of these
> > "match" features on the 'Alignments' link under the 'Resources'
> > sidebar.   If you would like to have the blast hits link out to
> > SwissProt then you may want to add the Dbxref attribute to the end of
> > each match:
> >
> > augustus_masked-LSalAtl2s251-processed-gene-10.6-mRNA-1 BLASTp.swissprot        match   82      2676    1309    .       1       ID=sp|Q8CJG0.3|AGO2_MOUSE;Dbxref=SwissProt:Q8CJG0
> >
> >
> > Be sure to replace 'SwissProt' with the actual name that you have for
> > SwissProt in the Chado 'db' table.
> >
> > You can then edit the tripal_feature_alignments.tpl.php alignment file
> > for the matches (context #3 in documentation in that file). You can use
> > the Tripal API (no SQL required) to see if there is a Dbxref for the
> > feature, and if so, then you can use entries from the DB.urlprefix +
> > Dbxref.accession to construct a URL to link out to SwissProt.
> >
> > What's nice about this is that you will probably never sync your match
> > feature types, so they will never have pages.  Users won't ever see
> > that the matched features belong to the landmark feature's organism rather
> > than the true organism, and you can still link out to the feature at
> > SwissProt for users to get more information.
> >
> > Stephen
> >
> > On 8/22/2013 9:00 AM, Michael Dondrup wrote:
> >> Hi Stephen,
> >> no, 'mus musculus' is not present; however, this job was submitted with 'Create Target' on. I have tried the same with target_organism set to
> >> Drosophila:Melanogaster which is present and that worked.  Is it supposed to create the organism if it is not present? Then I might consider
> >> not setting target_organism at all, because a blast run against NR would fill my database with the full taxonomy more or less.
> >>
> >>
> >> Michael
> >>
> >> On Aug 22, 2013, at 2:53 PM, Stephen Ficklin wrote:
> >>
> >>> Hi Michael,
> >>>
> >>> Is the 'mus musculus' organism present in your organism table?  Are the genus and species spelled correctly with proper case in your GFF 'target_organism' attribute?  If it does not exist, you can click the 'Create Target' checkbox which will automatically add the Target feature.   Have you already tried that and still get the error below?
> >>>
> >>> Stephen
> >>>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research