[Gmod-tripal-devel] Could there be a bug in Blast parser?

[Gmod-tripal-devel] Could there be a bug in Blast parser?

Michael Dondrup-3
Cross-posting
because this might be the more appropriate list.

Regards
Michael

Begin forwarded message:

> From: Michael Dondrup <[hidden email]>
> Subject: Re: [Gmod-tripal] Importing Analysis:Blast with regular expressions
> Date: August 14, 2013 9:37:07 AM GMT+02:00
> To: gmod-tripal <[hidden email]>
>
> Hm,
> I think there must be something else broken with the blast importer or my input file. I manually changed the input file to contain
> only the gene name and checked that this gene is in the database. Then I ran blast again with the following arguments:
>
> blastp -db blastdb/nr -query ~/lakselus/test2.fa -out ~/lakselus/testblast2.xml -evalue 1E-6 -num_threads 24 -outfmt 5
>
> I have attached the blast xml output. Running the importer yields:
>
> Calling: tripal_analysis_blast_parseXMLFile(21, 5, /tmp/testblast2.xml, 10, , , , 1, 0, 0, 59)
> Done.ng element 1 of 0 (100%). Memory: 37,459,688 bytes.
>
> Any ideas?
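The log line above does not say how many query results the XML file actually holds. One way to rule out the input file is to count the `<Iteration>` and `<Hit>` elements in the `-outfmt 5` output before importing. The following is only a rough sketch: the function name and the sample document are made up for illustration, and it says nothing about how the Tripal importer itself reads the file.

```python
import io
import xml.etree.ElementTree as ET


def count_iterations_and_hits(source):
    """Return (number of <Iteration> elements, number of <Hit> elements).

    `source` may be a path to a BLAST XML file or an open file object.
    """
    root = ET.parse(source).getroot()
    iterations = root.findall(".//Iteration")
    hits = root.findall(".//Iteration/Iteration_hits/Hit")
    return len(iterations), len(hits)


# Hand-written stand-in for a real BLAST XML file (hypothetical content):
# two queries, the first with one hit and the second with none.
SAMPLE = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit><Hit_def>hypothetical hit</Hit_def></Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_hits></Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

counts = count_iterations_and_hits(io.StringIO(SAMPLE))
print(counts)  # (2, 1)
```

For a real file, pass the path instead of the `io.StringIO` object, e.g. `count_iterations_and_hits("/tmp/testblast2.xml")`. A count of zero iterations would point at the input file rather than at the importer.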
>
> Michael
>
> On Aug 13, 2013, at 11:27 AM, Michael Dondrup wrote:
>
>> Hi,
>> I am trying to import a single blastp result into Tripal 1.1 using the Analysis:Blast module.
>> The headers in the AA file do not fully match the gene feature names in the database, so I
>> tried to use the regex field on the Blast import page, but that didn't seem to work; possibly I got
>> the regex syntax wrong.
>> The names of the genes in the database look like:
>> maker-LSalAtl2s866-snap-gene-0.8 but the fasta defline looks like:
>> maker-LSalAtl2s866-snap-gene-0.8-mRNA-1 pep:novel supercontig:LSalAtl2s:LSalAtl2s866:43827:62377:-1 gene:maker-LSalAtl2s866-snap-gene-0.8 transcript:maker-LSalAtl2s866-snap-gene-0.8-mRNA-1 description:""
>>
>> So I was trying to use the following PCRE regexes [1]:
>>
>> gene:(maker\-LSalAtl2s\d+\-(snap|augustus)\-gene\-\d+\.\d+)  # should return the portion in parentheses
>> maker\-LSalAtl2s\d+\-snap\-gene\-\d+\.\d+ # a bit simplified
>> /maker\-LSalAtl2s\d+\-snap\-gene\-\d+\.\d+/ # with extra delimiters
>>
>> But none seemed to work.
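The patterns themselves can be checked outside Tripal. The sketch below runs the first two against the defline quoted above, using Python's `re` module as a stand-in for PCRE. The two engines agree on the constructs used here (`\d`, `\.`, escaped hyphens, alternation, capture groups), but this is an assumption about the patterns only. It does not show whether the importer uses the whole match or the first capture group, and the helper name `extract_gene_name` is made up.

```python
import re

# Defline copied from the message above (description is an empty quoted string).
DEFLINE = (
    "maker-LSalAtl2s866-snap-gene-0.8-mRNA-1 pep:novel "
    "supercontig:LSalAtl2s:LSalAtl2s866:43827:62377:-1 "
    "gene:maker-LSalAtl2s866-snap-gene-0.8 "
    "transcript:maker-LSalAtl2s866-snap-gene-0.8-mRNA-1 "
    'description:""'
)

# First pattern from the message; capture group 1 is the gene name.
WITH_GROUP = re.compile(r"gene:(maker\-LSalAtl2s\d+\-(snap|augustus)\-gene\-\d+\.\d+)")
# Second pattern, without a capture group; the whole match is the gene name.
SIMPLIFIED = re.compile(r"maker\-LSalAtl2s\d+\-snap\-gene\-\d+\.\d+")


def extract_gene_name(defline):
    """Return the gene name captured by the first pattern, or None if absent."""
    match = WITH_GROUP.search(defline)
    return match.group(1) if match else None


gene_name = extract_gene_name(DEFLINE)
first_simplified = SIMPLIFIED.search(DEFLINE).group(0)
print(gene_name)         # maker-LSalAtl2s866-snap-gene-0.8
print(first_simplified)  # maker-LSalAtl2s866-snap-gene-0.8
```

Both patterns find the gene name in the defline, so the regex syntax as such looks sound.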
>>
>> Calling: tripal_analysis_blast_parseXMLFile(17, 19, /tmp/testblast.xml, 10, , /maker\-LSalAtl2s\d+\-snap\-gene\-\d+\.\d+/, , 0, 0, 1, 56)
>> Done.ng element 1 of 0 (100%). Memory: 39,737,968 bytes.
>>
>>
>> What is the right syntax for those regexps? And how can I verify that the import was actually successful?
>>
>>
>> [1]: http://www.php.net/manual/en/pcre.pattern.php 
>> and according to the manual this should work.
>>
>> Michael Dondrup
>> Postdoctoral fellow
>> Sea Lice Research Centre/Department of Informatics
>> University of Bergen
>> Thormøhlensgate 55, N-5008 Bergen,
>> Norway
>>
>> ------------------------------------------------------------------------------
>> Get 100% visibility into Java/.NET code with AppDynamics Lite!
>> It's a free troubleshooting tool designed for production.
>> Get down to code-level detail for bottlenecks, with <2% overhead.
>> Download for free and get started troubleshooting in minutes.
>> http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Gmod-tripal mailing list
>> [hidden email]
>> https://lists.sourceforge.net/lists/listinfo/gmod-tripal

------------------------------------------------------------------------------
Get 100% visibility into Java/.NET code with AppDynamics Lite!
It's a free troubleshooting tool designed for production.
Get down to code-level detail for bottlenecks, with <2% overhead.
Download for free and get started troubleshooting in minutes.
http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-tripal-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-tripal-devel

Attachment: testblast2.xml (691K)

Re: [Gmod-tripal-devel] Could there be a bug in Blast parser?

Michael Dondrup-3
I apologize,

I found it and it is working; I was just unable to find the imported results.
Blast results show up on the page for each feature. I had assumed that hits are stored as
features of their own and would then appear in the list of features. If I understand correctly, the
blast parser does not generate features for hits, unlike the GFF importer? I would like to generate a GBrowse
track for my blast results as well. Would it be possible to tweak the importer to generate features
as well, or do I have to convert to GFF and then load the results separately?

Best
Michael


On Aug 15, 2013, at 8:41 AM, Michael Dondrup wrote:

> Cross-posting
> because this might be the more appropriate list.
>
> Regards
> Michael

Re: [Gmod-tripal-devel] [Gmod-tripal] Could there be a bug in Blast parser?

Stephen Ficklin-2
Hi Michael,

Yes, we can adjust the blast tool to create matches that can be shown in GBrowse. I assume you are using the Chado database for GBrowse as well? I have added the feature request here: https://drupal.org/node/2068905. If you have a Drupal account, you can follow that issue and be notified of updates to it.

Until we can get this working, you will, unfortunately, need to convert your blast results to GFF3 and load them that way.
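As a starting point for that conversion, here is a minimal sketch that reads BLAST XML (`-outfmt 5`) and writes one GFF3 `protein_match` line per HSP. Several things in it are assumptions rather than anything Tripal or GBrowse prescribes: the function name `blast_xml_to_gff3`, using the first word of the query defline as the seqid, putting the e-value in the score column, and the sample record. Coordinates are in query (protein) space, so for a genome track they would still need to be mapped onto the supercontig, which the sketch does not attempt.

```python
import io
import xml.etree.ElementTree as ET
from urllib.parse import quote


def blast_xml_to_gff3(source, source_name="blastp"):
    """Yield GFF3 lines, one protein_match per HSP, in query coordinates."""
    yield "##gff-version 3"
    root = ET.parse(source).getroot()
    for iteration in root.iter("Iteration"):
        # Assumption: the first word of the query defline names the feature.
        words = iteration.findtext("Iteration_query-def", default="").split()
        seqid = words[0] if words else "unknown_query"
        for hit in iteration.iter("Hit"):
            accession = quote(hit.findtext("Hit_accession", default="unknown_hit"), safe="")
            for hsp in hit.iter("Hsp"):
                q_from = int(hsp.findtext("Hsp_query-from"))
                q_to = int(hsp.findtext("Hsp_query-to"))
                start, end = min(q_from, q_to), max(q_from, q_to)
                evalue = hsp.findtext("Hsp_evalue", default=".")
                # Target=<id> <start> <end> records the aligned region on the hit.
                attributes = "Name={0};Target={0} {1} {2}".format(
                    accession,
                    hsp.findtext("Hsp_hit-from"),
                    hsp.findtext("Hsp_hit-to"),
                )
                yield "\t".join([seqid, source_name, "protein_match",
                                 str(start), str(end), evalue, ".", ".", attributes])


# Hand-written stand-in for a real BLAST XML file; the hit is hypothetical.
SAMPLE = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>maker-LSalAtl2s866-snap-gene-0.8</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_accession>HYPOTHETICAL_1</Hit_accession>
          <Hit_hsps>
            <Hsp>
              <Hsp_evalue>1e-20</Hsp_evalue>
              <Hsp_query-from>5</Hsp_query-from>
              <Hsp_query-to>120</Hsp_query-to>
              <Hsp_hit-from>10</Hsp_hit-from>
              <Hsp_hit-to>125</Hsp_hit-to>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

gff_lines = list(blast_xml_to_gff3(io.StringIO(SAMPLE)))
print("\n".join(gff_lines))
```

For a real run, pass the path to the XML file instead of the `io.StringIO` object and write the lines to a `.gff3` file for the GFF loader.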

Stephen

On 8/16/2013 9:24 AM, Michael Dondrup wrote:
> I apologize,
>
> I found it and it is working, I was just unable to find the import.
> Blast results will show up on the page for each feature. I was assuming hits are shown as
> features of their own and then appear in the list of features. If I understand correctly, the
> blast parser will not generate features for hits unlike the GFF importer? I wish to generate a gbrowse
> track for my blast results as well. Would it be possible to tweak the importer to generate a feature
> as well, or do I have to convert to GFF and then load this individually?
>
> Best
> Michael

