Parsing Blast XML: Hit name and Hit Accession

Parsing Blast XML: Hit name and Hit Accession

Sofia Robb
I have a question about how the BLAST XML is parsed and how to specify the correct hit name and hit accession in the BLAST analysis settings. I see that the settings include fields for regular expressions. What text are those regular expressions applied to?

It appears that my hit name and hit accession are taken from the first word of the <Hit_def>, and my hit description becomes the remaining text from <Hit_def>, which is incorrect. I would like to take the ID and accession from <Hit_id> and <Hit_accession> and have my description be the complete <Hit_def>. Do I include the XML tags in my regular expressions to extract the correct information? Or is only the <Hit_def> being searched? If so, how do I get the ID and accession?

Extracted from my XML:
  <Hit_id>sp|Q8SQA4|CD97_BOVIN</Hit_id>
  <Hit_def>CD97 antigen OS=Bos taurus GN=CD97 PE=2 SV=1</Hit_def>
  <Hit_accession>Q8SQA4</Hit_accession>

From the sequence feature info page:
Match Name   E-value       Identity   Description
CD97         3.587360e-6   27.63      antigen OS=Bos taurus GN=CD97 PE=2 SV=1 [more]


Thanks,
Sofia

------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
_______________________________________________
Gmod-tripal mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-tripal

Re: Parsing Blast XML: Hit name and Hit Accession

Stephen Ficklin-2
Hi Sofia,

The regular expressions are meant to parse details from the <Hit_def> tag only. If you check the box "Use Genbank style parser", it may get you closer to what you want. But you hit on an important problem: it's not clearly described what those regular expressions are for and where they are used. Also, there definitely should be more flexibility in which tag each regular expression can be applied to. I added a new issue to remind us to fix that:

https://www.drupal.org/node/2660580

If you are unable to extract the fields you need by checking "Use Genbank style parser", then we'll put more emphasis on fixing this for you.

Stephen
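
To make the behaviour described above concrete, here is a minimal Python sketch of what happens when regular expressions are applied to the <Hit_def> text only. It is not Tripal's actual code: the two patterns and the parse_hit_def helper are hypothetical, chosen only to reproduce the "first word becomes the name" result from the original question.

```python
import re

# Hypothetical patterns for illustration; these are NOT Tripal's defaults.
NAME_RE = re.compile(r"^(\S+)")               # first whitespace-delimited word
DESCRIPTION_RE = re.compile(r"^\S+\s+(.*)$")  # everything after the first word


def parse_hit_def(hit_def):
    """Apply both patterns to the <Hit_def> text only (hypothetical helper)."""
    name_match = NAME_RE.search(hit_def)
    desc_match = DESCRIPTION_RE.search(hit_def)
    return {
        "name": name_match.group(1) if name_match else None,
        "description": desc_match.group(1) if desc_match else None,
    }


# The <Hit_def> value from the XML excerpt in the original question.
hit_def = "CD97 antigen OS=Bos taurus GN=CD97 PE=2 SV=1"
parsed = parse_hit_def(hit_def)
print(parsed["name"])         # CD97
print(parsed["description"])  # antigen OS=Bos taurus GN=CD97 PE=2 SV=1
```

Because <Hit_id> and <Hit_accession> are never part of the searched text, no pattern applied this way can recover sp|Q8SQA4|CD97_BOVIN or Q8SQA4 for this hit.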


(In reply to Sofia Robb's message of 1/29/2016 12:40 PM.)

Re: Parsing Blast XML: Hit name and Hit Accession

Sofia Robb
Hi Stephen,

When I select "Use Genbank style parser", nothing changes except that my text fields are now grey. Just to be clear, are you saying that I should check "Use Genbank style parser", save, and then run the parser again, or just check the available form fields?

Thanks!
Sofia

Without checking "Use Genbank style parser":
[Inline image 1]

With checking "Use Genbank style parser":
[Inline image 2]

(In reply to Stephen Ficklin's message of Mon, Feb 1, 2016 at 11:35 AM.)

Re: Parsing Blast XML: Hit name and Hit Accession

Stephen Ficklin-2
Hi Sofia,

Yes, that's correct. Check the box, save, and then retry your import to see if that gets you what you wanted.

Stephen

(In reply to Sofia Robb's message of 2/1/2016 1:58 PM.)

Re: Parsing Blast XML: Hit name and Hit Accession

Sofia Robb
Ahh haa! Yes, switching to the "Use Genbank style parser" option worked! Thank you!

(In reply to Stephen Ficklin's message of Mon, Feb 1, 2016 at 3:17 PM.)

Re: Parsing Blast XML: Hit name and Hit Accession

Stephen Ficklin-2
Great! For the future: if you run BLAST against a database whose definition lines are not formatted in a way that BLAST can split into the hit_id, hit_def and hit_accession tags, then everything will be in hit_def, and you can use the regular expressions to pull out what you want.

Stephen
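
The two cases described above can be sketched in Python as follows. This is an illustration only, not how Tripal implements its parser. The read_hit helper and the fallback pattern are hypothetical, and the check for a "gnl|BL_ORD_ID|" prefix is an assumption about how an unparsed definition line shows up in <Hit_id>.

```python
import re
import xml.etree.ElementTree as ET

# The <Hit> fields from the XML excerpt in the original question.
SAMPLE_HIT = """
<Hit>
  <Hit_id>sp|Q8SQA4|CD97_BOVIN</Hit_id>
  <Hit_def>CD97 antigen OS=Bos taurus GN=CD97 PE=2 SV=1</Hit_def>
  <Hit_accession>Q8SQA4</Hit_accession>
</Hit>
""".strip()

# Assumption: an ID with this prefix means BLAST did not parse the definition line.
UNPARSED_ID_PREFIX = "gnl|BL_ORD_ID|"
# Hypothetical fallback pattern: first word is the name, the rest is the description.
FALLBACK_RE = re.compile(r"^(?P<name>\S+)\s*(?P<description>.*)$")


def read_hit(hit):
    """Prefer the dedicated tags; fall back to a regex on <Hit_def> (hypothetical helper)."""
    hit_id = hit.findtext("Hit_id", default="")
    hit_def = hit.findtext("Hit_def", default="")
    accession = hit.findtext("Hit_accession", default="")
    if hit_id and not hit_id.startswith(UNPARSED_ID_PREFIX):
        # BLAST already split the definition line into separate tags.
        return {"name": hit_id, "accession": accession, "description": hit_def}
    match = FALLBACK_RE.match(hit_def)
    if match is None:
        # Nothing usable in <Hit_def>; return the raw tag values unchanged.
        return {"name": hit_id, "accession": accession, "description": hit_def}
    # Everything is in <Hit_def>, so the regex supplies both name and accession.
    return {
        "name": match.group("name"),
        "accession": match.group("name"),
        "description": match.group("description"),
    }


record = read_hit(ET.fromstring(SAMPLE_HIT))
print(record["name"])         # sp|Q8SQA4|CD97_BOVIN
print(record["accession"])    # Q8SQA4
print(record["description"])  # CD97 antigen OS=Bos taurus GN=CD97 PE=2 SV=1
```

For the sample hit the tags are already populated, so the regex is never consulted. It only matters for databases where the whole definition line ends up in hit_def.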

(In reply to Sofia Robb's message of 2/1/2016 3:01 PM.)