InterPro Annotations

katebush-2
Hello,

I'm working to incorporate an interpro run into my MAKER/Gbrowse data and
am wondering if there's any documentation on how to do this. I found
the ipr2gff3 script, which seems to work and produces a gff with ipr
domains that I assume can be uploaded directly as a Gbrowse track. If
I want to add the GO annotations to the attributes of my MAKER gene
calls, should I use the ipr_update_gff script? For some reason that
gives me some errors (below) and no output (maybe because of my file
formats, see below)... but is this the correct approach? It looks like
it's possibly not creating %gene_map correctly; is that hash
created from the interpro results file?

thanks!

Kathryn

ipr output files look like this:

genemark-NODE_100_length_74057_cov_12.711493-abinit-gene-0.187-mRNA-1   353ED0DB4C62A2AF        687     PatternScan     PS00463 ZN2_CY6_FUNGAL_1        135     163     NA      T       03-Dec-2010     IPR001138       Fungal transcriptional regulatory protein, N-terminal   Molecular Function: transcription factor activity (GO:0003700), Cellular Component: nucleus (GO:0005634), Biological Process: regulation of transcription, DNA-dependent (GO:0006355), Molecular Function: zinc ion binding (GO:0008270)
genemark-NODE_100_length_74057_cov_12.711493-abinit-gene-0.187-                 HMMPfam PF02178 AT_hook 107     119 0.031   T       03-Dec-2010     IPR017956       AT hook, DNA-binding motif      Molecular Function: DNA binding (GO:0003677)
genemark-NODE_100_length_74057_cov_12.711493-abinit-gene-0.187-                 HMMPfam PF00172 Zn_clus 134     173 2.8e-09 T       03-Dec-2010     IPR001138       Fungal transcriptional regulatory protein, N-terminal   Molecular Function: transcription factor activity (GO:0003700), Cellular Component: nucleus (GO:0005634), Biological Process: regulation of transcription, DNA-dependent (GO:0006355), Molecular Function: zinc ion binding (GO:0008270)


Error:

Use of uninitialized value in hash element at /local/cluster/spatafora/MAKER_001/maker/bin/ipr_update_gff line 157, <$IN> line 453.
Use of uninitialized value in hash element at /local/cluster/spatafora/MAKER_001/maker/bin/ipr_update_gff line 159, <$IN> line 453.
Use of uninitialized value in hash element at /local/cluster/spatafora/MAKER_001/maker/bin/ipr_update_gff line 161, <$IN> line 453.
Use of uninitialized value in hash element at /local/cluster/spatafora/MAKER_001/maker/bin/ipr_update_gff line 157, <$IN> line 454.

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
Re: InterPro Annotations

Carson Hinton Holt
Yes, that is correct.

The usage for ipr_update_gff is -->

ipr_update_gff <file.gff3> <iprscan.out>

Here file.gff3 is a MAKER-produced gff3 file (not the gff3 file produced by iprscan2gff3, although I’m sure you knew that), and iprscan.out is an interproscan report in raw format.
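
For readers who have not looked inside a raw-format report, here is a minimal Python sketch of how one line might be split into fields. The column order (sequence ID, checksum, length, method, accession, description, start, end, e-value, status, date, InterPro ID, InterPro description, GO terms) is an assumption read off the sample lines earlier in this thread, and the names `IprHit` and `parse_raw_line` are made up for illustration. This is not how ipr_update_gff itself is written.

```python
import re
from typing import List, NamedTuple

# GO accessions look like "GO:0003700"; tolerate a stray space after the colon,
# which can appear when a mail client re-wraps the report.
GO_ID = re.compile(r"GO:\s*(\d{7})")


class IprHit(NamedTuple):
    seq_id: str       # column 1: sequence identifier
    method: str       # column 4: analysis method, e.g. HMMPfam
    accession: str    # column 5: member database accession
    start: int        # column 7: match start on the protein
    end: int          # column 8: match end on the protein
    ipr_id: str       # column 12: InterPro accession, if present
    go_ids: List[str]  # GO accessions pulled out of column 14, if present


def parse_raw_line(line: str) -> IprHit:
    """Split one tab-separated raw-format line (assumed layout, see above)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError(f"expected at least 11 columns, got {len(cols)}")
    # The last three columns are treated as optional in this sketch.
    ipr_id = cols[11] if len(cols) > 11 else ""
    go_text = cols[13] if len(cols) > 13 else ""
    go_ids = ["GO:" + digits for digits in GO_ID.findall(go_text)]
    return IprHit(cols[0], cols[3], cols[4], int(cols[6]), int(cols[7]), ipr_id, go_ids)
```

A line with fewer columns than expected raises an error here, which is one simple way to find a report line that was truncated or re-wrapped.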

Also make sure your version of MAKER is up to date, as some of the lines indicated in your error message correspond to blank lines in the current version code (so I know your version of ipr_update_gff must be different than my version).

The second number in the error message (e.g. <$IN> line 453) is the line of your interproscan report that was being read when the error occurred. Open the file and check that this line is properly formatted, and that the sequence ID given in the report also appears in exactly the same form in the GFF3 file (interproscan can munge names and capitalize letters oddly). MAKER includes a program I wrote called iprscan_wrap; it runs interproscan for you and handles many of iprscan's quirks nicely. It also makes interproscan restartable and adds auto-retry.
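
The ID check described above can be sketched in Python roughly as follows. It assumes the report is tab-separated with the sequence ID in the first column, and that the IDs to match are the `ID` and `Name` attributes of `mRNA` features in the GFF3 file; which attribute ipr_update_gff actually uses is not stated in this thread, so treat that as a guess. The function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Set

# Pull ID=... and Name=... out of a GFF3 attribute column.
ATTR = re.compile(r"(?:^|;)(ID|Name)=([^;]+)")


def gff3_mrna_ids(gff_lines: Iterable[str]) -> Set[str]:
    """Collect ID and Name values of mRNA features (assumed match targets)."""
    ids: Set[str] = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        for _key, value in ATTR.findall(cols[8]):
            ids.add(value)
    return ids


def unmatched_report_lines(report_lines: Iterable[str],
                           known_ids: Set[str]) -> Dict[str, List[int]]:
    """Map each report ID missing from the GFF3 to its 1-based line numbers."""
    missing: Dict[str, List[int]] = {}
    for lineno, line in enumerate(report_lines, start=1):
        if not line.strip():
            continue
        seq_id = line.split("\t", 1)[0].strip()
        if seq_id not in known_ids:
            missing.setdefault(seq_id, []).append(lineno)
    return missing
```

The line numbers returned count lines the same way as the `<$IN> line N` part of the warning, so they can be compared directly with the error output.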

I’ve attached a Gbrowse screenshot. It shows how the iprscan2gff3 results appear as physical features when concatenated with the MAKER gff3 file (gff3_merge script). When you click on a gene model, the interproscan results and GO terms are shown with it, which is the result of running the ipr_update_gff script.

Thanks,
Carson




On 1/18/11 9:00 PM, "katebush" <kbushley@...> wrote:

Re: InterPro Annotations

Sujai Kumar
Hi Carson and Kate

Thanks for posting this discussion on how to use ipr_update_gff

I am using Maker 2.08 and 2 input files: GFF3 for a long contig from maker directly (attached), as well as interproscan raw format output run on that same contig  (13 kbp). I get the same error as Kate was getting (with a different program line number), and no output (the gff3 file gets its timestamp changed but diff shows no difference from before running ipr_update_gff).

As Carson suggested, I checked the sequence IDs in the two files (maker GFF3 and iprscan raw output) and I see the two are different:

298339_countedlength_13412 vs 298339_countedlength_13412_1_ORF2

That would explain it, I guess. However, I don't see how they will ever have the same IDs if interproscan has to work on the translations of the nucleotide sequences that maker works on. I must be missing some basic step in my head.

Any help would be appreciated

Thanks in advance!

- Sujai 
 

Here is the command I used (both files attached):

/software/maker/maker-2.08/bin/ipr_update_gff 298339_countedlength_13412.gff3 test.fna.iprscan.raw 
Use of uninitialized value in hash element at /software/maker/maker-2.08/bin/ipr_update_gff line 161, <$IN> line 1.
Use of uninitialized value in hash element at /software/maker/maker-2.08/bin/ipr_update_gff line 163, <$IN> line 1.
Use of uninitialized value in hash element at /software/maker/maker-2.08/bin/ipr_update_gff line 161, <$IN> line 2.
Use of uninitialized value in hash element at /software/maker/maker-2.08/bin/ipr_update_gff line 163, <$IN> line 2.
Use of uninitialized value in hash element at /software/maker/maker-2.08/bin/ipr_update_gff line 165, <$IN> line 2.
Use of uninitialized value in hash element at /software/maker/maker-2.08/bin/ipr_update_gff line 165, <$IN> line 2.
Use of uninitialized value in hash element at /software/maker/maker-2.08/bin/ipr_update_gff line 161, <$IN> line 3.
Use of uninitialized value in hash element at /software/maker/maker-2.08/bin/ipr_update_gff line 163, <$IN> line 3.
Use of uninitialized value in hash element at /software/maker/maker-2.08/bin/ipr_update_gff line 165, <$IN> line 3.
Use of uninitialized value in hash element at /software/maker/maker-2.08/bin/ipr_update_gff line 165, <$IN> line 3.
Use of uninitialized value in hash element at /software/maker/maker-2.08/bin/ipr_update_gff line 161, <$IN> line 4.
Use of uninitialized value in hash element at /software/maker/maker-2.08/bin/ipr_update_gff line 163, <$IN> line 4.
Use of uninitialized value in hash element at /software/maker/maker-2.08/bin/ipr_update_gff line 165, <$IN> line 4.
Use of uninitialized value in hash element at /software/maker/maker-2.08/bin/ipr_update_gff line 165, <$IN> line 4.

On Thursday, January 20, 2011 11:04:59 PM UTC, Carson Holt wrote:

Attachment: gff3_iprscan_2files.zip (13K)
Re: InterPro Annotations

Carson Hinton Holt
From what I can tell, you ran interproscan against the genomic DNA and not the proteins. The ipr_update_gff script adds domain information to the genome annotations, so the sequence IDs in the report must be those of the annotated proteins. You would need to run interproscan again using the maker.proteins.fasta file. You can collect these for multiple contigs using the fasta_merge script, which gathers MAKER fasta output into a single file.
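
One rough way to spot this situation before running ipr_update_gff is sketched below in Python. It flags report IDs that are a contig name followed by a suffix like `_1_ORF2`, which is the only pattern shown in this thread (298339_countedlength_13412_1_ORF2), so the regular expression is an assumption based on that single example. The function name is hypothetical.

```python
import re
from typing import Iterable, List, Set

# Assumed shape of an ID produced when the input was nucleotide sequence:
# <contig name>_<number>_ORF<number>
ORF_SUFFIX = re.compile(r"^(?P<contig>.+)_\d+_ORF\d+$")


def ids_from_dna_run(report_ids: Iterable[str], contig_names: Set[str]) -> List[str]:
    """Return report IDs that look like ORFs derived from a known contig."""
    flagged: List[str] = []
    for seq_id in report_ids:
        match = ORF_SUFFIX.match(seq_id)
        if match and match.group("contig") in contig_names:
            flagged.append(seq_id)
    return flagged
```

If this returns anything, the report was probably produced from the contigs rather than from the protein fasta, and its IDs will not line up with the gene models in the GFF3.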

--Carson


On 4/7/11 5:11 AM, "Sujai Kumar" <sujaikumar@...> wrote:
