failed to assign putative gene function

Quanwei Zhang
Hello:

I am trying to add putative gene functions to the predicted gene models. First, I built a BLAST database from UniProt/Swiss-Prot protein sequences (canonical and isoform proteins of human, mouse and rat) with "makeblastdb". Then I ran "blastp" to generate "maker2uni.blastp", whose contents look like this:
maker-CasCan_contig_64815-snap-gene-0.0-mRNA-1    sp|Q6P5S2|LEG1H_HUMAN    69.97    303    91    0    1    303    1    303    7e-164    464
snap_masked-CasCan_contig_14203-processed-gene-0.10-mRNA-1    sp|Q91ZA8|NRARP_MOUSE    99.12    114    1    0    1    114    1    114    3e-80    236
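
For readers checking a file like this, the two hits above follow BLAST's default 12-column tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score). The Python sketch below reads one such line; the names `Hit`, `parse_hit` and `uniprot_accession` are illustrative only and are not part of BLAST or MAKER, and the sketch assumes whitespace-separated fields with no spaces inside the IDs.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One line of BLAST tabular output in the default 12-column order."""
    query: str
    subject: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_hit(line: str) -> Hit:
    """Split one tabular BLAST line into typed fields (hypothetical helper)."""
    fields = line.split()
    if len(fields) != 12:
        raise ValueError(f"expected 12 columns, got {len(fields)}")
    # Columns 4-10 are integer counts and coordinates.
    ints = (int(x) for x in fields[3:10])
    return Hit(fields[0], fields[1], float(fields[2]), *ints,
               float(fields[10]), float(fields[11]))


def uniprot_accession(subject: str) -> str:
    """Return the accession from an ID such as sp|Q6P5S2|LEG1H_HUMAN."""
    return subject.split("|")[1]
```

An accession with a suffix such as `-2` in the subject column would indicate that the hit is to an isoform entry rather than a canonical one.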

After that, I tried to add the protein homology data to the MAKER GFF3 and FASTA files with maker_functional_gff and maker_functional_fasta, but got the messages below.

Can't parse details from FASTA header: >sp|Q7Z5M8-2|AB12B_HUMAN Isoform 2 of Protein ABHD12B OS=Homo sapiens GN=ABHD12B

Use of uninitialized value $id in hash element at /public/apps/MAKER/2.31.9/bin/maker_functional_gff line 139, <$IN> line 39.
Use of uninitialized value $id in hash element at /public/apps/MAKER/2.31.9/bin/maker_functional_gff line 141, <$IN> line 39.
Can't parse details from FASTA header: >sp|Q7Z5M8-4|AB12B_HUMAN Isoform 4 of Protein ABHD12B OS=Homo sapiens GN=ABHD12B

Use of uninitialized value $id in hash element at /public/apps/MAKER/2.31.9/bin/maker_functional_gff line 139, <$IN> line 45.
Use of uninitialized value $id in hash element at /public/apps/MAKER/2.31.9/bin/maker_functional_gff line 141, <$IN> line 45.
Can't parse details from FASTA header: >sp|Q7Z5M8-5|AB12B_HUMAN Isoform 5 of Protein ABHD12B OS=Homo sapiens GN=ABHD12B
.....

I am not sure how to deal with this. I followed the commands given in the protocol. Any suggestions?

Thanks

Best
Quanwei

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: failed to assign putative gene function

Quanwei Zhang
I just found that the bottom of the output file includes entries like the one below. However, I did not get GFF3 files with the gene homology information added.

>snap_masked-CasCan_contig_27993-processed-gene-0.6-mRNA-1 transcript Name:"Protein of unknown function" offset:0 AED:0.47 eAED:0.47 QI:0|0|0|1|1|1|2|0|84
ATGAAAGACATTGGTACCCCAGAGGCATGGCAGATAATGATGTCCCTCAAGTCTGGACTC
TTGGCAGAGATCACATGGGCTTTAGACACCATTAACATTCTACTGTATGATGACAGCAGC
ATTATGACCTTCAACCTCAGTCAGTTCCCAGGATTGCTAGAGCTCTTTGAGTATGAGGTG
GGTGACCGAAGACAGAGAACTCTACTGGACTCTGGGAGATTCAGTGAAGTGTCTGGTCCA
ACCCCTACAGAG
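
To see how many transcripts in such an output file carry a given name, the Name:"..." field of each header can be tallied. This is only a sketch under the assumption that every header follows the layout shown above; `transcript_name` and `count_names` are hypothetical helpers, not MAKER functions.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Matches the quoted value of the Name:"..." field in a FASTA header.
NAME_FIELD = re.compile(r'Name:"([^"]*)"')


def transcript_name(header: str) -> Optional[str]:
    """Return the Name value from a header line, or None if it has none."""
    match = NAME_FIELD.search(header)
    return match.group(1) if match else None


def count_names(lines: Iterable[str]) -> Counter:
    """Count how often each Name value occurs across all header lines."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith(">"):
            counts[transcript_name(line)] += 1
    return counts
```

A file in which nearly every transcript is counted under "Protein of unknown function" would suggest that the homology data was not attached.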

Thanks

Best
Quanwei

2017-02-13 10:16 GMT-05:00 Quanwei Zhang <[hidden email]>:
> [quoted original message trimmed; see the first post above]


Re: failed to assign putative gene function

Carson Holt-2
You used either TrEMBL or the UniProtKB isoform sequence set. Their FASTA headers are slightly different and will not be parsed correctly.

For example, here is the header as formatted for the same sequence in the Swiss-Prot dataset download:
>sp|Q7Z5M8|AB12B_HUMAN Protein ABHD12B OS=Homo sapiens GN=ABHD12B PE=2 SV=1
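
The difference can be made concrete with a small Python check. The pattern `CANONICAL_HEADER` below is an assumption modeled only on the Swiss-Prot header shown above, which ends in PE= and SV= fields; it is not copied from maker_functional_gff, whose actual parsing code may differ. `parse_header` is likewise a hypothetical helper.

```python
import re
from typing import Dict, Optional

# Assumed layout: >ID description OS=organism [GN=gene] PE=n SV=n
CANONICAL_HEADER = re.compile(
    r"^>(?P<id>\S+)\s+(?P<desc>.+?)\s+OS=(?P<os>.+?)"
    r"(?:\s+GN=(?P<gn>\S+))?\s+PE=(?P<pe>\d+)\s+SV=(?P<sv>\d+)\s*$"
)


def parse_header(header: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the header fields, or None if the layout does not match."""
    match = CANONICAL_HEADER.match(header)
    return match.groupdict() if match else None
```

Under this assumed pattern the canonical header above parses into its fields, while an isoform header such as the ones in the error messages, which stops after GN=, yields None.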

I think you used the UniProtKB isoform sequence dataset instead.
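
One possible workaround, offered as a sketch rather than a recommendation from the MAKER developers, is to strip the isoform records from the combined FASTA file so that only canonical entries remain; re-downloading the plain Swiss-Prot dataset is likely simpler. The BLAST database and the blastp search would presumably need to be rebuilt and rerun afterwards so that no hit refers to a removed entry. The pattern assumes isoform accessions carry a numeric suffix such as Q7Z5M8-2, as in the error messages; `drop_isoforms` and the file names in the comments are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Assumption: isoform records have an accession like Q7Z5M8-2 between the pipes.
ISOFORM_ACCESSION = re.compile(r"^>[a-z]{2}\|[A-Z0-9]+-\d+\|")


def drop_isoforms(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the FASTA records whose header is not an isoform entry."""
    keep = False  # lines before the first header are skipped
    for line in lines:
        if line.startswith(">"):
            keep = ISOFORM_ACCESSION.match(line) is None
        if keep:
            yield line


# Example use with hypothetical file names:
# with open("uniprot_all.fasta") as src, open("uniprot_canonical.fasta", "w") as dst:
#     dst.writelines(drop_isoforms(src))
```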

—Carson