Maker not predicting many genes


Valero Jimenez, Claudio (Monday, February 17, 2014, 2:23 AM)

Dear list,

I’m trying to annotate a fungal genome, and I’m surprised that Maker predicts so few genes (3697). I have trained SNAP and followed all the available tutorials. The ab initio predictors predict between 8000 and 10000 genes. Is something wrong in my configuration file? I have attached the opts file and the SOBA summary of the annotation.

Regards,

Claudio


_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Attachments:
maker_opts.log (6K)
SOBA.pdf (280K)

Re: Maker not predicting many genes

Carson Hinton Holt (Monday, February 17, 2014, 12:22 PM)
You also need to look at the contigs in a browser like Apollo. That will let you see both the predictions and the evidence in context. You can then see whether genes are being dropped because they are supported only by single-exon evidence, because they have no evidence support whatsoever, or because they are excluded due to UTR overlap. That last one is a common problem for fungi when using assembled mRNA-seq reads. Fungal genes are packed so closely that their UTRs often overlap. As a result, mRNA-seq assemblers falsely assemble neighboring genes into single transcripts. This produces very long UTRs on some of your gene models, which force other models to be excluded. If this is the case, rerun an assembler such as Trinity with the --jaccard_clip option set to avoid transcript fusion. Then set correct_est_fusion=1 in the MAKER control files to get those long false UTRs clipped off.
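To make the fusion pattern concrete, here is a minimal Python sketch (not part of MAKER or Trinity) that flags assembled transcripts overlapping more than one predicted gene model on the same contig, which is the situation that yields the long false UTRs described above. It assumes you have already extracted start/end coordinates (1-based, inclusive, as in GFF3) for one contig into simple records; strand is ignored for brevity, and the names Interval and likely_fusions are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    """A feature on one contig; coordinates are 1-based and inclusive (GFF3 style)."""
    name: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two closed intervals share at least one base."""
    return a.start <= b.end and b.start <= a.end


def likely_fusions(transcripts: List[Interval],
                   gene_models: List[Interval]) -> Dict[str, List[str]]:
    """Map each transcript that overlaps two or more gene models to those models.

    Such transcripts are candidates for assembler fusion of neighboring genes.
    Strand is ignored here, which a real check should not do.
    """
    flagged: Dict[str, List[str]] = {}
    for tx in transcripts:
        hits = [g.name for g in gene_models if overlaps(tx, g)]
        if len(hits) >= 2:
            flagged[tx.name] = hits
    return flagged


if __name__ == "__main__":
    # Toy coordinates, for illustration only.
    genes = [Interval("geneA", 1000, 2500),
             Interval("geneB", 2700, 4200),
             Interval("geneC", 6000, 7000)]
    transcripts = [Interval("tx1", 900, 4300),   # spans geneA and geneB
                   Interval("tx2", 5900, 7100)]  # covers geneC only
    print(likely_fusions(transcripts, genes))    # {'tx1': ['geneA', 'geneB']}
```

Transcripts flagged this way are the ones worth inspecting in Apollo. The pairwise scan is fine for a single contig, but a whole genome would call for an interval index instead.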

If it is a lack of evidence overlap, make sure you provided at least one proteome from a related species to the protein= option. At least two proteomes are recommended, though (not proteins from the same species, but complete proteomes from related species). Comprehensive databases like UniProt/Swiss-Prot are not sufficient on their own, but they can supplement the other proteome data. Are you also providing EST data? Note that EST/mRNA-seq data without a proteome from a related species is also not sufficient, because the quality and completeness of EST/mRNA-seq datasets vary widely, and they may capture as few as 30% of the genes.
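A quick way to check the protein evidence setting is to read it back out of the control file. The Python sketch below assumes maker_opts.ctl holds one key=value pair per line with optional trailing # comments, and that protein= takes a comma-separated list of FASTA files. The UniProt detection is only a filename heuristic, and parse_ctl and protein_evidence_warnings are hypothetical helpers, not MAKER tools.

```python
from typing import Dict, List


def parse_ctl(text: str) -> Dict[str, str]:
    """Parse key=value lines, dropping '#' comments (paths containing '#' would break this)."""
    opts: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        opts[key.strip()] = value.strip()
    return opts


def looks_like_uniprot(path: str) -> bool:
    """Heuristic only: treat files whose names mention UniProt/Swiss-Prot as such."""
    name = path.lower()
    return "uniprot" in name or "sprot" in name


def protein_evidence_warnings(opts: Dict[str, str]) -> List[str]:
    """Return warnings reflecting the advice above about protein= evidence."""
    files = [f.strip() for f in opts.get("protein", "").split(",") if f.strip()]
    warnings: List[str] = []
    if not files:
        warnings.append("protein= is empty: supply at least one related proteome")
        return warnings
    if len(files) < 2:
        warnings.append("only one protein file: two or more related proteomes are recommended")
    if all(looks_like_uniprot(f) for f in files):
        warnings.append("only UniProt/Swiss-Prot given: add complete proteomes of related species")
    return warnings


if __name__ == "__main__":
    ctl = "protein=uniprot_sprot.fasta #illustrative comment\n"
    for w in protein_evidence_warnings(parse_ctl(ctl)):
        print(w)
```

A UniProt-only setup like the one in the example triggers both the single-file warning and the UniProt-only warning, which matches the diagnosis given later in this thread.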

Another factor is single-exon evidence. In anything but fungi, single-exon evidence is mostly caused by spurious alignments. But fungi have so many single-exon genes that this is not the case for them. Make sure single_exon=1 is set so that this evidence is kept, and set the minimum length of single-exon evidence to keep to something like 250 bp.
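If you prefer to script the control-file changes suggested in this thread, here is a minimal Python sketch that rewrites key=value lines in place while keeping their # comments, and appends any key that is missing. It assumes the one-option-per-line layout of a maker_opts.ctl generated by maker -CTL, and that the minimum length for single-exon evidence is the single_length option (that name is an assumption here; confirm it in your own file). set_ctl_options is a hypothetical helper.

```python
import re
from typing import Dict, Union

# key=value, optionally followed by a '#' comment; leading indentation is preserved.
CTL_LINE = re.compile(r"^(\s*)([A-Za-z_]\w*)=([^#]*)(#.*)?$")


def set_ctl_options(text: str, updates: Dict[str, Union[int, str]]) -> str:
    """Return text with each key in updates set to its new value."""
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        m = CTL_LINE.match(line)
        if m and m.group(2) in remaining:
            indent, key, _old, comment = m.groups()
            new_line = f"{indent}{key}={remaining.pop(key)}"
            if comment:
                new_line += " " + comment
            out.append(new_line)
        else:
            # Comments, section headers and untouched options pass through unchanged.
            out.append(line)
    # Options not already present in the file are appended at the end.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    original = (
        "single_exon=0 #example comment\n"
        "correct_est_fusion=0 #example comment\n"
    )
    print(set_ctl_options(original, {"single_exon": 1,
                                     "single_length": 250,
                                     "correct_est_fusion": 1}), end="")
```

Only the first occurrence of each key is rewritten; a file with duplicated keys would need a different policy.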

Thanks,
Carson

Re: Maker not predicting many genes

Carson Holt-2 (Monday, February 17, 2014, 20:26)
In reply to this post by Valero Jimenez, Claudio
From your control file, it looks like your primary shortcomings are not setting single_exon=1 and using only UniProt rather than supplying complete proteomes from related species. I’d set correct_est_fusion=1 as well.

—Carson

Re: Maker not predicting many genes

Valero Jimenez, Claudio (Wednesday, February 19, 2014, 1:20 AM)

Hi Carson,

Thank you for your suggestions. I ran Maker again and it predicted many more genes. However, I now have a different problem. When I run gff3_merge, I get the following error:

Use of uninitialized value $outfile in substitution (s///) at ./gff3_merge line 67.

The same thing happens when I run fasta_merge:

Use of uninitialized value $outfile in substitution (s///) at ./fasta_merge line 52.

I never had this problem with these commands before.

Regards,

Claudio

Re: Maker not predicting many genes

Carson Holt-2
You provided a directory rather than a file to the -d option ('d' stands for datastore log).
You must provide the location of the datastore index log file, not the datastore directory.

Example: gff3_merge -d ./dpp_contig.maker.output/dpp_contig_master_datastore_index.log

Thanks,
Carson
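To avoid passing the directory by mistake, a small Python sketch can locate the index log for you. It assumes the naming shown in Carson's example, <base>.maker.output/<base>_master_datastore_index.log, and only builds the argument list rather than running anything. find_datastore_index and merge_command are hypothetical helpers; the same -d path is what fasta_merge expects as well.

```python
from pathlib import Path
from typing import List


def find_datastore_index(output_dir: str) -> Path:
    """Return the single *_master_datastore_index.log inside a MAKER output directory."""
    out = Path(output_dir)
    if not out.is_dir():
        raise NotADirectoryError(f"{out} is not a directory")
    logs = sorted(out.glob("*_master_datastore_index.log"))
    if len(logs) != 1:
        raise FileNotFoundError(
            f"expected one *_master_datastore_index.log in {out}, found {len(logs)}")
    return logs[0]


def merge_command(tool: str, output_dir: str) -> List[str]:
    """Build e.g. ['gff3_merge', '-d', '<index log>'] without running it."""
    return [tool, "-d", str(find_datastore_index(output_dir))]
```

For example, subprocess.run(merge_command("gff3_merge", "dpp_contig.maker.output"), check=True) would run the merge once that directory exists. Failing early with an explicit message is the point of the extra checks, instead of the uninitialized-value warning seen above.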

Re: Maker not predicting many genes

Daniel Ence (Wednesday, February 19, 2014, 9:04 AM)
In reply to this post by Valero Jimenez, Claudio
Hi Claudio, 

What was the command line you used for gff3_merge?

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

Re: Maker not predicting many genes

Valero Jimenez, Claudio

Hi,

Thanks, I had made a mistake in the command line!

Regards,

Claudio

Re: Maker not predicting many genes

Barry Moore-3 (Wednesday, February 19, 2014, 11:03 AM)
In reply to this post by Daniel Ence
Hi Daniel,

Could you add an error message to those two scripts that detects when a filename is missing or a directory was given instead, and suggests a solution to the user?

Thanks,

B


Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

Re: Maker not predicting many genes

Carson Hinton Holt
You only need to change a single character in the script. Just change the -e (exists) test to a -f (is a plain file) test.

Thanks,
Carson
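Carson's one-character fix relies on the difference between Perl's -e file test (true for any existing path, directories included) and -f (true only for a plain file). For illustration only, here is the same check written in Python, with the kind of explicit message Barry asked for. It is not the actual patch to gff3_merge or fasta_merge, and check_datastore_arg is a hypothetical name.

```python
import os


def check_datastore_arg(path):
    """Validate a -d argument and exit with an actionable message if it is wrong.

    os.path.exists() is the analogue of Perl's -e and is True for directories;
    os.path.isfile() is the analogue of -f and is True only for regular files.
    """
    if not path:
        raise SystemExit("ERROR: -d needs the path to a *_master_datastore_index.log file")
    if os.path.isdir(path):
        raise SystemExit(
            f"ERROR: {path} is a directory. Pass the *_master_datastore_index.log "
            "file inside the MAKER output directory instead.")
    if not os.path.isfile(path):
        raise SystemExit(f"ERROR: {path} does not exist or is not a regular file")
    return path
```

Checking for a directory before the generic isfile test is what lets the message suggest the fix, rather than just reporting that the argument is invalid.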

From: Barry Moore <[hidden email]>
Date: Wednesday, February 19, 2014 at 11:03 AM
To: Daniel Ence <[hidden email]>
Cc: "Valero Jimenez, Claudio" <[hidden email]>, Carson Holt <[hidden email]>, Carson Holt <[hidden email]>, "[hidden email]'" <[hidden email]>
Subject: Re: [maker-devel] Maker not predicting many genes

<base href="x-msg://37/">
Hi Daniel,

Could you add an error message to those two scripts that detects that a filename is missing or that a directory was given instead and gives the user a suggested solution.

Thanks,

B

On Feb 19, 2014, at 9:04 AM, Daniel Ence wrote:

Hi Claudio, 

What was the command line you used for gff3_merge?

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From: maker-devel [[hidden email]] on behalf of Valero Jimenez, Claudio [[hidden email]]
Sent: Wednesday, February 19, 2014 1:20 AM
To: 'Carson Holt'; Carson Holt; '[hidden email]'
Subject: Re: [maker-devel] Maker not predicting many genes

Hi Carson,

 

Thank you for your suggestions. I ran MAKER again and it predicted many more genes. However, I now have a different problem. When I run gff3_merge, I get the following error:

Use of uninitialized value $outfile in substitution (s///) at ./gff3_merge line 67.

 

A similar thing happens when I run fasta_merge:

Use of uninitialized value $outfile in substitution (s///) at ./fasta_merge line 52.

 

I never had this problem with these commands before.

Regards,

Claudio

From: Carson Holt [[hidden email]] 
Sent: Monday, 17 February 2014 20:26
To: Carson Holt; Valero Jimenez, Claudio; '[hidden email]'
Subject: Re: [maker-devel] Maker not predicting many genes

 

From your control file, it looks like your primary shortcomings are not setting single_exon=1 and using only UniProt rather than supplying complete proteomes from related species. I’d set correct_est_fusion=1 as well.
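As an illustration, the following Python sketch reads a control file and flags the settings mentioned above. It assumes the `key=value #comment` layout of maker_opts.ctl and that multiple protein files are given as a comma-separated list. Recognizing UniProt/Swiss-Prot by file name is a crude, hypothetical heuristic, and the example file contents are invented.

```python
def parse_ctl(text):
    """Parse key=value lines from a MAKER-style control file, dropping # comments."""
    options = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue  # skip blank lines and section headers
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def fungal_checks(options):
    """Flag the settings suggested above for a fungal genome."""
    warnings = []
    if options.get("single_exon") != "1":
        warnings.append("set single_exon=1")
    if options.get("correct_est_fusion") != "1":
        warnings.append("set correct_est_fusion=1")
    proteins = [p.strip() for p in options.get("protein", "").split(",") if p.strip()]
    # Hypothetical heuristic: treat files named like UniProt/Swiss-Prot as non-proteome evidence.
    if all("uniprot" in p.lower() or "sprot" in p.lower() for p in proteins):
        warnings.append("add complete proteomes from related species to protein=")
    return warnings


example_ctl = """\
protein=uniprot_sprot.fasta #protein sequence file in fasta format
single_exon=0 #consider single exon EST evidence
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
"""
print(fungal_checks(parse_ctl(example_ctl)))
```

An empty protein= value also triggers the proteome warning, because `all()` over an empty list is true.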

 

—Carson

 

 

From: Carson Holt <[hidden email]>
Date: Monday, February 17, 2014 at 12:22 PM
To: "Valero Jimenez, Claudio" <[hidden email]>, "[hidden email]'" <[hidden email]>
Subject: Re: [maker-devel] Maker not predicting many genes

 

You also need to look at the contigs in a browser like Apollo. That will allow you to see both the predictions and the evidence in context. You can then see whether genes are being dropped because they are supported only by single-exon evidence, because they have no evidence support whatsoever, or because they are excluded due to UTR overlap. That last one is a common problem for fungi when using assembled mRNA-seq reads. Fungal genes are so close together that their UTRs often overlap. As a result, mRNA-seq assemblers falsely assemble neighboring genes into single transcripts. The result is very long UTRs on some of your gene models, which force other models to be excluded. If this is the case, rerun an assembler such as Trinity with the Jaccard clip option (--jaccard_clip) set to avoid transcript fusion. Then set correct_est_fusion=1 in the MAKER control files to get those long false UTRs clipped off.
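When browsing in Apollo, one symptom of the transcript fusion described above is a single assembled transcript that spans two or more neighboring gene models. The Python sketch below flags such transcripts from coordinates you have already extracted (for example from GFF3). It is a hypothetical helper, not what correct_est_fusion does internally, and it only considers overlaps on the same strand.

```python
from typing import Dict, List, NamedTuple


class Feature(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def overlaps(a: Feature, b: Feature) -> bool:
    """True if two features share at least one base on the same strand."""
    return a.strand == b.strand and a.start <= b.end and b.start <= a.end


def possible_fusions(transcripts: List[Feature], genes: List[Feature]) -> Dict[str, List[str]]:
    """Map each transcript that overlaps two or more gene models to those gene names."""
    flagged = {}
    for tx in transcripts:
        hits = [g.name for g in genes if overlaps(tx, g)]
        if len(hits) >= 2:
            flagged[tx.name] = hits
    return flagged


genes = [
    Feature("gene1", 1000, 2500, "+"),
    Feature("gene2", 2700, 4000, "+"),
    Feature("gene3", 5000, 6000, "-"),
]
transcripts = [
    Feature("tx_fused", 900, 4100, "+"),  # spans gene1 and gene2
    Feature("tx_ok", 4950, 6050, "-"),    # covers gene3 only
]
print(possible_fusions(transcripts, genes))  # {'tx_fused': ['gene1', 'gene2']}
```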

 

If it is a lack of evidence overlap, make sure you provide at least one proteome from a related species to the protein= option. At least two proteomes are recommended, though (these are not proteins from the same species but rather complete proteomes from related species). Also, comprehensive databases like UniProt/Swiss-Prot are not sufficient on their own, but they can supplement the other proteome data. Are you also providing EST data? Note that EST/mRNA-seq data without a proteome from a related species is also not sufficient, because the quality and completeness of EST/mRNA-seq datasets vary so widely that they may capture as little as 30% of the genes.
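The recommendations in this paragraph can be restated as a simple checklist. This Python function is a hypothetical illustration of that logic only. The thresholds (at least one related-species proteome, two preferred) come from the advice above, and the inputs are counts you would fill in yourself.

```python
def evidence_advice(related_proteomes: int, has_uniprot: bool, has_est: bool) -> list:
    """Turn the evidence recommendations above into a checklist of suggestions."""
    advice = []
    if related_proteomes == 0:
        advice.append("add at least one complete proteome from a related species to protein=")
        if has_uniprot:
            advice.append("UniProt/Swiss-Prot can supplement, but is not sufficient on its own")
        if has_est:
            advice.append("EST/mRNA-seq data is not sufficient without a related-species proteome")
    elif related_proteomes == 1:
        advice.append("a second related-species proteome is recommended")
    if not has_est:
        advice.append("consider providing EST/mRNA-seq evidence as well")
    return advice


# UniProt plus mRNA-seq, but no related-species proteome:
for item in evidence_advice(related_proteomes=0, has_uniprot=True, has_est=True):
    print("-", item)
```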

 

Another thing that comes into play is single-exon evidence. In anything but fungi, single-exon evidence mostly comes from spurious alignments. But fungi have so many single-exon genes that this is not the case for them. Make sure single_exon=1 is set so that this evidence is kept, and set the minimum length of single-exon evidence to keep to something like 250 bp.
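A minimal Python sketch of the filtering idea behind this advice: spliced evidence is kept, and single-exon evidence is kept only when it is enabled and at least a minimum length (250 bp here). The keyword names mirror `single_exon` and, by assumption, the `single_length` option in maker_opts.ctl. This illustrates the concept and is not MAKER's implementation.

```python
from typing import List, NamedTuple, Tuple


class Alignment(NamedTuple):
    name: str
    exons: List[Tuple[int, int]]  # (start, end) pairs, 1-based inclusive


def keep_evidence(alignments: List[Alignment], single_exon: bool = True,
                  single_length: int = 250) -> List[Alignment]:
    """Keep spliced alignments; keep unspliced ones only if allowed and long enough."""
    kept = []
    for aln in alignments:
        if len(aln.exons) > 1:
            kept.append(aln)  # spliced evidence is always kept in this sketch
        elif single_exon and aln.exons:
            start, end = aln.exons[0]
            if end - start + 1 >= single_length:
                kept.append(aln)
    return kept


alignments = [
    Alignment("est_spliced", [(100, 200), (300, 500)]),
    Alignment("est_long_single", [(1000, 1400)]),   # 401 bp
    Alignment("est_short_single", [(2000, 2100)]),  # 101 bp
]
print([a.name for a in keep_evidence(alignments)])
# ['est_spliced', 'est_long_single']
print([a.name for a in keep_evidence(alignments, single_exon=False)])
# ['est_spliced']
```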

 

Thanks,
Carson


From: "Valero Jimenez, Claudio" <[hidden email]>
Date: Monday, February 17, 2014 at 2:23 AM
To: "[hidden email]'" <[hidden email]>
Subject: Maker not predicting many genes

 

Dear list,

I’m trying to annotate a fungal genome, and I’m surprised that MAKER predicts so few genes (3,697). I have trained SNAP and followed all the available tutorials. The ab initio predictors predict between 8,000 and 10,000 genes. Is something wrong in my configuration file? I have attached the opts file and the SOBA summary of the annotation.

Regards,

Claudio

 

 
