Training SNAP in absence of ESTs?

Khan, Anar
Hi
 
I don’t have EST data for my fungal species of interest, so I’m currently using EST contigs from a closely related species (86% identity in aligned transcript regions) as alternative EST evidence (altest) and SwissProt as protein evidence. I’m also using parameter files from a different, less closely related species for SNAP, Augustus and FGENESH. I’d like to use the bootstrapping procedure described in the SNAP paper (Korf, BMC Bioinformatics, 2004). I thought the best approach would be to run MAKER with all of the inputs listed above, i.e. generate the best predictions possible (or at least use all the information available), and then use the results to retrain SNAP. After running maker2zff on the output, fathom -gene-stats gives me:
 
<a few models with errors detected>
318 sequences
0.521223 avg GC fraction (min=0.463702 max=0.581287)
393 genes (plus=194 minus=199)
0 (0.000000) single-exon
393 (1.000000) multi-exon
341.366943 mean exon (min=3 max=4233)
91.340637 mean intron (min=4 max=1152)
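The fathom -gene-stats summary above can be cross-checked with a minimal Python sketch that recomputes the same kinds of counts from the ZFF annotation written by maker2zff (genome.ann). It is not fathom's own code: `zff_gene_stats` is a hypothetical helper, and it assumes the short ZFF layout (a `>sequence` header, then one `label begin end group` line per exon, with labels such as Einit, Exon, Eterm and Esngl, and begin > end on the minus strand). GC fraction is left out because it needs the matching genome.dna.

```python
from collections import defaultdict


def zff_gene_stats(lines):
    """Approximate a few `fathom -gene-stats` fields from ZFF annotation lines.

    Assumption: short ZFF, i.e. a '>name' header per sequence followed by
    'label begin end group' lines (label = Einit/Exon/Eterm/Esngl), where
    begin > end marks a minus-strand feature.
    """
    exons = defaultdict(list)  # (sequence, group) -> [(start, end), ...]
    strand = {}                # (sequence, group) -> "+" or "-"
    sequences = 0
    seq = None
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue
        if fields[0].startswith(">"):
            seq = fields[0][1:]
            sequences += 1
            continue
        if len(fields) < 4:
            continue  # not an exon line in the assumed layout
        begin, end, group = int(fields[1]), int(fields[2]), fields[-1]
        key = (seq, group)
        exons[key].append((min(begin, end), max(begin, end)))
        if len(fields) > 4 and fields[3] in ("+", "-"):
            strand[key] = fields[3]  # a longer ZFF line may state the strand explicitly
        else:
            strand[key] = "-" if begin > end else "+"

    exon_len, intron_len = [], []
    for spans in exons.values():
        spans.sort()
        exon_len += [e - s + 1 for s, e in spans]
        # Introns are the gaps between consecutive exons of one gene (1-based, inclusive).
        intron_len += [b[0] - a[1] - 1 for a, b in zip(spans, spans[1:])]

    def avg(values):
        return sum(values) / len(values) if values else 0.0

    single = sum(1 for spans in exons.values() if len(spans) == 1)
    return {
        "sequences": sequences,
        "genes": len(exons),
        "plus": sum(1 for s in strand.values() if s == "+"),
        "minus": sum(1 for s in strand.values() if s == "-"),
        "single_exon": single,
        "multi_exon": len(exons) - single,
        "mean_exon": avg(exon_len),
        "min_exon": min(exon_len, default=0),
        "max_exon": max(exon_len, default=0),
        "mean_intron": avg(intron_len),
    }
```

Assuming the thread's file names, `zff_gene_stats(open("genome.ann"))` returns a dict whose `single_exon` entry can be compared directly with fathom's single-exon line.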
 
Is this sufficient training data (n=393)? Would you recommend a different bootstrapping approach?
 
Any advice would be appreciated!
 
Cheers
Anar
 
 

 


Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.


 


_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
Re: Training SNAP in absence of ESTs?

Carson Hinton Holt
393 is OK for a fungal species. More is always better, but this is comparable to the number you get when training SNAP with CEGMA. I have data for an upcoming MAKER2 paper showing that SNAP, Augustus, and GeneMark perform as well inside MAKER2 with a completely incorrect species parameter file as they do alone with a highly optimized parameter file for C. elegans, D. melanogaster, and A. thaliana (they perform horribly when run alone with the incorrect file). This means that even with the wrong file they will perform very well inside MAKER2, thanks to “hints” from the EST and protein evidence alignments. Of course, they perform even better inside MAKER2 with correct parameter files.

Thanks,
Carson

(In reply to "Khan, Anar" <Anar.Khan@...>, 6/8/11 6:14 PM)

Carson Holt
Graduate Student
Yandell Lab
http://www.yandell-lab.org/
Eccles Institute of Human Genetics
University of Utah

Re: Training SNAP in absence of ESTs?

Khan, Anar
Hi Carson

 

Thanks very much for your advice on both this post and my last one about single_exon. I will rerun MAKER (1) with single_exon enabled and (2) with a retrained SNAP, and hope for a positive effect on the predictions (:

 

Cheers

Anar

 

(In reply to Carson Holt, Wednesday, 22 June 2011 3:44 a.m.)

 

Re: Training SNAP in absence of ESTs?

Khan, Anar
Hi

 

I reran MAKER in my original analysis directory, this time changing two parameters in the options control file:

 

single_exon=1

single_length=250

 

and, to my surprise, I obtained the same results as in the previous run (single_exon=0). I checked this by counting the total number of MAKER predictions in the GFF files and by checking the output of fathom -gene-stats:

 

fathom genome.ann genome.dna -gene-stats
318 sequences
0.521223 avg GC fraction (min=0.463702 max=0.581287)
393 genes (plus=194 minus=199)
0 (0.000000) single-exon
393 (1.000000) multi-exon
341.366943 mean exon (min=3 max=4233)
91.340637 mean intron (min=4 max=1152)

 

As you can see, fathom tells me no single exon predictions were identified.

 

I’ve checked my maker_opts file to confirm I truly changed the single_* parameters!
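Anar's check that the single_* settings really changed can be scripted. The sketch below uses a hypothetical helper, `read_ctl`, and assumes the `key=value #comment` layout of maker_opts.ctl. It returns the parsed settings plus any key that appears more than once, since a repeated key would make it unclear which value takes effect (how MAKER itself resolves a repeat is not established in this thread).

```python
def read_ctl(lines):
    """Parse 'key=value' lines from a MAKER-style control file.

    Assumption: text after '#' is a comment, as in maker_opts.ctl.
    Returns (settings, keys_seen_more_than_once); later lines win in settings.
    """
    settings, duplicates = {}, []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "=" not in line:
            continue  # blank line or comment-only line
        key, value = (part.strip() for part in line.split("=", 1))
        if key in settings and key not in duplicates:
            duplicates.append(key)
        settings[key] = value
    return settings, duplicates
```

For example, `settings, repeated = read_ctl(open("maker_opts.ctl"))`, followed by a look at `settings.get("single_exon")`, `settings.get("single_length")` and `repeated`.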

 

Are there other parameters which interact with single_exon? Might I have done something silly?

 

I’ve attached my options control file.

 

Cheers!

Anar

 

 

(In reply to Khan, Anar, Wednesday, 22 June 2011 10:03 a.m.)

 

Attachment: maker_opts.ctl (6K)
Re: Training SNAP in absence of ESTs?

Carson Hinton Holt
I would have to physically look at the evidence alignments in Apollo. It may just be that the single-exon altESTs are not aligning well to an open reading frame. I can usually review the alignments manually and deconstruct the logic of how MAKER arrived at a given conclusion, so send me some example GFF3 files :-)

Thanks,
Carson


(In reply to "Khan, Anar" <Anar.Khan@...>, 6/29/11 3:45 PM)

Re: Training SNAP in absence of ESTs?

Khan, Anar
Hi Carson

 

While getting the GFF3 ready to send to you, I noticed that the GFF3 file does contain single-exon predictions! In that case, I think I was wrong to use fathom -gene-stats to determine the composition of my predictions. Fathom may be doing some filtering; I will try to find out more. Should I be worried if no single-exon genes are used when retraining SNAP? By the way, I still find it surprising that I get identical output whether single_exon is on or off.
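Checking the GFF3 directly, as done here, can be sketched in Python by counting exons per MAKER transcript and tallying single- versus multi-exon models. The helper names are hypothetical, and the sketch assumes a gene, mRNA and exon hierarchy linked by `ID`/`Parent` attributes, with MAKER's own models labelled `maker` in the source column (column 2); evidence alignments carrying other source values are skipped.

```python
from collections import Counter


def parse_attributes(column9):
    """Turn a GFF3 attribute column ('ID=x;Parent=y') into a dict."""
    attributes = {}
    for item in column9.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attributes[key.strip()] = value.strip()
    return attributes


def count_exon_structure(lines, source="maker"):
    """Tally single- vs multi-exon transcripts from one GFF3 source (assumed 'maker')."""
    exons_per_mrna = Counter()
    for raw in lines:
        if raw.startswith("##FASTA"):
            break  # embedded sequence section: no more features follow
        if raw.startswith("#") or not raw.strip():
            continue
        cols = raw.rstrip("\r\n").split("\t")
        if len(cols) != 9 or cols[1] != source:
            continue  # malformed line, or evidence from another source
        attributes = parse_attributes(cols[8])
        if cols[2] == "mRNA" and "ID" in attributes:
            exons_per_mrna[attributes["ID"]] += 0  # register the transcript
        elif cols[2] == "exon":
            for parent in attributes.get("Parent", "").split(","):
                if parent:
                    exons_per_mrna[parent] += 1
    single = sum(1 for n in exons_per_mrna.values() if n == 1)
    multi = sum(1 for n in exons_per_mrna.values() if n > 1)
    return {"mRNAs": len(exons_per_mrna), "single_exon": single, "multi_exon": multi}
```

Run on a MAKER GFF3 (for instance `count_exon_structure(open("contig.gff"))`, with a hypothetical file name), the `single_exon` count can then be compared with what fathom reports for the ZFF converted from the same output.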

 

Cheers

Anar

 

(In reply to Carson Holt, Tuesday, 5 July 2011 2:45 p.m.)

 
