Training with ESTs

Training with ESTs

Anastasia Gioti-2
Hi,
I am running MAKER on the genomes of two closely related fungi, for
one of which I have EST evidence that I would like to use to train
SNAP as described in the manual.
I was planning to first run MAKER with est2genome as the only predictor,
then use the SNAP model trained from this run to rerun MAKER several
times with SNAP, Augustus and GeneMark-ES. I am not sure, however,
whether I should keep est2genome (and the EST data) as a predictor in
the runs after the first, i.e., would this still improve the models?
Also, I was wondering whether it is expected that adding est2genome
and EST evidence after a few runs in which MAKER did not use this
evidence, but relied only on the other predictors (SNAP retrained
each time from the previous run, Augustus, GeneMark-ES), would actually
make the gene count drop. This is what I saw, as I obtained the EST data
after I had already run MAKER. Is this logical? I am in any case planning
to start from scratch with the procedure described above, but it would
be good to understand why this happened.
Many thanks,
Anastasia

Anastasia Gioti
Researcher
[hidden email]





_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: Training with ESTs

Marvin B Moore

On Nov 2, 2011, at 10:59 AM, Anastasia Gioti wrote:

> Hi,
> I am running MAKER on the genomes of two closely related fungi, for
> one of which I have EST evidence that I would like to use to train
> SNAP as described in the manual.
> I was planning to first run MAKER with est2genome as the only predictor,
> then use the SNAP model trained from this run to rerun MAKER several
> times with SNAP, Augustus and GeneMark-ES. I am not sure, however,
> whether I should keep est2genome (and the EST data) as a predictor in
> the runs after the first, i.e., would this still improve the models?

No, after the initial est2genome run you don't need (or want) to keep using est2genome as a predictor once you have other predictors trained (or in training). You do, however, want to continue to supply the EST data as evidence.

> Also, I was wondering whether it is expected that adding est2genome
> and EST evidence after a few runs in which MAKER did not use this
> evidence, but relied only on the other predictors (SNAP retrained
> each time from the previous run, Augustus, GeneMark-ES), would actually
> make the gene count drop. This is what I saw, as I obtained the EST data
> after I had already run MAKER. Is this logical? I am in any case planning
> to start from scratch with the procedure described above, but it would
> be good to understand why this happened.

Did you have ESTs as evidence during the interim runs? How much of a drop in gene count did you see? Can you share your two maker_opts.ctl files?

Barry


Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543






Re: Training with ESTs

Carson Holt-2
In reply to this post by Anastasia Gioti-2
I usually drop the est2genome option after training SNAP or Augustus. The
reason is that not all ESTs are from real genes; there will be some
background transcription. But even that is minor. More often, est2genome
genes are only partial because few transcripts get sequenced completely
(end to end). This creates split genes and inflates the gene count.
While the est2genome dataset is good enough for a first-round training
set, unless you have deep EST sequencing I would not use it for
the final set. Once you have SNAP trained, start providing protein
evidence in addition to the ESTs to further improve its performance.
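
The switch between the first round and later rounds can be sketched as an edit to the control file. This is only an illustration, not an official procedure: the toy control file, the file names (ests.fasta, proteins.fasta, round1.hmm) and the helper set_opt are invented here, and the key names est, protein and snaphmm are assumed to match the maker_opts.ctl of your MAKER version.

```shell
#!/bin/sh
# Toy stand-in for a first-round maker_opts.ctl (only the relevant keys).
workdir=$(mktemp -d)
cat > "$workdir/round1_opts.ctl" <<'EOF'
est=ests.fasta #ESTs from the same species
protein= #protein evidence
snaphmm= #SNAP HMM file
est2genome=1 #infer gene models directly from ESTs
EOF

# set_opt KEY VALUE FILE: replace the value of KEY, keeping any trailing comment.
# (Hypothetical helper; values containing "|" or "&" would need escaping.)
set_opt() {
    sed "s|^$1=[^#]*|$1=$2 |" "$3" > "$3.tmp" && mv "$3.tmp" "$3"
}

cp "$workdir/round1_opts.ctl" "$workdir/round2_opts.ctl"
set_opt est2genome 0 "$workdir/round2_opts.ctl"            # stop building models from ESTs
set_opt snaphmm round1.hmm "$workdir/round2_opts.ctl"      # use the SNAP model trained on round 1
set_opt protein proteins.fasta "$workdir/round2_opts.ctl"  # add protein evidence
# est= is left untouched, so the ESTs are still supplied as evidence.

cat "$workdir/round2_opts.ctl"
```

Keeping one control file per round, rather than editing a single file in place, also makes it easier to see later what changed between runs.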

Thanks,
Carson





Re: Training with ESTs

Marvin B Moore
In reply to this post by Marvin B Moore
Hi Anna,

Well, MAKER doesn't necessarily give more weight to ESTs than to proteins, but it does require some evidence for an ab initio prediction before it is promoted to an annotation, so in that sense it gives no weight to ab initio predictions alone. That evidence can come either from a protein alignment or from a spliced EST alignment (for eukaryotes). You do still get the ab initio predictions; they just end up in a different fasta file (with non_overlapping_ab_initio in the name) or as match/match_part features in the GFF3 file.
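
One way to see this split in an actual output file is to tally the GFF3 features by source (column 2) and type (column 3). The sketch below uses an invented three-line file; the source labels in it (maker, snap_masked, est2genome) are assumptions about what a run writes and may differ in your output, and tally_features is a hypothetical helper name.

```shell
#!/bin/sh
workdir=$(mktemp -d)
# Invented toy GFF3: one promoted gene, one ab initio prediction kept only
# as a match feature, and one EST alignment.
printf 'ctg1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1\n' > "$workdir/toy.gff3"
printf 'ctg1\tsnap_masked\tmatch\t2000\t2600\t.\t+\t.\tID=m1\n' >> "$workdir/toy.gff3"
printf 'ctg1\test2genome\texpressed_sequence_match\t100\t880\t.\t+\t.\tID=e1\n' >> "$workdir/toy.gff3"

# tally_features FILE: count features per source/type pair, skipping comment lines.
# Output columns: count, source, type (tab-separated).
tally_features() {
    awk -F '\t' '!/^#/ && NF >= 9 { n[$2 "\t" $3]++ }
                 END { for (k in n) print n[k] "\t" k }' "$1" | sort -k2
}

tally_features "$workdir/toy.gff3"
```

Rows whose type is gene are the promoted annotations; predictions that show up only as match rows were not promoted.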

You can also promote ab initio gene predictions to the final dataset with the keep_preds option in the maker_opts control file. If your ESTs are from another organism, be sure you are using the altest option so that they get aligned with tblastx rather than blastn, for better cross-species alignment of nucleotide sequence.

I'm not sure I can say why your gene counts dropped with the EST data without seeing the rest of the options set in the two maker_opts.ctl files.  If you want to send those along, I'm happy to take a look.
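
When comparing two runs like this, it can help to reduce each control file to its bare key=value settings before diffing them. A minimal sketch, with invented file names and contents standing in for the two maker_opts.ctl files (strip_ctl is a hypothetical helper):

```shell
#!/bin/sh
workdir=$(mktemp -d)
# Invented excerpts standing in for the control files of two runs.
cat > "$workdir/run3_opts.ctl" <<'EOF'
#-----EST Evidence
est= #ESTs from the same species
est2genome=0 #infer gene models directly from ESTs
keep_preds=0 #keep unsupported ab initio predictions
EOF
cat > "$workdir/run4_opts.ctl" <<'EOF'
#-----EST Evidence
est=ests.fasta #ESTs from the same species
est2genome=1 #infer gene models directly from ESTs
keep_preds=0 #keep unsupported ab initio predictions
EOF

# strip_ctl FILE: drop comments, trailing blanks and empty lines; sort by key.
strip_ctl() {
    sed -e 's/#.*$//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1" | LC_ALL=C sort -t '=' -k 1,1
}

strip_ctl "$workdir/run3_opts.ctl" > "$workdir/run3.settings"
strip_ctl "$workdir/run4_opts.ctl" > "$workdir/run4.settings"

# diff exits with status 1 when the files differ, so that is not treated as an error.
diff "$workdir/run3.settings" "$workdir/run4.settings" || true
```

Only the settings that actually changed between the runs remain in the diff, which is usually the part worth posting to the list.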

B

On Nov 3, 2011, at 3:29 AM, Anastasia Gioti wrote:

> Hi Barry,
> Thanks for the help!
>
>> Did you have ESTs as evidence during the interim runs?
>
> No, I did not have any ESTs available, so I ran MAKER 3 times (training
> SNAP each time), then added ESTs and the est2genome predictor in the 4th
> and last run.
>
>> How much of a drop in gene count did you see?
>
> Well, not much, 80 genes actually, but I am working on a small genome
> (max 3000 genes). The thing is, I always get the feeling when
> inspecting individual cases that MAKER gives a lot of weight to EST
> data, and thus misses genes for which there is protein and ab initio
> evidence but no EST. As the ESTs correspond to a related organism and
> do not reflect its full transcriptome, I'd rather they were not
> taken into account so much, or at least were weighted evenly with
> protein hits. I am not sure how MAKER works internally, though; this is
> just an impression I have.
> In any case, it seems more logical to restart the annotations with ESTs
> included in the first run.
> Anastasia



Re: Training Augustus

p sz
Hi,
First, my apologies for my poor English.
I saw this post:
------------------------------------------
Second on that. I also have some scripts that help convert the intermediate training files, starting from the export.zff you generate as part of the SNAP training, into GenBank, as the ZFF -> GenBank step still seems to be a bit tricky.

I end up converting ZFF to GFF2 and then I run this through, but it took a little bit of wrangling to get this working. I *think* this script is generic enough, but I haven't spent a lot of time making sure it will work.
https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/snap_gff2augstus_gbk.pl

The scripts that come from Augustus seem to fail on the gff or gtf to genbank conversion in my experience.

Jason
-------------------------------
I'm training Augustus and it didn't work. Would it be possible to get a detailed description (including the scripts and how to use them) of the approach in the cited post? It would be helpful for me.
Thanks in advance


Re: Training with ESTs

Anastasia Gioti
In reply to this post by Marvin B Moore
Thanks Barry,



> Well, MAKER doesn't necessarily give more weight to ESTs than to proteins, but it does require some evidence for an ab initio prediction before it is promoted to an annotation, so in that sense it gives no weight to ab initio predictions alone. That evidence can come either from a protein alignment or from a spliced EST alignment (for eukaryotes). You do still get the ab initio predictions; they just end up in a different fasta file (with non_overlapping_ab_initio in the name) or as match/match_part features in the GFF3 file.
>
> You can also promote ab initio gene predictions to the final dataset with the keep_preds option in the maker_opts control file. If your ESTs are from another organism, be sure you are using the altest option so that they get aligned with tblastx rather than blastn, for better cross-species alignment of nucleotide sequence.

This is useful to know. I was told that proteins do not matter so much, so that I needn't use nr datasets, for example. But what you say convinces me that in the absence of EST data (for one species I only have ESTs from another closely related species), I should maybe enlarge the protein file or play with the keep_preds parameter.

> I'm not sure I can say why your gene counts dropped with the EST data without seeing the rest of the options set in the two maker_opts.ctl files. If you want to send those along, I'm happy to take a look.

I was hoping to solve this by rerunning MAKER in the right order (a first run with ESTs only, then train SNAP and run MAKER with all evidence but without the est2genome predictor, probably a few times, iteratively retraining SNAP). I thought that providing ESTs only at the last MAKER run (as I had done the time the count dropped) might have something to do with this. If this is not the case, I'll get back to you with the control files.

Another question: is it possible that fasta_merge gives NO output after the first (EST-only) run on the species where my only available ESTs are actually from another organism (and thus flagged altest)? I got a GFF file after gff3_merge, but nothing after fasta_merge, in contrast to the species for which I have EST data, where I got both. I can send over error files and whatever else is needed to solve this.
Thanks, 
Anastasia


Anastasia Gioti
Post-Doc, Evolutionary Biology Department 
Uppsala University
Norbyvägen 18D
SE-752 36  UPPSALA
Tel: +46-18-471 2837
Fax: +46-18-471 6310







Re: Training with ESTs

Carson Holt-2
> Another question: is it possible that fasta_merge gives NO output after the first (EST-only) run on the species where my only available ESTs are actually from another organism (and thus flagged altest)? I got a GFF file after gff3_merge, but nothing after fasta_merge, in contrast to the species for which I have EST data, where I got both. I can send over error files and whatever else is needed to solve this.

fasta_merge won't produce results if there are no fasta entries. I imagine that you don't get any genes predicted because you only had altest files. est2genome won't produce models directly from altest, if I'm not mistaken.
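
A small pre-flight check along these lines could catch the situation before a run. It is a sketch that relies only on the behaviour described above (est2genome builds models from est input, not from altest); the helper names get_opt and check_est2genome and the toy control file are invented for illustration.

```shell
#!/bin/sh
# get_opt KEY FILE: print the value of KEY with comments and blanks removed.
get_opt() {
    sed -n "s/^$1=\([^#]*\).*/\1/p" "$2" | tr -d '[:space:]'
}

# check_est2genome FILE: warn when est2genome is on but no same-species ESTs are given.
check_est2genome() {
    if [ "$(get_opt est2genome "$1")" = "1" ] && [ -z "$(get_opt est "$1")" ]; then
        echo "WARNING: est2genome=1 but est= is empty; expect no gene models"
        return 1
    fi
    echo "OK"
}

workdir=$(mktemp -d)
# Toy control file for the species that only has ESTs from a relative.
cat > "$workdir/maker_opts.ctl" <<'EOF'
est= #ESTs from the same species
altest=related_species_ests.fasta #ESTs from a related species
est2genome=1 #infer gene models directly from ESTs
EOF

check_est2genome "$workdir/maker_opts.ctl" || true
```

For the toy file the check prints the warning, because the only EST input is listed under altest.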

Thanks,
Carson


Re: Training with ESTs

Marvin B Moore
In reply to this post by Anastasia Gioti
Hi Anastasia,

Does the GFF3 file have any genes in it?  Try:

grep -P '\tgene\t' maker.gff3 | less

and see if you get anything.
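
If grep -P is not available (it is a GNU extension), the same check can be written with awk, which also makes it easy to compare gene counts between two runs. A sketch with invented toy files and a hypothetical helper count_genes:

```shell
#!/bin/sh
# count_genes FILE: number of GFF3 lines whose type column (3) is exactly "gene".
count_genes() {
    awk -F '\t' '$3 == "gene" { n++ } END { print n + 0 }' "$1"
}

workdir=$(mktemp -d)
# Invented toy outputs: the earlier run has two genes, the later run has one.
printf 'ctg1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1\nctg1\tmaker\tgene\t2000\t2600\t.\t-\t.\tID=g2\n' > "$workdir/run3.gff3"
printf 'ctg1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1\nctg1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=g1-mRNA-1;Parent=g1\n' > "$workdir/run4.gff3"

before=$(count_genes "$workdir/run3.gff3")
after=$(count_genes "$workdir/run4.gff3")
echo "run3: $before genes, run4: $after genes, change: $((after - before))"
```

A count of 0 means the file contains no gene features at all, in which case there is nothing for fasta_merge to write out.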

B


Re: Training with ESTs

Anastasia Gioti
No, it does not. I guess it is what I suspected and Carson
pointed out: using altest alone does not create gene models. Thanks,
Anastasia

Anastasia Gioti
Post-Doc, Evolutionary Biology Department
Uppsala University
Norbyvägen 18D,
SE-75236 UPPSALA
[hidden email]
Tel: +46-18-471-2837
Fax:+46-18-471-6310

http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/








Re: Training with ESTs

Anastasia Gioti-2
In reply to this post by Marvin B Moore

Hi again,
I have one more question regarding annotation with training on EST
data.
I am interested in understanding why the annotation of one species
(ESTs available) has a higher gene count than that of the closely
related species (no ESTs available; I am using the ESTs from the first
species as altest). As only the second species was annotated with
MAKER, I decided to run the same pipeline on both.

I now see that a first run with ESTs as the only evidence and
est2genome as the sole predictor will only produce results for the
first species, as no model can be built from altest. So I guess I need
to rerun with SNAP + est2genome for both species. However, I am
wondering: will the fact that I only have altest for the second species
systematically bias the annotations in favor of the first? Even after
several rounds of SNAP training, will I always get fewer gene models
for this species? I had never thought about it until now, but I noticed
that previous annotations of 4 genomes from another genus also gave
higher gene counts for the only species for which I have EST data.

Thank you for your insights so far on MAKER. It is a great tool, and it
is very valuable that you reply to users' questions so that we get a
deeper understanding of how it works.
Cheers,
Anastasia


> Hi Anastasia,
>
> Does the GFF3 file have any genes in it?  Try:
>
> grep -P '\tgene\t' maker.gff3 | less
>
> and see if you get anything.
>
> B
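A portable variant of the check quoted above, in case `grep -P` is not available on a given system. This is only a sketch: `count_type` and the two-gene `demo.gff3` file are made up for illustration and are not part of MAKER.

```shell
#!/bin/sh
# Count features of a given type (GFF3 column 3), skipping comment lines.
count_type() {
    awk -F'\t' -v t="$1" '!/^#/ && $3 == t { n++ } END { print n + 0 }' "$2"
}

# Demo file with two genes; coordinates and IDs are invented.
printf '##gff-version 3\n' > demo.gff3
printf 'contig1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1\n' >> demo.gff3
printf 'contig1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=g1-mRNA-1;Parent=g1\n' >> demo.gff3
printf 'contig1\tmaker\tgene\t1500\t2400\t.\t-\t.\tID=g2\n' >> demo.gff3

count_type gene demo.gff3   # prints 2
```

A count of 0 on the real merged GFF3 would mean that no gene models were promoted in that run, which would also explain an empty fasta_merge result.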
>
> On Nov 8, 2011, at 11:27 AM, Anastasia Gioti wrote:
>
>> Thanks Barry,
>>
>>
>>>
>>> Well, MAKER doesn't necessarily give more weight to ESTs than to
>>> proteins, but it does require that you have some evidence for an
>>> ab initio prediction before it is promoted to an annotation, so in
>>> that sense it is giving no weight to the ab initio predictions
>>> alone.  That evidence can be either from a protein alignment or
>>> from a spliced EST alignment (for eukaryotes).  You do still get
>>> the ab initio predictions, they just end up in a different fasta
>>> file (with non_overlapping_ab_initio in the name) or as
>>> match/match_part features in the GFF3 file.
>>>
>>> You can also promote ab initio gene predictions to the final  
>>> dataset with the keep_preds option in the maker_opts control  
>>> file.  If your ESTs are from another organism, be sure you are  
>>> using the altest option so that they will get aligned with tblastx  
>>> rather than blastn for better cross species alignment of  
>>> nucleotide sequence.
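The two settings mentioned above could be switched in the control file along these lines. This is a rough sketch, not the official procedure: `set_opt` is a hypothetical helper, `demo_opts.ctl` is a cut-down stand-in for maker_opts.ctl with abbreviated comments, the FASTA file name is invented, and the value `1` for keep_preds is an assumption that should be checked against the comments in your own control file.

```shell
#!/bin/sh
# Set one key=value option in a MAKER-style control file, keeping any
# trailing "#comment" on that line intact. A .bak copy is left behind.
set_opt() {
    key=$1; value=$2; file=$3
    sed -E -i.bak "s|^(${key}=)[^#]*|\1${value} |" "$file"
}

# Stand-in for maker_opts.ctl (only four options, comments shortened).
cat > demo_opts.ctl <<'EOF'
est= #ESTs from the same species
altest= #ESTs from a related species
est2genome=0 #infer gene models directly from ESTs
keep_preds=0 #keep unsupported ab initio predictions
EOF

# Cross-species ESTs go under altest rather than est.
set_opt altest related_species_ests.fasta demo_opts.ctl
# Assumed value: keep ab initio predictions that lack evidence.
set_opt keep_preds 1 demo_opts.ctl

grep -E '^(altest|keep_preds)=' demo_opts.ctl
```

Editing the file by hand works just as well; a helper like this is mainly useful when the same change has to be applied to the control files of several genomes.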
>>>
>> This is useful to know. I was told that proteins do not matter so
>> much, so I needn't use nr datasets for example, but what you say
>> actually convinces me that in the absence of EST data (for one
>> species I only have ESTs from another closely related species), I
>> should maybe enlarge the protein file or play with the keep_preds
>> parameter.
>>> I'm not sure I can say why your gene counts dropped with the EST  
>>> data without seeing the rest of the options set in the two  
>>> maker_opts.ctl files.  If you want to send those along, I'm happy  
>>> to take a look.
>> I was hoping to solve this by rerunning MAKER in the right order
>> (1st run with ESTs only, then train SNAP and run MAKER with all
>> evidence but without the est2genome predictor any more, probably a
>> few times, iteratively training SNAP). I thought that providing ESTs
>> at the last MAKER run (as I had done the time the count dropped)
>> might have something to do with this. If this is not the case, I'll
>> get back to you with the control files.
>>
>> Another question: is it possible that fasta_merge gives NO output
>> after the 1st (EST-only) run on the species where my only available
>> ESTs are actually from another organism (and thus flagged altest)?
>> I got a GFF file after gff3_merge, but nothing after fasta_merge,
>> in contrast to the species for which I have EST data, where I got
>> both. I can send over error files and whatever else is needed to
>> solve this.
>> Thanks,
>> Anastasia
>>>
>>> B
>>>
>>> On Nov 3, 2011, at 3:29 AM, Anastasia Gioti wrote:
>>>
>>>> Hi Barry,
>>>> Thanks for help!
>>>>>
>>>>> Did you have ESTs as evidence during the interim runs?
>>>>
>>>> No, I did not have any ESTs available, so I ran MAKER 3 times
>>>> (training SNAP each time), then added ESTs and the est2genome
>>>> predictor in the 4th and last run.
>>>>> How much of a drop in gene count did you see?
>>>> Well, not much, 80 genes actually, but I am working on a small
>>>> genome (max 3000 genes). The thing is, I always get the feeling when
>>>> inspecting individual cases that MAKER gives a lot of weight to EST
>>>> data and thus misses genes for which there is protein and ab initio
>>>> evidence but no EST. As the ESTs come from a related organism and
>>>> do not reflect its full transcriptome, I'd rather they were not
>>>> weighted so heavily, or at least weighted evenly with protein hits.
>>>> I am not sure how MAKER works internally though; this is just an
>>>> impression I have.
>>>> In any case, it seems more logical to restart the annotations with
>>>> ESTs included in the first run.
>>>> Anastasia
>>>>>
>>>>>
>>>>> Barry
>>>>>

Anastasia Gioti
Researcher
[hidden email]







Re: Training with ESTs

Carson Hinton Holt
The species that the ESTs derive from will have more homology to the
dataset by definition.  For other species, even if they have the same
genes, there will be divergence that will cause fewer ESTs to align in
general.
Also aligning in multiple reading frames reduces the significance of an
alignment because you increase the probability of a random alignment (so
alignments are required to be longer before they will be kept).  The order
of probability for evidence types is then first EST (no reading frame
adjustment), second Protein (attempt to align in 3 reading frames), and
last is alt_EST (attempt to align in 6 reading frames - 3 for query and 3
for subject).  So alt_est runs slower (more work to do) and is less
significant (more likely to be filtered out).  Both problems are overcome
when the ESTs are from the same species.
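
One way to see this effect in your own output is to tally the alignment features per source in the merged GFF3 of each species. The sketch below assumes that column 2 of the GFF3 carries the aligner name; `tally_sources`, `add_hit`, the demo file and the source and type labels in it are made up for illustration and may not match the labels in a real run.

```shell
#!/bin/sh
# Tally features per source (GFF3 column 2), skipping comment lines,
# so EST, protein and alt-EST support can be compared between runs.
tally_sources() {
    awk -F'\t' '!/^#/ && NF >= 9 { c[$2]++ }
                END { for (s in c) print s "\t" c[s] }' "$1" | sort
}

# Append one invented alignment: source, type, start, end.
add_hit() {
    printf 'contig1\t%s\t%s\t%s\t%s\t.\t+\t.\tID=hit_%s\n' \
        "$1" "$2" "$3" "$4" "$3" >> hits.gff3
}

printf '##gff-version 3\n' > hits.gff3
add_hit blastn  expressed_sequence_match     100  400
add_hit blastn  expressed_sequence_match     900 1300
add_hit blastn  expressed_sequence_match    2000 2500
add_hit blastx  protein_match                120  380
add_hit blastx  protein_match               2100 2450
add_hit tblastx translated_nucleotide_match  950 1250

tally_sources hits.gff3
```

Comparing the per-source counts for the two species should show how much less alt-EST evidence survives filtering in the species without its own ESTs.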

Thanks,
Carson



