Use pass-through system to add missing genes


Anastasia Gioti-2
Hi, 
I have a set of predicted proteins from a fungal genome annotated by MAKER, using EST data from a closely related species, three ab initio predictors (SNAP, iteratively trained three times; GeneMark, trained directly on the assembly; and Augustus, with a model from a less closely related species), and a set of fungal proteins. I am missing ~1000 proteins when I compare to the species I used EST data from, and there is good evidence from alignments that these genes exist. The question is how to proceed from BLAST hits to actual gene models here. The idea would be to add these genes to the existing dataset rather than reannotate the genome. I believe that reannotating it without any further evidence, such as RNA-seq from the species itself, would not change much, and I'd rather stick with the current predictions, which I trust and have used in subsequent analyses. I can accept annotating the 1000 genes in a less stringent and less reliable way than MAKER; I just want to add them so that the difference in gene count gets corrected.
I was reading the MAKER2 paper and was wondering whether I can use the legacy annotation scheme to do this, by providing a GFF3 file of the alignments between the two species in the regions where genes were missed. But, as I said, I would not like to reannotate the whole genome, and running MAKER2 might cause slight changes that I'd like to avoid. Is this possible? First, is it possible to provide a GFF3 file of specific locations rather than the entire genome alignment? (I guess so.) Second, how can I tag the existing annotations as 'not to be changed' or, alternatively, tag only the new models? How should I run MAKER2, with which predictors on and which off?
Thanks, 
Anastasia

Anastasia Gioti
Post-doctoral Researcher

[hidden email]
[hidden email]

http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/




_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: Use pass-through system to add missing genes

Daniel Hughes
For cross-species comparisons you might have been better off including the actual peptide sequences of the other fungi in the annotation run too - I'd be very surprised if you really did get the same result.

dan.


Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge)
-------------------------------------------------------------------------------------
[hidden email]
[hidden email]



Re: Use pass-through system to add missing genes

Anastasia Gioti-2
Hi, 
Do you mean that I should not have included the proteins of the closely related species in the fungal protein FASTA file that I used as evidence in MAKER? I do not see why... What I have been trying to do now is to further 'bias' the annotations in favor of this species, so as to get the missing genes. Can you explain a bit more what you mean?
Thanks, 
Anastasia
On Apr 25, 2012, at 11:22 AM, Daniel Hughes wrote:

> For cross-species comparisons you might have been better off including the actual peptide sequences of the other fungi in the annotation run too - I'd be very surprised if you really did get the same result.
>
> dan.



Re: Use pass-through system to add missing genes

Daniel Hughes
Sorry, my bad, I missed the part about you having already included the fungal proteins as FASTA ;/ - too early for me.

In that case, have you viewed the full GFF output for specific instances of such missing proteins in something like Apollo, to try to work out why MAKER hasn't made a call at those loci (AED score...)?

dan.


Re: Use pass-through system to add missing genes

Carson Holt-2
In reply to this post by Anastasia Gioti-2
How you proceed depends on why the genes are missing to begin with. Are they missing because of a lack of evidence? If that's the case, just adding the new FASTA file should do the trick. Or are they missing because an assembly error makes it impossible to get a logical model for the region (i.e., reading frame breaks)? Are there ab initio models already called in those regions that could just be promoted to the annotation tier? You can test that last case by BLASTing against the nonoverlaping_abinits.fasta files.

For any of the cases described, you can provide the existing annotation set as input in GFF3 format, and the previous models will be maintained preferentially. If you know which ab initio predictions you want to add (i.e., the ab initio promotion scenario I described), you can provide those predictions through the pred_gff option and then set keep_preds=1, and they will be maintained even without evidence. Attached is a script that makes selecting those easier. It takes the MAKER-generated GFF3 and a list of predictions to keep (one name per line); these might be the results of a BLAST analysis, for example. It then returns the GFF3 entries for just the selected models.
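
For readers without the attachment, the selection step described above can be approximated with a short script. This is only a minimal sketch and not the attached gff3_select script: the function names are hypothetical, and it assumes that each parent feature appears before its children in the GFF3 and that the names in the list match the ID or Name attribute of the prediction features.

```python
def parse_attributes(column9):
    """Turn a GFF3 attribute string such as 'ID=x;Name=y' into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def select_gff3(gff_lines, wanted_names):
    """Return feature lines whose ID or Name is wanted, plus their children.

    Assumption: parents are listed before their children, so one pass is enough.
    """
    wanted = set(wanted_names)
    kept_ids = set()
    selected = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # stop at an embedded FASTA section
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        parents = attrs.get("Parent", "").split(",")
        is_wanted = attrs.get("ID") in wanted or attrs.get("Name") in wanted
        has_kept_parent = any(p in kept_ids for p in parents)
        if is_wanted or has_kept_parent:
            if "ID" in attrs:
                kept_ids.add(attrs["ID"])
            selected.append(line.rstrip("\n"))
    return selected
```

Writing the returned lines to a file should give something that can be passed to pred_gff, though it is worth checking a few entries by eye first.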

If the situation is more complex, just provide more detail, and I am sure we can help you come up with a plan.

Thanks,
Carson

From: Anastasia Gioti <[hidden email]>
Date: Wed, 25 Apr 2012 11:09:36 +0200
To: <[hidden email]>
Subject: [maker-devel] Use pass-through system to add missing genes


gff3_select (4K) Download Attachment

Re: Use pass-through system to add missing genes

Anastasia Gioti-2
Hi Carson,
Thanks for your help!

> The way you proceed depends on why the genes are not there to begin with. Are they not there because of a lack of evidence?

It is a mixture of cases, and I can only say that from looking at some examples. There are cases where all three ab initio predictors provide models and there are blastx hits, or both blastx and protein2genome hits, but no EST evidence, and so no model is retained. I guess my default parameters could be responsible for these cases at least.

> If that's the case just adding the new fasta file should do the trick.

Which FASTA do you refer to? The protein file I use as evidence contains all the proteins I can actually use.

> Or are they not there because an assembly error makes it impossible to get a logical model for the region (i.e. reading frame breaks).

This is not the case in general.

> Are there ab initio models already called in those regions that could just be promoted to the annotation tier? You can test that one by blasting against the nonoverlaping_abinits.fasta files.

I have not done this, will do!

> For any of the cases described, you can provide the existing annotation set as the input in GFF3 format, and previous models will be maintained preferentially.

You mean in a new MAKER run? This is possible with the old MAKER as well, not only MAKER2, right?

> If you know which ab initio predictions you want to add (i.e. the ab initio promoting scenario I described), you can provide those predictions through the pred_gff option and then set keep_preds=1 and they will be maintained even without evidence. Attached is a script that would make selecting those easier. It takes the MAKER generated GFF3 and a list of predictions to keep (one name per line). These might be the results of a BLAST analysis for example. It will then return the GFF3 entries for just those models selected.

The thing is, for the few cases I have looked at, I cannot really decide which model is the best; the three models from the ab initio predictors do not agree on the exact intron-exon junctions or on the start and stop codons.

> If the situation is more complex, just provide more detail, and I am sure we can help you come up with a plan.

What I was thinking of doing was to provide a GFF file of alignments (e.g. from exonerate) to the proteins of the closely related species that I am missing, and somehow keep the previous annotations and get the extra ones from this GFF file. But I am not sure exactly how MAKER should be run to do this. If I want to keep the previous annotations, I need the GFF file of the last MAKER run as input, but then how do I distinguish it from the exonerate GFF file? And which prediction mode should be on, and with which parameters? You mention keep_preds=1 for the existing annotations, but how do I also promote evidence from alignments in the same way in the same run?
It looks feasible though. Thanks again,
Anastasia


Re: Use pass-through system to add missing genes

Barry Moore
Hi Anastasia,

On Apr 27, 2012, at 2:43 AM, Anastasia Gioti wrote:

> Hi Carson,
> Thanks for your help!
>
>> The way you proceed depends on why the genes are not there to begin with. Are they not there because of a lack of evidence?
>
> It is a mixture of cases, and I can only say that from looking at some examples. There are cases where all three ab initio predictors provide models and there are blastx hits, or both blastx and protein2genome hits, but no EST evidence, and so no model is retained. I guess my default parameters could be responsible for these cases at least.

This doesn't sound right. If there are predicted models and blastx protein evidence overlapping them, you should get a model retained. I know that EST evidence has to support a splice site before a model is promoted, and I can't remember whether protein evidence works the same way, but certainly if you pass back those protein2genome predictions and the original proteins as evidence, then they will be retained as models.

>> If that's the case just adding the new fasta file should do the trick.
>
> Which FASTA do you refer to? The protein file I use as evidence contains all the proteins I can actually use.

Yes, using the protein FASTA from the closely related species as evidence. I think you said you've already done that, right?



>> For any of the cases described, you can provide the existing annotation set as the input in GFF3 format, and previous models will be maintained preferentially.
>
> You mean in a new MAKER run? This is possible with the old MAKER as well, not only MAKER2, right?

Yes, the original MAKER will do this.



Let me just restate what you've said so that I can be sure I understand what you've already done. You have run MAKER with SNAP, GeneMark and Augustus, using ESTs from a closely related species (passed to altest) and protein evidence from other fungi. You are missing about 1,000 genes compared to the species that provided the EST alignments. You say there is good evidence from the alignments that these genes exist, and I assume by this you mean the EST/protein alignments that MAKER produced.

1) Is the closely related fungus annotated, and if so, have you included its proteins in the evidence set that you provided to MAKER? If you haven't provided these proteins as evidence, then you should do so. You can re-run MAKER, passing your original models back through, like this:

#-----Re-annotation Using MAKER Derived GFF3
genome_gff=original_maker_annotations.gff3
est_pass=1
altest_pass=1
protein_pass=1
rm_pass=1
model_pass=1
pred_pass=1
other_pass=1

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=proteins_from_closely_related.fasta
## OR it sounds like you've already aligned these with exonerate?
protein_gff=proteins_from_closely_related_already_aligned.gff

2) If you've already included the proteins of the closely related species but still didn't get the 1,000 genes, then take your nonoverlaping_abinits.fasta and BLAST the sequences directly against the closely related proteins. Presumably they don't hit too well, because if they did they should have been promoted to gene models by MAKER the first time, but here you can decide for yourself what thresholds to apply for keeping the ab initio predictions that hit the closely related species' proteins. If you filter your BLAST hits the way you want and keep the names of the ab initio predictions that pass your filter, then the script Carson attached will generate an ab initio prediction GFF file with only the predictions you selected. You can then pass those predictions back to MAKER and force it to keep them, and MAKER will turn them from predictions (match/match_part) into gene models.

#-----Re-annotation Using MAKER Derived GFF3
genome_gff=original_maker_annotations.gff3
est_pass=1
altest_pass=1
protein_pass=1
rm_pass=1
model_pass=1
pred_pass=0
other_pass=1

#-----Gene Prediction
snaphmm=
gmhmm=
augustus_species=
fgenesh_par_file=
pred_gff=ab_init_predictions_rescued_by_blast.gff

keep_preds=1
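
The BLAST filtering step in 2) could look something like the sketch below. It assumes tabular BLAST output with the 12 standard columns (query, subject, percent identity, ..., e-value, bit score). The function name and both thresholds are purely illustrative, not recommended values; the point of the step is that you choose the cut-offs yourself.

```python
def passing_predictions(blast_lines, max_evalue=1e-10, min_identity=40.0):
    """Return query names with at least one hit passing both thresholds.

    Expects 12-column tabular BLAST output; the defaults are placeholders.
    """
    keep = []
    seen = set()
    for line in blast_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue
        query = cols[0]
        identity = float(cols[2])   # column 3: percent identity
        evalue = float(cols[10])    # column 11: e-value
        if evalue <= max_evalue and identity >= min_identity:
            if query not in seen:
                seen.add(query)
                keep.append(query)
    return keep
```

Writing the returned names one per line gives the kind of list that the selection script expects.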

Barry


Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543





Re: Use pass-through system to add missing genes

Carson Holt-2
> It is a mixture of cases, and I can only say that from looking at some examples. There are cases where all three ab initio predictors provide models and there are blastx hits, or both blastx and protein2genome hits, but no EST evidence, and so no model is retained. I guess my default parameters could be responsible for these cases at least.

The only ways you should be able to get BLASTX overlap and still not get a model for the region are: 1. The protein alignment is in a different reading frame than your models for every single base pair of the alignment (in which case it's not true overlap). 2. The BLASTX HSPs are stacked on each other again and again in weird rearranged overlaps, producing a very deep alignment, which would mean this is a repetitive region and not really a significant alignment. Otherwise this should not happen unless you have AED_threshold set to some value where MAKER will ignore genes unless they have a minimum amount of support (by default this option is always off). The first two possibilities can be tested by just looking at the alignments manually in Apollo. Also take a look at the AED and eAED values for your missing genes. Anything below 1 should always be kept by MAKER by default because it has at least some evidence support.
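
To look at AED and eAED values in bulk rather than one locus at a time, something like the following sketch may help. It assumes that the MAKER GFF3 stores these values as _AED and _eAED attributes on mRNA lines; if your output uses different attribute names, the patterns need adjusting. The function name is hypothetical.

```python
import re

# Assumed attribute names in column 9 of MAKER mRNA lines.
AED_PATTERN = re.compile(r"(?:^|;)_AED=([0-9.]+)")
EAED_PATTERN = re.compile(r"(?:^|;)_eAED=([0-9.]+)")
ID_PATTERN = re.compile(r"(?:^|;)ID=([^;]+)")


def aed_table(gff_lines):
    """Return (mRNA ID, AED, eAED) tuples for mRNA features that carry an AED."""
    rows = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        aed = AED_PATTERN.search(cols[8])
        if aed is None:
            continue
        eaed = EAED_PATTERN.search(cols[8])
        feature_id = ID_PATTERN.search(cols[8])
        rows.append((
            feature_id.group(1) if feature_id else "",
            float(aed.group(1)),
            float(eaed.group(1)) if eaed else None,
        ))
    return rows
```

Filtering the result for AED below 1 lists the models that have at least some evidence support.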

> Which FASTA do you refer to? The protein file I use as evidence contains all the proteins I can actually use.

If they are already in your current run, ignore this.

Barry provided detailed instructions on how to configure MAKER for your particular case, so just follow his excellent instructions.
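
After such a re-run, one way to check that the earlier models came through unchanged, and to list only the added ones, is to compare gene coordinates between the two GFF3 files. The sketch below is a rough check under a simplifying assumption: two genes count as the same if sequence ID, start, end and strand are identical, so it does not compare exon structure. The function names are hypothetical.

```python
def gene_keys(gff_lines):
    """Map (seqid, start, end, strand) to the attribute column for each gene."""
    keys = {}
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] == "gene":
            keys[(cols[0], int(cols[3]), int(cols[4]), cols[6])] = cols[8]
    return keys


def compare_runs(old_lines, new_lines):
    """Return (added, lost) gene coordinate keys between two annotation runs."""
    old = gene_keys(old_lines)
    new = gene_keys(new_lines)
    added = sorted(key for key in new if key not in old)
    lost = sorted(key for key in old if key not in new)
    return added, lost
```

An empty "lost" list suggests that the original gene boundaries were preserved, and "added" gives the loci to tag as new.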

Thanks,
Carson



From: Barry Moore <[hidden email]>
Date: Friday, 27 April, 2012 7:57 AM
To: Anastasia Gioti <[hidden email]>
Cc: Carson Holt <[hidden email]>, <[hidden email]>
Subject: Re: [maker-devel] Use pass-through system to add missing genes

Hi Anastasia,

On Apr 27, 2012, at 2:43 AM, Anastasia Gioti wrote:

Hi Carlson, 
Thanks for your help!

The way you proceed depends on why the genes are not there to begin with.  Are they not there because of a lack of evidence?  

It is a mixture of cases, and I can only look at some examples to say that. There are cases where all 3 used ab initio predictors provide models, there are blastx hits, or both blastx and protein2 genome, but no EST evidence, thus  no model is retained. i guess my default parameters could be responsible for these cases at least.


This doesn't sound right.  If there are predicted models and blastx protein evidence overlapping them you should get a model retained.  I know for the EST evidence that it has to support a splice site before it will be promoted and I can't remember if protein evidence is the same but certainly if you pass back those protein2genome predictions and the original proteins as evidence then they will be retained as models.

If that's the case just adding the new fasta file should do the trick.  

which fasta do you refer to? The proteins file I use as evidence contains all proteins i can actually use.


Yes using the protein fasta from the closely related species as evidence.  I think you said you've already done that right?


Or are they not there because an assembly error makes it impossible to get a logical model for the region (I.e reading frame breaks).  

This is not the case in general.

Are there ab initio models already called in those regions that could just be promoted to the annotation tier?  You can test that one by blasting against the nonoverlaping_abinits.fasta files.

I have not done this, will do!


For any of the cases described, you can provide the existing annotation set as the input in GFF3 format, and previous models will be maintained preferentially.  

You mean in a new MAKER run? This is possible with the old MAKER as well, not only MAKER2, right?


Yes, the original MAKER will do this.


If you know which ab initio predictions you want to add (i.e. the ab initio promoting scenario I described), you can provide those predictions to MAKER using the pred_gff option and then set keep_preds=1, and they will be maintained even without evidence.  Attached is a script that would make selecting those easier.  It takes the MAKER generated GFF3 and a list of predictions to keep (one name per line).  These might be the results of a BLAST analysis, for example.  It will then return the GFF3 entries for just those models selected.
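
For readers without the attachment, the selection step described above can be approximated as follows. This is a minimal sketch and not the attached script: it assumes predictions are stored as match features with match_part children linked by `Parent=`, and that the names in your list correspond to the `Name` or `ID` attribute. The function names and example lines are hypothetical.

```python
def parse_attrs(col9):
    """Turn a GFF3 attribute column into a dict."""
    return dict(f.split("=", 1) for f in col9.split(";") if "=" in f)


def select_gff3(gff3_lines, keep_names):
    """Return the GFF3 lines for the named features and all of their descendants."""
    keep_names = set(keep_names)
    rows = []
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            rows.append((line.rstrip("\n"), parse_attrs(cols[8])))

    def parents(attrs):
        return attrs.get("Parent", "").split(",")

    # Seed with features whose Name or ID is in the keep list.
    kept_ids = {a["ID"] for _, a in rows
                if "ID" in a and (a.get("Name") in keep_names or a["ID"] in keep_names)}
    # Repeatedly pull in children until no new IDs are added.
    changed = True
    while changed:
        changed = False
        for _, a in rows:
            if "ID" in a and a["ID"] not in kept_ids and any(p in kept_ids for p in parents(a)):
                kept_ids.add(a["ID"])
                changed = True
    return [line for line, a in rows
            if a.get("ID") in kept_ids or any(p in kept_ids for p in parents(a))]


# Made-up example: two ab initio predictions, only the first is wanted.
example = [
    "scaffold_1\tsnap_masked\tmatch\t100\t900\t.\t+\t.\tID=pred-1;Name=snap-pred-1\n",
    "scaffold_1\tsnap_masked\tmatch_part\t100\t400\t.\t+\t.\tID=pred-1-exon1;Parent=pred-1\n",
    "scaffold_1\taugustus_masked\tmatch\t2000\t2600\t.\t-\t.\tID=pred-2;Name=aug-pred-2\n",
    "scaffold_1\taugustus_masked\tmatch_part\t2000\t2600\t.\t-\t.\tID=pred-2-exon1;Parent=pred-2\n",
]
selected = select_gff3(example, ["snap-pred-1"])
print("\n".join(selected))
```

The selected lines can be written to a file and given to MAKER through pred_gff.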

The thing is, for the few cases I have looked at, I cannot really decide which model is the best, and the 3 models from the ab initio predictors do not agree on the exact intron-exon junctions or the start and stop codons.

If the situation is more complex, just provide more detail, and I am sure we can help you come up with a plan.

What I was thinking to do was to provide a GFF file of alignments (e.g. by exonerate) to the proteins of the closely related species that I am missing, and somehow keep the previous annotations and get the extra ones from this GFF file. But how exactly MAKER should be run to do this I am not sure. If I want to keep the previous annotations I need the GFF file of the last MAKER run as input, but then how do I distinguish it from the exonerate GFF file? And which mode of prediction should be on, and with which parameters? You mention keep_preds=1 for the existing annotations, but how do I also promote evidence from alignments in the same way in the same run?
Looks feasible though. Thanks again,
Anastasia


Let me just restate what you've said so that I can be sure that I am correct about what you've already done.  You have run Maker with SNAP, Genemark and Augustus using ESTs from a closely related species (passed to altest) and protein evidence from other fungi.  You are missing about 1,000 genes compared to the species that provided the EST alignments.  You say there is good evidence from the alignments that these genes exist, and I assume by this that you mean the EST/protein alignments that Maker produced.

1) Is the closely related fungus annotated, and if so have you included its proteins in the evidence set that you provided to Maker?  If you haven't provided these proteins as evidence to Maker then you should do this.  You can re-run Maker passing your original models back through like this:

#-----Re-annotation Using MAKER Derived GFF3
genome_gff=original_maker_annotations.gff3
est_pass=1
altest_pass=1
protein_pass=1
rm_pass=1
model_pass=1
pred_pass=1
other_pass=1

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=proteins_from_closely_related.fasta
## OR it sounds like you've already aligned these with exonerate?
protein_gff=proteins_from_closely_related_already_aligned.gff

2) If you've already included those closely related species proteins but still didn't get the 1,000 genes, then take your nonoverlaping_abinits.fasta and blast them directly against your closely related proteins.  Presumably they don't hit too well, because if they did they should have been promoted to gene models by Maker the first time, but here you can decide for yourself what thresholds to allow when keeping the ab initio predictions that hit the closely related species proteins.  If you filter your blast hits the way you want and keep the names of the ab initio predictions that pass your filter, then the script Carson attached will generate an ab initio prediction GFF file with only the predictions you selected.  You can then pass those predictions back to Maker and force it to keep them, and Maker will turn them from predictions (match/match_part) into gene models.

#-----Re-annotation Using MAKER Derived GFF3
genome_gff=original_maker_annotations.gff3
est_pass=1
altest_pass=1
protein_pass=1
rm_pass=1
model_pass=1
pred_pass=0
other_pass=1

#-----Gene Prediction
snaphmm=
gmhmm=
augustus_species=
fgenesh_par_file=
pred_gff=ab_init_predictions_rescued_by_blast.gff

keep_preds=1
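
The filtering step in 2) could look roughly like the sketch below. It assumes the BLAST search was run with tabular output (`-outfmt 6`, with the standard 12 columns: query ID first, percent identity third, e-value eleventh). The thresholds are arbitrary placeholders to tune yourself, and the function name and example hits are made up.

```python
def names_passing_filter(blast_tab_lines, min_identity=50.0, max_evalue=1e-10):
    """Return query names (in first-seen order) with at least one hit passing both thresholds."""
    passing = []
    seen = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # not a standard 12-column tabular line
        query, identity, evalue = cols[0], float(cols[2]), float(cols[10])
        if identity >= min_identity and evalue <= max_evalue and query not in seen:
            seen.add(query)
            passing.append(query)
    return passing


# Made-up hits: pred-1 passes (twice), pred-2 is too weak.
example_hits = [
    "pred-1\tprotA\t72.5\t210\t50\t2\t1\t210\t5\t214\t1e-80\t250\n",
    "pred-2\tprotB\t31.0\t90\t60\t3\t1\t90\t10\t99\t0.002\t40\n",
    "pred-1\tprotC\t65.0\t200\t60\t4\t1\t200\t8\t207\t1e-60\t200\n",
]
keep = names_passing_filter(example_hits)
print("\n".join(keep))  # one name per line, ready for the selection script
```

The resulting one-name-per-line list is the input the selection script expects.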

Barry

Thanks,
Carson

<gff3_select>


_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543






Re: Use pass-through system to add missing genes

Collett, James R-2
In reply to this post by Anastasia Gioti-2
Hi Carson,

Could you please send me (or make available for download) the perl script that you mentioned in this previous post in this thread?

>> Attached is a
>> script that would make selecting those easier.  It take the MAKER
>> generated GFF3 and a list of predictions to keep (one name per line).  
>> These might be the results of a BLAST analysis for example.  It will
>> then return the GFF3 entries for just those models selected.

Thanks,

Jim
__________________________________________________
James R. Collett, Ph.D.
Senior Scientist
Chemical and Biological Process Development Group
Energy and Environment Directorate
Pacific Northwest National Laboratory


Re: Use pass-through system to add missing genes

Carson Holt-2
Here you go.  This will also be part of the next MAKER release in some
form.

Thanks,
Carson



On 12-04-27 12:51 PM, "Collett, James R" <[hidden email]> wrote:

>Hi Carson,
>
>Could you please send me (or make available for download) the perl script
>that you mentioned in this previous post in this thread?
>
>>> Attached is a
>>> script that would make selecting those easier.  It take the MAKER
>>> generated GFF3 and a list of predictions to keep (one name per line).
>>> These might be the results of a BLAST analysis for example.  It will
>>> then return the GFF3 entries for just those models selected.
>
>Thanks,
>
>Jim
>__________________________________________________
>James R. Collett, Ph.D.
>Senior Scientist
>Chemical and Biological Process Development Group
>Energy and Environment Directorate
>Pacific Northwest National Laboratory
>
>> -----Original Message-----
>> From: [hidden email] [mailto:maker-devel-
>> [hidden email]] On Behalf Of maker-devel-request@yandell-
>> lab.org
>> Sent: Friday, April 27, 2012 6:48 AM
>> To: [hidden email]
>> Subject: maker-devel Digest, Vol 47, Issue 14
>>
>> Send maker-devel mailing list submissions to
>> [hidden email]
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-
>> lab.org
>>
>> or, via email, send a message with subject or body 'help' to
>> [hidden email]
>>
>> You can reach the person managing the list at
>> [hidden email]
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of maker-devel digest..."
>>
>>
>> Today's Topics:
>>
>>    1. Re: Use pass-through system to add missing genes (Carson Holt)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Fri, 27 Apr 2012 09:27:24 -0400
>> From: Carson Holt <[hidden email]>
>> To: Barry Moore <[hidden email]>, Anastasia Gioti
>> <[hidden email]>
>> Cc: [hidden email]
>> Subject: Re: [maker-devel] Use pass-through system to add missing
>> genes
>> Message-ID: <CBC01559.BF45%[hidden email]>
>> Content-Type: text/plain; charset="us-ascii"
>>
>> > It is a mixture of cases, and I can only look at some examples to say
>> that.
>> > There are cases where all 3 used ab initio predictors provide models,
>> > there are blastx hits, or both blastx and protein2 genome, but no EST
>> > evidence, thus no model is retained. i guess my default parameters
>> > could be responsible for these cases at least.
>>
>> The only way you should be able to get BLASTX overlap and still not get
>> a model for the region is if 1.  The protein alignment in in a
>> different reading frame then your models for every single base pair of
>> the alignment (in which case it's not true overlap).  2. The BLASTX
>> HSPs are stacked on each other again and again in weird rearranged
>> overlaps to produce a very deep alignment which would mean this is a
>> repetitive region and is not really a significant alignment.  Otherwise
>> this should not happen unless you have the AED_threshold set to some
>> value where MAKER will ignore genes unless they have a minimum amount
>> of support (by default this option is always off).  The other two
>> possibilities can be tested by just looking at the alignments manually
>> in Apollo.  Also take a look at the AED and eAED values for your
>> missing genes.  Anything below 1 should always be kept by MAKER by
>> default because it has at least some evidence supported.
>>
>> > which fasta do you refer to? The proteins file I use as evidence
>> > contains all proteins i can actually use.
>>
>> If they are already in your current run ignore this.
>>
>> Barry provided detailed instructions on how to configure MAKER, for
>> your particular case.  So just follow his excellent instructions.
>>
>> Thanks,
>> Carson
>>
>>
>>
>> From:  Barry Moore <[hidden email]>
>> Date:  Friday, 27 April, 2012 7:57 AM
>> To:  Anastasia Gioti <[hidden email]>
>> Cc:  Carson Holt <[hidden email]>, <[hidden email]>
>> Subject:  Re: [maker-devel] Use pass-through system to add missing
>> genes
>>
>> Hi Anastasia,
>>
>> On Apr 27, 2012, at 2:43 AM, Anastasia Gioti wrote:
>>
>> > Hi Carlson,
>> > Thanks for your help!
>> >
>> >> The way you proceed depends on why the genes are not there to begin
>> with.
>> >> Are they not there because of a lack of evidence?
>> >
>> > It is a mixture of cases, and I can only look at some examples to say
>> that.
>> > There are cases where all 3 used ab initio predictors provide models,
>> > there are blastx hits, or both blastx and protein2 genome, but no EST
>> > evidence, thus no model is retained. i guess my default parameters
>> > could be responsible for these cases at least.
>> >
>>
>> This doesn't sound right.  If there are predicted models and blastx
>> protein evidence overlapping them you should get a model retained.  I
>> know for the EST evidence that it has to support a splice site before
>> it will be promoted and I can't remember if protein evidence is the
>> same but certainly if you pass back those protein2genome predictions
>> and the original proteins as evidence then they will be retained as
>> models.
>>
>> >> If that's the case just adding the new fasta file should do the
>> trick.
>> >
>> > which fasta do you refer to? The proteins file I use as evidence
>> > contains all proteins i can actually use.
>> >
>>
>> Yes using the protein fasta from the closely related species as
>> evidence.  I think you said you've already done that right?
>>
>>
>> >> Or are they not there because an assembly error makes it impossible
>> >> to get a logical model for the region (I.e reading frame breaks).
>> >
>> > This is not the case in general.
>> >
>> >> Are there ab initio models already called in those regions that
>> could
>> >> just be promoted to the annotation tier?  You can test that one by
>> >> blasting against the nonoverlaping_abinits.fasta files.
>> >
>> > I have not done this, will do!
>> >
>> >>
>> >> For any of the cases described, you can provide the existing
>> >> annotation set as the input in GFF3 format, and previous models will
>> >> be maintained preferentially.
>> >
>> > You mean in a new maker run? is this possible with the old maker as
>> > well, not maker2, right?
>> >
>>
>> Yes, the original MAKER will do this.
>>
>>
>> >> If you know which ab initio predictions you want to add (I.e. the ab
>> >> initio promoting scenario I descibed), you can provide those
>> >> predictions to the use the pred_gff option and then set keep_preds=1
>> >> and they will be maintained even without evidence.  Attached is a
>> >> script that would make selecting those easier.  It take the MAKER
>> >> generated GFF3 and a list of predictions to keep (one name per
>> line).
>> >> These might be the results of a BLAST analysis for example.  It will
>> >> then return the GFF3 entries for just those models selected.
>> >
>> > The thing is, for the few cases I have looked at, I cannot really
>> > decide which model is the best, and the 3 models from the ab initio
>> > predictors do not agree on the exact intron-exon junctions or the
>> start and stop codons.
>> >>
>> >> If the situation is more complex, just provide more detail, and I am
>> >> sure we can help you come up with a plan.
>> >>
>> > What i was thinking to do was to provide a gff file of alignments (eg
>> > by
>> > exonerate) to the proteins of the closely related species that i am
>> > missing, and somehow keep the previous annotations and get the extra
>> > ones by this gff file. But how exactly maker should be run to do this
>> > I am not sure. if I want to keep the previous annotations I  need the
>> > gff file of the last maker run as input, but then how do I
>> > discriminate with the exonerate gff file? And which mode of rediction
>> > should be on, and with which parameters? You mention
>> > keep_preds=1  for the existing annotations, but how do i also promote
>> > evidence from alignments on the same way in the same run?
>> > Looks feasible though. Thanks again,
>> > Anastasia
>> >
>>
>> Let me just restate what you've said so that I can be sure that I am
>> correct about what you've already done.  You have run Maker with SNAP,
>> Genemark and Augustus using EST from a closely related species (passed
>> to altest) and protein evidence from other fungi.  You are missing
>> about 1,000 genes compared to the species that provided the EST
>> alignments.  You say their is good evidence that these genes exist from
>> the alignments and I assume by this that you mean the EST/protein
>> alignments that Maker produced.
>>
>> 1) Is the closely related fungus annotated and if so have you included
>> it's proteins in the evidence set that you provided to Maker.  If you
>> haven't provided these proteins as evidence to maker then you should do
>> this.  You can re-run maker passing your original models back through
>> like this:
>>
>> #-----Re-annotation Using MAKER Derived GFF3
>> genome_gff=original_maker_annotations.gff3
>> est_pass=1
>> altest_pass=1
>> protein_pass=1
>> rm_pass=1
>> model_pass=1
>> pred_pass=1
>> other_pass=1
>>
>> #-----Protein Homology Evidence (for best results provide a file for at
>> least one) protein=proteins_from_closely_related.fasta
>> ## OR it sounds like you've already aligned these with exonerate?
>> protein_gff=proteins_from_closely_related_already_aligned.gff
>>
>> 2) If you've already included those closely related species proteins
>> but still didn't get the 1,000 genes, then take your
>> nonoverlaping_abinits.fasta and blast them directly against your
>> closely related proteins.  Presumably they don't hit too well, because
>> if they did they should have been promoted to predictions by Maker the
>> first time, but here you can decide yourself what thresholds to allow
>> to keep the abinit predictions that hit the closely related species
>> proteins.  If you filter your blast hits the way you want and keep the
>> names of the abinit predictions that pass your filter, then use the
>> script Carson attached; it will generate an abinit prediction GFF file
>> with only the predictions you selected.  You can then pass those
>> predictions back to Maker and force it to keep them, and Maker will turn
>> them from predictions (match/match_part) into gene models.
>>
>> #-----Re-annotation Using MAKER Derived GFF3
>> genome_gff=original_maker_annotations.gff3
>> est_pass=1
>> altest_pass=1
>> protein_pass=1
>> rm_pass=1
>> model_pass=1
>> pred_pass=0
>> other_pass=1
>>
>> #-----Gene Prediction
>> snaphmm=
>> gmhmm=
>> augustus_species=
>> fgenesh_par_file=
>> pred_gff=ab_init_predictions_rescued_by_blast.gff
>>
>> keep_preds=1
>>
>> Barry
>>
>> Barry Moore
>> Research Scientist
>> Dept. of Human Genetics
>> University of Utah
>> Salt Lake City, UT 84112
>> --------------------------------------------
>> (801) 585-3543
>>

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

gff3_select (4K) Download Attachment

Re: Use pass-through system to add missing genes

Anastasia Gioti-2
In reply to this post by Barry Moore
Hi Barry,
Thanks for your detailed instructions. You understood correctly that I have
already included the proteins of the closely related species in my
protein evidence dataset, but still did not get the genes. I have now
blasted (blastp) the missing 949 proteins from this species against my
nonoverlaping_abinits.fasta proteins and have found 618 good hits,
which I guess I can promote to models using routine no. 2 of your
last email and Carson's script gff3_select.
I have also looked at the rest of the proteins (331), for which there
was no model in nonoverlaping_abinits.fasta. I will try to
describe 2 examples I looked at in Apollo:

1) The ab initio models predicted a ~7.5 kb gene covering 3 genes (as
predicted in the closely related species). Blastx+protein2genome
similarities were reported for two of these genes, but not for the 3rd
(the one in the middle). MAKER finally decided to call two genes,
respecting the blastx+protein2genome evidence, but the 3rd was lost.
I have previously reported here that MAKER tends to fuse genes into
multi-exonic genes, and others reported that too; I remember you
proposed changing a parameter to alter this. I will keep this in mind
for the final strategy that I am trying to decide on (for the moment I
have not rerun MAKER).
For this case, ab initio models do not exist for the gene (in the sense
that the existing models overlap many genes) and the similarity to the
protein of the closely related species was not judged sufficient,
although when I look at a TblastN alignment for this area it looks
fine to me.

2) Only the 3' end of the gene was called by MAKER, despite
blastx+protein2genome evidence from the closely related species for the
entire region. Ab initio models existed as 2 separate genes, one for
the 3' end region (finally retained by MAKER in a consensus decision, I
guess) and one for the 5' region, but here not all predictors called
an ORF, and finally nothing was called in this region.
This case is rather a misannotation, but one that misses a very
important part of the gene.
I hope my descriptions are clear; otherwise I can provide you the GFF
file of these 2 examples to look at yourself.

I am not very clear about what to do about these 331 cases (which I also
do not know how to inspect, except by viewing random examples in
Apollo). I feel that a second MAKER run would probably be the
solution, this time providing as pred_gff the result of a blast
against the 331. But still, the existing annotations would then have
to be somehow updated, as the new predictions are in conflict with them
(see example 2). I am a bit confused.
To recap, what would you suggest for the 331 still-missing proteins in
terms of assessing their profiles in a rather automatic way and
including them in my annotations without going deep into manual gene
curation?
Many thanks,
Anastasia
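
One possible way to triage cases like example 1 automatically, rather than by eye in Apollo, is sketched below. It is an assumption-laden illustration and not a MAKER feature: it takes BLAST tabular hits (-outfmt 6) with the reference proteins as queries and the ab initio predictions as subjects, and flags any prediction that is hit by two or more different reference proteins at essentially non-overlapping positions, which may indicate a fused model. The function names and the max_overlap tolerance are hypothetical.

```python
from collections import defaultdict


def parse_hits(blast_lines):
    """Yield (query, subject, sstart, send) from BLAST tabular (-outfmt 6) lines."""
    for line in blast_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 12:
            continue
        # Columns 9 and 10 of -outfmt 6 are the subject start and end.
        yield cols[0], cols[1], int(cols[8]), int(cols[9])


def find_candidate_fusions(hits, max_overlap=10):
    """Map each subject to the ordered list of queries that tile it without overlapping.

    Only subjects covered by two or more distinct queries are returned.
    max_overlap is the number of positions two neighbouring spans may share.
    """
    spans_by_subject = defaultdict(dict)
    for query, subject, sstart, send in hits:
        lo, hi = sorted((sstart, send))
        previous = spans_by_subject[subject].get(query)
        if previous is not None:
            # Merge several HSPs of the same query into one covering span.
            lo, hi = min(lo, previous[0]), max(hi, previous[1])
        spans_by_subject[subject][query] = (lo, hi)

    fusions = {}
    for subject, spans in spans_by_subject.items():
        ordered = sorted(spans.items(), key=lambda item: item[1])
        chain = [ordered[0]]
        for query, (lo, hi) in ordered[1:]:
            # Accept the next span only if it starts after the last accepted one ends.
            if lo > chain[-1][1][1] - max_overlap:
                chain.append((query, (lo, hi)))
        if len(chain) > 1:
            fusions[subject] = [query for query, _ in chain]
    return fusions
```

Flagged predictions would still need inspection; paralogues or multi-domain proteins can produce the same pattern, so this only narrows down which of the remaining cases to look at first.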


Anastasia (Natassa) Gioti
Post-Doc Researcher
Evolutionary Biology Department Uppsala University -Science for Life  
lab, Karolinska Institute Stockholm
[hidden email]
[hidden email]

http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/


Re: Use pass-through system to add missing genes

Anastasia Gioti-2
In reply to this post by Barry Moore
Hi again,
I sent an email a few days ago about this thread, and I am not
sure whether you received it or have not yet had time to look at
it. In any case, that email dealt with the fact that some
proteins were not retrieved in the ab initio models and how to deal
with that. What I would like to ask here is for a few confirmations on how
to rerun MAKER for the proteins that were retrieved in the ab initio
models. I have looked at the BLAST results and have done a series of
checks, so now I am ready to run MAKER again with a list of models
that I want to retain.
Regarding the following parameters:

1. Do I set genome= to nothing here, i.e. comment it out? This is at
the beginning of the control file:
#-----Genome (Required for De-Novo Annotation)
genome= #genome sequence file in fasta format
organism_type= #eukaryotic or prokaryotic. Default is eukaryotic

>
> #-----Re-annotation Using MAKER Derived GFF3
> genome_gff=original_maker_annotations.gff3
> est_pass=1
> altest_pass=1
> protein_pass=1
> rm_pass=1
> model_pass=1
> pred_pass=0
> other_pass=1
>
> #-----Gene Prediction
2. Do I provide the SNAP etc. models again? I am not sure, because I
thought MAKER would not run ab initio predictors this time (this is
why I would also comment out the genome file above, as this is not a de
novo annotation). But if it will, I will provide the previous
models I used, except for SNAP, for which I will generate a new model
from the GFF3 file of the last run (according to the SNAP documentation).
Am I correct?
> snaphmm=
> gmhmm=
> augustus_species=
> fgenesh_par_file=
> pred_gff=ab_init_predictions_rescued_by_blast.gff
>
> keep_preds=1

Likewise, what do I do with repeat masking etc.?
Thanks in advance,
Anastasia



Re: Use pass-through system to add missing genes

Anastasia Gioti-2
In reply to this post by Barry Moore
Hi,
and sorry for the multiple postings. I have a list of models rescued from the nonoverlaping_abinits.fasta file (against which I blasted my missing proteins from the closely related species, then filtered out the dubious hits) and a MAKER GFF3 file, but Carson's script gff3_select won't work. The reason is that these ab initio models were not promoted into the MAKER GFF3 file, so they are not there. I am referring to the GFF3 file generated by the gff3_merge script. Am I missing something?
Thank you, 
Anastasia




If you know which ab initio predictions you want to add (i.e. the ab initio promoting scenario I described), you can provide those predictions using the pred_gff option and then set keep_preds=1, and they will be maintained even without evidence.  Attached is a script that makes selecting those easier.  It takes the MAKER-generated GFF3 and a list of predictions to keep (one name per line).  These might be the results of a BLAST analysis, for example.  It will then return the GFF3 entries for just those models selected.
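
The selection described above might look roughly like the following. To be clear, this is not the attached gff3_select script, whose source is not shown in the thread; it is a minimal sketch assuming that predictions appear in the GFF3 as features carrying ID and Name attributes (e.g. match), with child features (e.g. match_part) linked through Parent. The function names are hypothetical.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def select_gff3(lines, wanted_names):
    """Return the GFF3 feature lines for the wanted predictions and their children.

    A feature is wanted if its Name or ID attribute is in wanted_names.
    Children are kept if any of their Parent values is the ID of a wanted feature.
    """
    wanted = set(wanted_names)
    records = []
    for line in lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        # Directives, comments, blank lines and any embedded FASTA are skipped.
        if line.startswith("#") or len(cols) != 9:
            continue
        records.append((line, parse_attributes(cols[8])))

    # First pass: collect the IDs of the wanted top-level features.
    kept_ids = set()
    for _, attrs in records:
        if "ID" in attrs and (attrs.get("Name") in wanted or attrs["ID"] in wanted):
            kept_ids.add(attrs["ID"])

    # Second pass: keep those features plus anything whose Parent is one of them.
    selected = []
    for line, attrs in records:
        parents = attrs.get("Parent", "").split(",")
        if attrs.get("ID") in kept_ids or any(p in kept_ids for p in parents):
            selected.append(line)
    return selected
```

A selection of this kind can only return predictions that are actually present in the input GFF3, so names taken from a FASTA file have to match the Name or ID values used in that GFF3; if they are absent, the result is simply empty.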

