Model training with AED=0.7 made all contigs FAILED

Lahcen Campbell-2

Hi folks,

I would like some insight into a recent round of MAKER annotation that I ran and that finished with 0 FINISHED contigs.

The genome is a whitefly. I first ran MAKER successfully with an initial "evidence in" round, passing in EST evidence as aligned-transcript GFFs, protein homology evidence, etc. That run produced a lot of good-quality gene models.

Statistics:
           24,613 genes with 49,547 transcripts containing 141,130 CDS.

Now, I know this count is very high for our species, so in the 2nd round (which completed overnight only because every contig failed) I tried to raise the threshold for support by reducing the AED cutoff from the initial 1 to 0.7. Before starting the second round I trained SNAP on the first-round results and also ran Augustus separately, passing these in via the snaphmm and pred_gff options respectively. Finally, I set min_protein to 100 aa and turned est2genome and protein2genome off to allow for gene model refinement.
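As a rough way to see what an AED cutoff of 0.7 would do to the first-round set, the sketch below counts mRNA features on either side of a threshold. It assumes the saved first-round GFF3 carries an `_AED=` attribute on each mRNA line and that the cutoff is inclusive; the function name and any file name used with it are hypothetical, and this is not how MAKER itself applies the threshold.

```python
import re

# Matches "_AED=0.25" as a whole attribute, so "_eAED=..." is not picked up.
AED_PATTERN = re.compile(r"(?:^|;)_AED=([0-9.]+)")


def count_mrna_by_aed(gff_lines, threshold=0.7):
    """Return (kept, dropped) counts of mRNA features for an AED cutoff.

    Assumption: each mRNA line stores its score as `_AED=<float>` in
    column 9. Lines without that attribute are ignored.
    """
    kept = dropped = 0
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        match = AED_PATTERN.search(cols[8])
        if match is None:
            continue
        if float(match.group(1)) <= threshold:
            kept += 1
        else:
            dropped += 1
    return kept, dropped
```

Calling `count_mrna_by_aed(open("round1.all.gff"))` on a copy of the first-round GFF (file name hypothetical) gives a quick estimate of how many transcripts a 0.7 cutoff would keep, without touching the MAKER output directory.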

I checked the run today and all ~8,000 contigs/scaffolds were reported as FAILED, each after being retried once.
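A quick tally of contig statuses can confirm whether every contig really ended as FAILED. The sketch below assumes the master datastore index log is tab-separated with the contig name, its directory, and a status (such as STARTED, RETRY, FAILED, FINISHED) on each line, and that the last line for a contig reflects its final state; the function names are hypothetical.

```python
from collections import Counter


def latest_status_per_contig(log_lines):
    """Map each contig to the last status recorded for it.

    Assumption: each line is `contig<TAB>directory<TAB>status`.
    Shorter lines are skipped.
    """
    latest = {}
    for line in log_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3:
            continue
        contig, status = cols[0], cols[2]
        latest[contig] = status  # later lines overwrite earlier ones
    return latest


def tally_statuses(log_lines):
    """Count how many contigs ended in each status."""
    return Counter(latest_status_per_contig(log_lines).values())
```

Running `tally_statuses` over the index log of both the first and second rounds would show whether the FAILED entries come only from the second run.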

My initial fear was that I had just lost my initial set of 24,613 gene models. I now believe this is not the case, but I'm not sure. Can anyone explain what might have happened here, and what the consequences are given that every contig came back as FAILED? Have the models been deleted from the MAKER datastore?

I had captured all 1st-round MAKER output files (GFF, FASTA files, etc.) before attempting this 2nd round of MAKER (i.e. the 1st round of model training).

If I have irrevocably changed the MAKER datastore and lost those genes, could I restore an earlier state (say, the first-round "evidence in" gene models) by passing the first MAKER GFF in via "maker_gff=" / "pred_pass=1" / "model_pass=1"?

Any advice on this would be much appreciated
Lahcen

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
Re: Model training with AED=0.7 made all contigs FAILED

Carson Holt-2
There is probably an issue with one of the GFF3 files being passed in (I'm guessing the Augustus one). I would avoid passing in Augustus results as GFF3, because that removes MAKER's ability to dynamically provide Augustus with hints as it runs. You are essentially handicapping the pipeline.
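If the Augustus GFF3 is the suspect, a basic structural check can narrow things down before rerunning. The sketch below is only a minimal sanity check under my own assumptions (nine tab-separated columns, integer coordinates with start <= end, a valid strand, and `key=value` attributes rather than GTF-style ones); it does not reproduce MAKER's own parser, and the function name is hypothetical.

```python
def find_gff3_problems(gff_lines, max_reports=20):
    """Return a list of (line_number, message) for obviously malformed lines."""
    problems = []
    for number, line in enumerate(gff_lines, start=1):
        if line.startswith("##FASTA"):
            break  # sequence section; nothing left to check
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            problems.append((number, f"expected 9 tab-separated columns, found {len(cols)}"))
        elif not (cols[3].isdigit() and cols[4].isdigit()):
            problems.append((number, "start/end are not non-negative integers"))
        elif int(cols[3]) < 1 or int(cols[3]) > int(cols[4]):
            problems.append((number, "coordinates must satisfy 1 <= start <= end"))
        elif cols[6] not in {"+", "-", ".", "?"}:
            problems.append((number, f"unexpected strand value {cols[6]!r}"))
        elif "=" not in cols[8]:
            # GTF/GFF2-style attributes (e.g. gene_id "g1";) are not GFF3.
            problems.append((number, "attributes are not in key=value form"))
        if len(problems) >= max_reports:
            break
    return problems
```

An empty result does not prove the file is acceptable to MAKER; it only rules out the most common formatting slips.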

If your first genes were est2genome or protein2genome based, I would not pass them back in. Those models are suitable for training but will really reduce the accuracy of downstream final annotations (that is why the MAKER documentation tells people to turn off est2genome/protein2genome after training a gene predictor). Also, if your inputs to the first round were GFF3 files, they will have to be reread regardless. Any protein or transcript data that was aligned by MAKER will still have its BLAST results archived, so you don’t need to worry about that unless you alter the repeat masking options (which would cause the alignments to be rerun). Finally, if you are changing the GFF3 input between runs but using the same directory, you might want to delete any “.db” files in the output folder. Those hold an SQLite database of the GFF3 input, which may have been corrupted if the run failed while updating the database with the Augustus GFF3 file.
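For the ".db" cleanup, a cautious approach is to list the files first and delete only on request. The sketch below assumes the database files sit at the top level of the output folder and end in `.db`; the function name and the `dry_run` parameter are hypothetical, and the example path in the note that follows is made up.

```python
from pathlib import Path


def remove_db_files(output_dir, dry_run=True):
    """List (and optionally delete) top-level *.db files in output_dir.

    With dry_run=True nothing is deleted; the matching paths are only
    returned so they can be reviewed first.
    """
    targets = sorted(p for p in Path(output_dir).glob("*.db") if p.is_file())
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

For example, `remove_db_files("genome.maker.output")` only reports what would be removed, and passing `dry_run=False` performs the deletion. The search is deliberately not recursive, so nothing inside the per-contig datastore directories is touched.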

—Carson


(In reply to the message from Lahcen Campbell <[hidden email]> of Nov 9, 2017, 4:13 AM.)