which files are expected after fasta_merge?

which files are expected after fasta_merge?

Brandon Pickett
Good afternoon!

I just finished my third round of maker. I trained snap, augustus, etc. between the rounds. I used fasta_merge and gff3_merge to extract files after each round of maker. gff3_merge performed as expected each time, but fasta_merge surprised me. I will show you which files fasta_merge generated after each round. Please note that, as many people do, I renamed my output files from the default. Accordingly, I will list all the files with a generalized prefix of "maker" and show the rest of the file name as it was generated for me. Also note that I've changed .fasta to .fa for brevity.

After round #1:
transcripts.fa
proteins.fa

After round #2:
non_overlapping_ab_initio.proteins.fa
non_overlapping_ab_initio.transcripts.fa
transcripts.fa
augustus_masked.proteins.fa
augustus_masked.transcripts.fa
evm.proteins.fa
evm.transcripts.fa
genemark.proteins.fa
genemark.transcripts.fa
snap_masked.proteins.fa
snap_masked.transcripts.fa
proteins.fa

After round #3:
non_overlapping_ab_initio.proteins.fa
non_overlapping_ab_initio.transcripts.fa
augustus_masked.proteins.fa
augustus_masked.transcripts.fa
genemark.proteins.fa
genemark.transcripts.fa
snap_masked.proteins.fa
snap_masked.transcripts.fa

I am unsurprised that I didn't get all these files after round #1 because I used round #1 to generate gene models from transcript evidence. I didn't expect so many files after round #2 (having only seen the output from round #1 up to that point), but it makes sense that I would get output from augustus, EVidenceModeler (evm), genemark, and snap since I provided them as input to this round (#2) of maker. Between rounds #2 and #3, I re-trained snap and augustus. Genemark was trained between rounds #1 and #2 without gene models from maker and thus did not require re-training. The only difference in my maker control files between rounds #2 and #3 was the paths to the snap and augustus files. In both #2 and #3, the control files had run_evm=1. I can provide my control files for each round, if needed.

My question: why were transcripts.fa, proteins.fa, evm.proteins.fa, and evm.transcripts.fa not generated after round #3? I recognize that this is probably not an error, but rather a gap in my understanding of when each file is and is not generated.
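For readers who want to double-check the comparison, here is a minimal Python sketch that diffs the round #2 and round #3 listings from this post. The file names are copied from the lists above (with the generalized "maker" prefix left off and .fasta shortened to .fa); the variable names are only illustrative.

```python
# File names as listed in the post for rounds #2 and #3.
round2 = {
    "non_overlapping_ab_initio.proteins.fa",
    "non_overlapping_ab_initio.transcripts.fa",
    "transcripts.fa",
    "augustus_masked.proteins.fa",
    "augustus_masked.transcripts.fa",
    "evm.proteins.fa",
    "evm.transcripts.fa",
    "genemark.proteins.fa",
    "genemark.transcripts.fa",
    "snap_masked.proteins.fa",
    "snap_masked.transcripts.fa",
    "proteins.fa",
}
round3 = {
    "non_overlapping_ab_initio.proteins.fa",
    "non_overlapping_ab_initio.transcripts.fa",
    "augustus_masked.proteins.fa",
    "augustus_masked.transcripts.fa",
    "genemark.proteins.fa",
    "genemark.transcripts.fa",
    "snap_masked.proteins.fa",
    "snap_masked.transcripts.fa",
}

# Files present after round #2 but absent after round #3, and the reverse.
missing_in_round3 = sorted(round2 - round3)
new_in_round3 = sorted(round3 - round2)

print("missing in round 3:", missing_in_round3)
print("new in round 3:", new_in_round3)
```

The set difference yields exactly the four files named in the question, and round #3 adds nothing that round #2 lacked.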

Thank you,
Brandon Pickett


_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
Re: which files are expected after fasta_merge?

Carson Holt-2
If you disabled evidence for round 3 (i.e. protein= and est=) then you will get no annotations and EVM will not run. You can look at the GFF3 in a browser, and if you see that there are no protein/est alignments, then that is likely why.
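As an alternative to opening the GFF3 in a browser, the same check can be sketched in a few lines of Python by counting features per source (column 2 of the GFF3). This is only a rough sketch: the function name `count_gff3_sources` is hypothetical, and the source labels in the sample (`est2genome`, `protein2genome`, `snap_masked`) are assumptions about how evidence alignments and predictions are labelled, so compare against whatever labels actually appear in your own file.

```python
from collections import Counter
import io


def count_gff3_sources(lines):
    """Count GFF3 feature lines per source (column 2)."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        counts[cols[1]] += 1
    return counts


# Tiny made-up sample; the source labels are illustrative assumptions.
sample = "\n".join([
    "##gff-version 3",
    "\t".join(["contig1", "snap_masked", "match", "100", "900", "12.5", "+", ".", "ID=m1"]),
    "\t".join(["contig1", "est2genome", "expressed_sequence_match", "120", "880", "950", "+", ".", "ID=m2"]),
    "\t".join(["contig1", "snap_masked", "match_part", "100", "400", "12.5", "+", ".", "ID=m1p;Parent=m1"]),
    "##FASTA",
    ">contig1",
    "ACGT",
])

counts = count_gff3_sources(io.StringIO(sample))
for source, n in sorted(counts.items()):
    print(source, n)
```

To run it on real output, pass an open file handle for the merged GFF3 instead of the sample. If no protein or EST alignment sources show up in the counts for round 3, that would be consistent with the explanation above.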

—Carson



On Aug 15, 2019, at 2:48 PM, Brandon Pickett <[hidden email]> wrote:

