Unwarranted error: Skipping the contig because it is too short

Unwarranted error: Skipping the contig because it is too short

lahcen campbell
Hi MAKER community,

I was hoping someone could help me. I have a very unusual error with the two versions of MAKER I have tested so far. This error shouldn't be happening, but it occurs time and again no matter what I try. I have tried 2.31.6_mpich3_icc and 2.31_mpich3.

Note that version 2.31.6_mpich3_icc is one I have used countless times to produce final MAKER annotations without issue, so it's not that this version has had problems to date.

Basically, this is a brand new MAKER analysis, and I am only trying to train SNAP in this first round. I am following the MakerTutorial as documented this time around, and I can't get past the initial SNAP training stage.

I have a single genome file with 10 long scaffolds making up just under 11MB of sequence data (subsampled from my original full-length assembly) on which to train SNAP. The FASTA file is not corrupted, and it has been generated in various ways in order to rule out formatting issues etc.

I have only edited the maker_opts file and changed:

genome=
protein=
protein2genome=1

My MAKER CTL files are attached.

The error I consistently get back is:

Skipping the contig because it is too short!!
SeqID: contig_WHATEVER
Length: 0

The sequences are nowhere near too short. I verified this independently outside MAKER to be sure.

The headers are as follows:

>tig00000458 len=2889428 reads=4143 covStat=1793.77 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00000159 len=3515005 reads=5100 covStat=2143.94 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00006117 len=1009519 reads=1168 covStat=804.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00000419 len=2633986 reads=3938 covStat=1519.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00027677 len=108573 reads=86 covStat=86.05 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00021790 len=202251 reads=158 covStat=184.12 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00316948 len=280333 reads=237 covStat=253.23 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00019606 len=149709 reads=82 covStat=150.02 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00023852 len=189461 reads=115 covStat=192.28 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00316994 len=19742 reads=1 covStat=0.00 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
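
One way to double-check the lengths outside MAKER is to compare the `len=` value the assembler wrote into each header with the number of bases actually present in the file. The following is only a minimal Python sketch, assuming headers of the `key=value` form shown above; the function names are made up for illustration and are not part of MAKER.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_header(line: str) -> Tuple[str, Dict[str, str]]:
    """Split '>id key=value ...' into the sequence ID and its attributes."""
    fields = line.lstrip(">").split()
    seq_id = fields[0] if fields else ""
    attrs = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
    return seq_id, attrs


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_mismatches(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (id, declared_len, actual_len) for records that disagree.

    Records whose header carries no len= attribute are skipped.
    """
    bad = []
    for header, seq in read_fasta(lines):
        seq_id, attrs = parse_header(header)
        if "len" not in attrs:
            continue
        declared = int(attrs["len"])
        if declared != len(seq):
            bad.append((seq_id, declared, len(seq)))
    return bad
```

Calling `length_mismatches(open("genome.fasta"))` (the file name is a placeholder) returns an empty list when every declared length matches the sequence that follows it. Note that this reads the file directly, so it says nothing about how BioPerl or MAKER index the same file.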

I have just about given up. I have no idea why it's happening; it makes zero sense.

Any help or information as to why this might be happening would be amazing. 

Thank you in advance. 
Lahcen

--
==========================================
> Dr. Lahcen Campbell                                                  <
> Contact: [hidden email]                        <
==========================================

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

maker_bopts.ctl (1K) Download Attachment
maker_exe.ctl (2K) Download Attachment
maker_opts.ctl (7K) Download Attachment
Re: Unwarranted error: Skipping the contig because it is too short

Michael Campbell
Hi Lahcen,

Nothing comes right to mind for what could be causing this error. If you want to compress your FASTA and send it to me, I can try to recreate the error and debug it.

Thanks,
Mike
Re: Unwarranted error: Skipping the contig because it is too short

Michael Campbell
Hi Lahcen,

Thanks, the name has served me well for a number of years now :)

So I started a run with your 11 scaffolds. I gave it the protein file that you sent and used all of RepBase for masking. All of the scaffolds finished without error. I was hoping it would be something simple that just needed another set of eyes to see, but it looks like that's not the case for this one.

To further rule out a data issue, I would try running it with the dpp test data that is bundled with MAKER to see if you get the same error. This data set will run in about a minute. If you are on a cluster, I would try running it with and without submitting it to the nodes, and with and without MPI.

One thing that I have done in the past is to make a new directory and run maker there (this doesn't make a lot of sense but when the error doesn't make sense either it seems reasonable). 

As far as rerunning MAKER goes, there are a couple of approaches. If you want it to stop complaining about trying too many times on failed contigs, you can increase the number of tries in the opts file. The line looks like this:

tries=2 #number of times to try a contig if there is a failure for some reason
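
If the same control file gets edited again and again between rounds, the change can be scripted rather than made by hand. Below is a small Python sketch, assuming the `key=value #comment` layout of the line above; `set_ctl_option` is a hypothetical helper written for this example, not something shipped with MAKER.

```python
import re


def set_ctl_option(ctl_text: str, key: str, value: str) -> str:
    """Return ctl_text with `key=` set to value, keeping any trailing #comment."""
    pattern = re.compile(
        r"^(" + re.escape(key) + r")=[^#\n]*?([ \t]*#.*)?$",
        re.MULTILINE,
    )
    # A function replacement avoids backslash-escape surprises in `value`.
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}={value}{m.group(2) or ''}", ctl_text
    )
    if count == 0:
        raise KeyError(f"option {key!r} not found")
    return new_text
```

For example, `set_ctl_option(text, "tries", "5")` rewrites only the `tries=` line and leaves its comment in place. It raises `KeyError` when the option is absent, so a misspelled key does not pass silently.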

If you want to run it elsewhere, but you don't want to have to redo all of the repeat masking and blasting you can use the gff3 output from an earlier run. If you used gff3_merge after the first run finished you got a big gff3 file with all of the gene models and evidence. If you break up that file by the source column you can selectively pass the evidence back to MAKER. If you put all of the repeatmasker and repeatrunner entries into one file and pass it in on this line:

rm_gff= #pre-identified repeat elements from an external GFF3 file

you can turn off model_org= and repeat_protein=. This will speed up the next run a lot. Then you can pass in the protein2genome gff3 data on this line:

protein_gff=  #aligned protein homology evidence from an external GFF3 file

Don't pass the BLAST GFF3 data in. If you pass GFF3 data in to MAKER, it assumes that the data is polished and will not make any effort to fix alignments. The protein2genome data is polished. est2genome is the equivalent for EST input.
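
Breaking the merged GFF3 up by the source column, as described above, can be done with a few lines of scripting. The following Python sketch is one possible way, assuming a standard nine-column, tab-separated GFF3 file that may end with an embedded `##FASTA` section; the function names and output file names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def split_gff3_by_source(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group GFF3 feature lines by their source (column 2)."""
    by_source: Dict[str, List[str]] = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a well-formed feature line
        by_source[columns[1]].append(line)
    return dict(by_source)


def write_subset(by_source: Dict[str, List[str]], sources: List[str], path: str) -> None:
    """Write the features from the chosen sources to one GFF3 file."""
    with open(path, "w") as out:
        out.write("##gff-version 3\n")
        for source in sources:
            for line in by_source.get(source, []):
                out.write(line + "\n")
```

With `groups = split_gff3_by_source(open("merged.gff"))`, a call such as `write_subset(groups, ["repeatmasker", "repeatrunner"], "repeats.gff")` would produce a file for `rm_gff=`, and `write_subset(groups, ["protein2genome"], "protein2genome.gff")` one for `protein_gff=`. The blastx entries are simply never written out, in line with the advice above.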

clean_up is useful if you are running on a file system that limits the number of files that you can write. It removes all of the intermediate files used in the annotation, which takes away the advantage of rerunning in the same directory. clean_try is the one that deletes everything first and starts again, as if the first run never happened.

I cc'd the list on this response just in case anyone else has any ideas or is facing the same error.

Let me know if any of this helps,
Mike

On Nov 14, 2017, at 10:48 AM, lahcen campbell <[hidden email]> wrote:

Hi Michael 

Nice name btw, I have a Michael in my name too :) Lahcen Michael Campbell to be exact, haha. Anyway, thanks for the reply and the offer to help.

I have attached the file in question below. It's so strange; I had to just leave it alone because it was making me quite frustrated. Bugs for which there are no common-sense solutions are the worst, because you very easily hit a wall.

Might it have anything at all to do with the protein homology file I passed in? Note, though, that the same protein files have been used in another MAKER run without issue, so I had more or less ruled that out already. I'm just spitballing at this stage.


Might I be so cheeky as to ask you one more MAKER-related question, Michael? Feel free to ignore it; I hate to push, but I'm desperate to figure it out and have little time to do so.

I have an issue with a different MAKER analysis, which affects any new run I attempt on its datastore. The first round was successful, with 25000-odd genes and double that number of transcripts. I then attempted the second round with a SNAP-trained HMM (the first time passing in a SNAP HMM, following first-round EST/protein evidence). In this attempt, because we had obtained so many genes, I thought I would be more stringent by changing the AED from 1.0 to 0.7. I see now that I didn't approach this in the right way... too late now, sadly.

MAKER finishes fine, but it now views all previous scaffolds as FAILED. Nothing seems to change this, and the datastore is for all intents and purposes locked in a failed state. It keeps mentioning changes to the opts file (which there were) and that the previous runs didn't finish, so it must delete them. I'm pretty sure the results obtained from round 1 are still there, though; all the BLAST files etc. are still present and populated.

Can you tell me the main differences between clean_up and clean_try, and which of them will completely and irreversibly wipe the first run? That is something I don't want to repeat; I just want to be able to progress to the next round. I'm hesitant to run them, but I've backed up the datastore just in case. My next attempt will be to pass the exact same maker_opts file from the round 1 run, with the only change made to clean_try/clean_up. Is this approach misguided?

Your help is very much appreciated Michael so thank you, 
Best
L

 Combined_Protein_homology.fa.zip
 SubsampledGenomeFile_n10_11MB.fasta




Re: Unwarranted error: Skipping the contig because it is too short

Carson Holt-2
In reply to this post by lahcen campbell
My first thought is that one of your entries has a header and no sequence. Try this command with the FASTA file you are using:

fasta_tool file.fasta --length | sort -nrk2

fasta_tool comes with maker. That command will report empty fasta entries at the bottom of the list with length 0.
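
If fasta_tool is not to hand, a similar check can be approximated with a short script. This is only a rough Python sketch and does not try to reproduce fasta_tool's output format; it takes the first word of each header as the sequence ID, and the function names are invented for this example.

```python
from typing import Dict, Iterable, List, Tuple


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each sequence ID (first word of the header) to its length."""
    lengths: Dict[str, int] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            words = line[1:].split()
            current = words[0] if words else ""
            lengths[current] = 0  # a header with no sequence stays at 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def sorted_report(lengths: Dict[str, int]) -> List[Tuple[str, int]]:
    """Longest first, so empty entries end up at the bottom of the list."""
    return sorted(lengths.items(), key=lambda item: item[1], reverse=True)
```

Running `sorted_report(fasta_lengths(open("file.fasta")))` lists the entries longest first, so any header without sequence shows up at the end with length 0. If this reports sensible lengths while MAKER still reports 0, the file itself is probably fine and the problem is more likely in how it is being read.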

Alternatively, MAKER accesses the input assembly using BioPerl. Update your BioPerl to the latest CPAN version (do not use BioPerl-live, as it will be less stable). Also, BioPerl uses BerkeleyDB for indexing, so if you are using a Perl that is not the system Perl (i.e. /usr/bin/perl), then it was likely compiled on the machine you are using and could have been compiled without BerkeleyDB support.
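
A quick way to see which Perl is active in the current environment, and whether it can load the relevant modules, is to ask it directly. The Python sketch below assumes that `DB_File` (Perl's Berkeley DB interface) and `Bio::DB::Fasta` (BioPerl's indexed FASTA reader) are the modules that matter here; that choice is an assumption for illustration, and the helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def module_check_command(perl: str, module: str) -> List[str]:
    """Build a command that exits 0 only if `perl` can load `module`."""
    return [perl, f"-M{module}", "-e", "1"]


def can_load(perl: str, module: str) -> bool:
    """True if the given Perl interpreter loads the module without error."""
    try:
        result = subprocess.run(
            module_check_command(perl, module),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        return False  # interpreter missing or not executable
    return result.returncode == 0


def report(modules: Sequence[str] = ("DB_File", "Bio::DB::Fasta")) -> None:
    """Print which perl is first on PATH and whether it loads each module."""
    perl: Optional[str] = shutil.which("perl")
    print("perl on PATH:", perl)
    if perl:
        for module in modules:
            status = "ok" if can_load(perl, module) else "MISSING"
            print(f"  {module}: {status}")
```

Calling `report()` once in the login shell and once inside a submitted job can show whether the two environments pick up different Perl installations, which is the kind of mismatch a changed cluster configuration can introduce.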

—Carson



Re: Unwarranted error: Skipping the contig because it is too short

lahcen campbell
In reply to this post by Michael Campbell
Hi Michael and Carson

Thank you both for your helpful input, I really appreciate it. 

See below for my comments...

Best
Lahcen


On Tue, Nov 14, 2017 at 5:04 PM, Michael Campbell <[hidden email]> wrote:
Hi Lahcen,

Thanks, the name has served me well for a number of years now :)

It's a good name, I wouldn't change it haha :)
 

So I started a run with your 11 scaffolds. I gave it the protein file that you sent and used all of repbase for masking. All of the scaffolds finished without error. I was hoping it would be something simple that just needed another set of eyes to see, looks like it's not the case for this one.

To further rule out a data issue, I would try running it with the dpp test data that is bundled with MAKER to see if you get the same error. This data set will run in about a minute. If you are on a cluster, I would try running it with and without submitting it to the nodes, and with and without MPI.

One thing that I have done in the past is to make a new directory and run maker there (this doesn't make a lot of sense but when the error doesn't make sense either it seems reasonable). 

First off, I can report good news regarding the 0-length contigs I was getting back. Carson, your thoughts on BioPerl conflicts seem to have identified the main issue. Our cluster software environment had gone through some changes of late, so working on that basis I was able to load the right bash config, which resulted in no more 0-length contig errors. Huzzah!!


As far as rerunning MAKER goes, there are a couple of approaches. If you want it to stop complaining about trying too many times on failed contigs, you can increase the number of tries in the opts file. The line looks like this:

tries=2 #number of times to try a contig if there is a failure for some reason

If you want to run it elsewhere, but you don't want to have to redo all of the repeat masking and blasting you can use the gff3 output from an earlier run. If you used gff3_merge after the first run finished you got a big gff3 file with all of the gene models and evidence. If you break up that file by the source column you can selectively pass the evidence back to MAKER. If you put all of the repeatmasker and repeatrunner entries into one file and pass it in on this line:

Can I ask about this? I can't seem to find any concrete information on best practices for parsing MAKER GFFs to partition them by the source column as you described, Michael.

Is there a commonly used way to partition MAKER GFFs based on the source column, or will I need to code it up? I ask because I feel this must have been needed many times before by other users.
 

rm_gff= #pre-identified repeat elements from an external GFF3 file

I will remove links to fasta files for both 'rmlib=' and 'repeat_protein='
 

you can turn off model_org= and repeat_protein=. This will speed up the next run a lot. Then you can pass in the protein2genome gff3 data on this line:

protein_gff=  #aligned protein homology evidence from an external GFF3 file

Don't pass the BLAST GFF3 data in. If you pass GFF3 data in to MAKER, it assumes that the data is polished and will not make any effort to fix alignments. The protein2genome data is polished. est2genome is the equivalent for EST input.

You say not to pass the BLAST data in as GFF. If I pass in all the other information via GFF3 and remove any evidence given as FASTA inputs, BLAST won't be called again, right? That would ensure the shortest possible rerun of MAKER to roll back to an uncorrupted state.

I noticed that the only unique source field types in my MAKER GFF are as follows: 
augustus_masked 
blastx
maker
protein2genome
repeatmasker
repeatrunner

I read on the dev group that passing EST evidence as GFF won't actually call Exonerate; the est2genome option just tells MAKER to try to turn polished EST alignments directly into genes. So if I pass this information in again as GFF, will it simply use the same information as it did originally and not have to recompute anything?

Based on the above fields contained in my MAKER gff, which of the following options should I select to re-annotate based on this older run ? I suspect all the options below in green should be set to 1, and the others in red set to 0. 

#-----Re-annotation Using MAKER Derived GFF3
.....
est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=1 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=1 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=1 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no 

I don't think I will pass back anything under augustus_masked, as I didn't set that up correctly initially: I passed in a precomputed Augustus GFF, which I'm told isn't the best way to run MAKER. So if I can get back to a state where the contigs are not all failing, I will run Augustus inside MAKER itself on the second pass. I am aware of the normal order of things, but in this instance I will continue with what I have done successfully before.

Lastly, as this next run will be updating based on previously generated MAKER GFF data, what should est2genome and protein2genome be set to: 1 or 0?

Apologies for the lengthy email reply Michael. Much appreciated again, thank you !! 

L

 
Clean_up is useful if you are running on a file system that limits the number of files that you can write. It removes all of the intermediate files used in the annotation. This takes away the advantage of rerunning in the same directory. clean_try deletes everything first, and starts again. clean_try is the one that deletes everything and pretends that the first run never happened. 

I ccd the list on this response just Incas anyone else has any ideas or is facing the same error.

Let me know if any of this helps,
Mike

On Nov 14, 2017, at 10:48 AM, lahcen campbell <[hidden email]> wrote:

Hi Michael 

Nice name btw I have a Michael in my name too :) Lahcen Michael Campbell to be exact haha...anyway... thanks for the reply and offer to help. 

I have attached the file in question below. Its so strange, I had to just leave it alone cause it was making me quite frustrated. Those bugs which there are now common sense solutions are the worst cause very easily you reach a wall. 

Might it have anything at all to do with the Protein homology file I passed in ? Though, note.... the same protein files here have been used in another maker run without issue so I kind of ruled that out already.....but just spitballing at this stage.  


Might I be so cheeky to ask you one more MAKER related question Michael... ? Feel free to ignore it I hate to push but im desperate to figure it out with little time to do so... 

I have an issue with a different MAKER analysis. Currently any new run I attempt on this datastore, which has one round successful with 25000 odd genes and double the transcripts. I attempted to run the second round with a SNAP trained hmm (first time passing in SNAP hmm following first round EST/Protein evidence). In this attempt, because we obtained so many genes I thought I would be more stringent by changing the AED to 0.7 from 1.0. Something I see now I didn't approach in the right way... too late now sadly.

MAKER finishes fine, but now it views all previous scaffolds as FAILED. Nothing seems to change this and now the datastore is for all intents and purposes locked in failed state. It keeps mentioning changes to the opts file which there were, and that the previous runs didn't finish so it must delete them. The results obtained from round 1 are still there though Im pretty sure of that, all blast files etc are still there and populated. 

Can you tell me the main differences either clean_up or clean_try have and which will completely and irreversibly wipe the first run? Something I don't want to repeat, just allow me to progress to the next round. Im hesitant to run them, but I've backed up the datastore incase. My next attempt will be to pass the exact same maker_opts file from the round1 run, with the only change made to clean_try/clean_up....Is this approach misguided ? 

Your help is very much appreciated Michael so thank you, 
Best
L




On Tue, Nov 14, 2017 at 3:08 PM, Michael Campbell <[hidden email]> wrote:
Hi Lahcen,

Nothing comes right to mind for what could be causing this error. If you want to compress your FASTA and send it to me I can try and recreate the error and try and debug it.

Thanks,
Mike
On Nov 14, 2017, at 7:15 AM, lahcen campbell <[hidden email]> wrote:

Hi MAKER community,

I was hoping someone could help me. I have a very unusual error with two different versions of maker I have tested so far. This error shouldn't be happening but it occurs time and again no matter what I try. I have tried using 2.31.6_mpich3_icc and 2.31_mpich3

Note that version 2.31.6_mpich3_icc is one I have used countless times and produced final MAKER annotations without issue. So its not that this version has issues to date. 

Basically, this is a brand new MAKER analysis, I am only trying to train SNAP in this first round. I am following the MakerTutorial as documented this time around and I can't get past the initial SNAP train stage. 

I have a single genome file with, 10 Long scaffolds making up just under 11MB (subsampled from my original full length assembly) of sequence data in which to train SNAP. The fasta file is not corrupted, and has been generated in various ways in order to test formatting issues etc. 

I have only edited the maker_opts file and changed:

genome=
protein=
protein2genome=1

But see attached my maker CTL files. 

The error consistently returned to me:

Skipping the contig because it is too short!!
SeqID: contig_WHATEVER
Length: 0

The sequences are no where near too short. This was verified independently outside maker to be sure. 

The headers are as follows:

>tig00000458 len=2889428 reads=4143 covStat=1793.77 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00000159 len=3515005 reads=5100 covStat=2143.94 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00006117 len=1009519 reads=1168 covStat=804.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00000419 len=2633986 reads=3938 covStat=1519.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00027677 len=108573 reads=86 covStat=86.05 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00021790 len=202251 reads=158 covStat=184.12 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00316948 len=280333 reads=237 covStat=253.23 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00019606 len=149709 reads=82 covStat=150.02 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00023852 len=189461 reads=115 covStat=192.28 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00316994 len=19742 reads=1 covStat=0.00 gappedBases=no class=contig suggestRepeat=no suggestCircular=no

I have just about given up, I have no idea why its happening it makes zero sense. 

Any help or information as to why this might be happening would be amazing. 

Thank you in advance. 
Lahcen

--
==========================================
> Dr. Lahcen Campbell                                                  <
> Contact: [hidden email]                        <
==========================================
<maker_bopts.ctl> <maker_exe.ctl> <maker_opts.ctl>
_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org





Re: Unwarranted error: Skipping the contig because it is too short

lahcen campbell
In reply to this post by Michael Campbell
Just an add on to this topic.... I have found a suite of gff utilities here which I hope can help me quickly parse the MAKER gff.


I'll report back how it goes !

Best
L

On Tue, Nov 14, 2017 at 5:04 PM, Michael Campbell <[hidden email]> wrote:
Hi Lahcen,

Thanks, the name has served me well for a number of years now :)

So I started a run with your 11 scaffolds. I gave it the protein file that you sent and used all of RepBase for masking. All of the scaffolds finished without error. I was hoping it would be something simple that just needed another set of eyes to see, but it looks like that's not the case for this one.

To further rule out a data issue, I would try running it with the dpp test data that is bundled with MAKER to see if you can get the same error. This data set will run in about a minute. If you are on a cluster, I would try running it with and without submitting it to the nodes, and with and without MPI.

One thing that I have done in the past is to make a new directory and run maker there (this doesn't make a lot of sense but when the error doesn't make sense either it seems reasonable). 

As far as rerunning MAKER, there are a couple of approaches. If you want it to stop complaining about trying too many times on failed contigs, you can increase the number of tries in the opts file. The line looks like this:

tries=2 #number of times to try a contig if there is a failure for some reason

If you want to run it elsewhere, but you don't want to have to redo all of the repeat masking and blasting you can use the gff3 output from an earlier run. If you used gff3_merge after the first run finished you got a big gff3 file with all of the gene models and evidence. If you break up that file by the source column you can selectively pass the evidence back to MAKER. If you put all of the repeatmasker and repeatrunner entries into one file and pass it in on this line:

rm_gff= #pre-identified repeat elements from an external GFF3 file

you can turn off model_org= and repeat_protein=. This will speed up the next run a lot. Then you can pass in the protein2genome gff3 data on this line:

protein_gff=  #aligned protein homology evidence from an external GFF3 file

Don't pass the BLAST GFF3 data in. If you pass GFF3 data to MAKER, it assumes that the data is polished and will not make any effort to fix alignments. The protein2genome data is polished; est2genome is the equivalent for EST input.
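
The splitting step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_maker_gff` and the output file names are invented here, and the source-column values (`repeatmasker`, `repeatrunner`, `protein2genome`) are the ones reported later in this thread for this particular run. BLAST features are deliberately left out, following the advice above.

```shell
#!/usr/bin/env bash
# split_maker_gff MERGED_GFF OUTDIR
# Writes repeat features and protein2genome features from a merged MAKER
# GFF3 into two separate files in OUTDIR (hypothetical file names).
split_maker_gff() {
    local merged=$1 outdir=$2
    awk -F'\t' -v out="$outdir" '
        BEGIN {
            rep  = out "/repeats.gff"
            prot = out "/protein2genome.gff"
            # start each output with the GFF3 version pragma
            print "##gff-version 3" > rep
            print "##gff-version 3" > prot
        }
        /^##FASTA/ { exit }     # stop at the embedded sequence section, if any
        /^#/       { next }     # skip other comment and pragma lines
        $2 == "repeatmasker" || $2 == "repeatrunner" { print > rep }
        $2 == "protein2genome"                       { print > prot }
    ' "$merged"
}
```

After something like `split_maker_gff all.gff .`, the two files could be given to the `rm_gff=` and `protein_gff=` lines quoted in this message. Whether a run has other source values worth keeping depends on the evidence that was used, so it is worth listing the source column before splitting.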

clean_up is useful if you are running on a file system that limits the number of files that you can write. It removes all of the intermediate files used in the annotation, which takes away the advantage of rerunning in the same directory. clean_try deletes everything first and then starts again; it is the one that pretends that the first run never happened.

I cc'd the list on this response just in case anyone else has any ideas or is facing the same error.

Let me know if any of this helps,
Mike

On Nov 14, 2017, at 10:48 AM, lahcen campbell <[hidden email]> wrote:

Hi Michael 

Nice name btw I have a Michael in my name too :) Lahcen Michael Campbell to be exact haha...anyway... thanks for the reply and offer to help. 

I have attached the file in question below. It's so strange; I had to just leave it alone because it was making me quite frustrated. Bugs for which there are no common-sense solutions are the worst, because you very easily hit a wall.

Might it have anything at all to do with the protein homology file I passed in? Though note that the same protein files have been used in another MAKER run without issue, so I had kind of ruled that out already. I'm just spitballing at this stage.


Might I be so cheeky as to ask you one more MAKER-related question, Michael? Feel free to ignore it; I hate to push, but I'm desperate to figure it out and have little time to do so.

I have an issue with a different MAKER analysis. It affects any new run I attempt on this datastore, which has one successful round with 25,000-odd genes and double that number of transcripts. I attempted to run the second round with a SNAP-trained HMM (the first time passing in a SNAP HMM, following the first-round EST/protein evidence). In this attempt, because we obtained so many genes, I thought I would be more stringent by changing the AED from 1.0 to 0.7. I see now that I didn't approach this in the right way, but it's too late now, sadly.

MAKER finishes fine, but now it views all previous scaffolds as FAILED. Nothing seems to change this, and the datastore is now, for all intents and purposes, locked in a failed state. It keeps mentioning changes to the opts file (which there were) and that the previous runs didn't finish, so it must delete them. I'm pretty sure the results obtained from round 1 are still there, though; all the BLAST files etc. are still there and populated.

Can you tell me the main differences between clean_up and clean_try, and which of them will completely and irreversibly wipe the first run? That is something I don't want to repeat; I just want to progress to the next round. I'm hesitant to run them, but I've backed up the datastore just in case. My next attempt will be to pass the exact same maker_opts file from the round 1 run, with the only change being to clean_try/clean_up. Is this approach misguided?

Your help is very much appreciated Michael so thank you, 
Best
L




On Tue, Nov 14, 2017 at 3:08 PM, Michael Campbell <[hidden email]> wrote:
Hi Lahcen,

Nothing comes right to mind for what could be causing this error. If you want to compress your FASTA and send it to me I can try and recreate the error and try and debug it.

Thanks,
Mike

Re: Unwarranted error: Skipping the contig because it is too short

Michael Campbell
In reply to this post by lahcen campbell
Hi Lahcen,

I put some answers below.
On Nov 15, 2017, at 11:32 AM, lahcen campbell <[hidden email]> wrote:

Hi Michael and Carson

Thank you both for your helpful input, I really appreciate it. 

See below for my comments...

Best
Lahcen


On Tue, Nov 14, 2017 at 5:04 PM, Michael Campbell <[hidden email]> wrote:
Hi Lahcen,

Thanks, the name has served me well for a number of years now :)

It's a good name, I wouldn't change it haha :)
 

So I started a run with your 11 scaffolds. I gave it the protein file that you sent and used all of RepBase for masking. All of the scaffolds finished without error. I was hoping it would be something simple that just needed another set of eyes to see, but it looks like that's not the case for this one.

To further rule out a data issue, I would try running it with the dpp test data that is bundled with MAKER to see if you can get the same error. This data set will run in about a minute. If you are on a cluster, I would try running it with and without submitting it to the nodes, and with and without MPI.

One thing that I have done in the past is to make a new directory and run maker there (this doesn't make a lot of sense but when the error doesn't make sense either it seems reasonable). 

First off, I can report good news regarding the zero-length contigs I was getting back. Carson, the BioPerl conflict you suggested seems to have been the main issue. Our cluster software environment had gone through some changes of late, so working on that basis I was able to load the right bash config, which resulted in no more zero-length contig errors. Huzzah!!
Great
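
Carson's suggestion itself is not quoted in this message, so the following is only a guess at how such a conflict might be spotted: a small helper that reports which file Perl actually loads for a given module in the current shell environment. The function name `perl_module_path` is hypothetical; `Bio::SeqIO` is a real BioPerl module, but whether it was the one involved here is an assumption.

```shell
#!/usr/bin/env bash
# perl_module_path MODULE
# Reports where Perl loads MODULE from in the current environment, so that
# a login shell and a cluster job script can be compared.
perl_module_path() {
    local module=$1
    if ! command -v perl >/dev/null 2>&1; then
        echo "perl not found on PATH"
        return 1
    fi
    # Bio::SeqIO -> Bio/SeqIO.pm, which is the key Perl uses in %INC
    local key="${module//:://}.pm"
    perl -e '
        my ($mod, $key) = @ARGV;
        if (eval "require $mod; 1") {
            print "$mod loaded from $INC{$key}\n";
        } else {
            print "$mod could not be loaded: $@";
            exit 1;
        }
    ' "$module" "$key"
}
```

Calling `perl_module_path Bio::SeqIO` once interactively and once from inside a submitted job would show whether the two environments pick up different BioPerl installations, which is one way a changed cluster configuration could produce this kind of unexplained behaviour.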

As far as rerunning MAKER, there are a couple of approaches. If you want it to stop complaining about trying too many times on failed contigs, you can increase the number of tries in the opts file. The line looks like this:

tries=2 #number of times to try a contig if there is a failure for some reason

If you want to run it elsewhere, but you don't want to have to redo all of the repeat masking and blasting you can use the gff3 output from an earlier run. If you used gff3_merge after the first run finished you got a big gff3 file with all of the gene models and evidence. If you break up that file by the source column you can selectively pass the evidence back to MAKER. If you put all of the repeatmasker and repeatrunner entries into one file and pass it in on this line:

Can I ask about this? I can't seem to find any concrete info on best practices for parsing MAKER GFFs to partition them by the source column, as you described, Michael.

Is there a commonly used way to partition MAKER GFFs based on the source column, or will I need to code it up? I ask because I feel this must have been needed many times before by other users.
I've got a script that will do it if you want it. Since you don't need all of the entries, grep is probably as easy as anything: grep -P '\tsource\t' (with the source-column value you want in place of source).

rm_gff= #pre-identified repeat elements from an external GFF3 file

I will remove links to fasta files for both 'rmlib=' and 'repeat_protein='
Yep

you can turn off model_org= and repeat_protein=. This will speed up the next run a lot. Then you can pass in the protein2genome gff3 data on this line:

protein_gff=  #aligned protein homology evidence from an external GFF3 file

Don't pass the BLAST GFF3 data in. If you pass GFF3 data to MAKER, it assumes that the data is polished and will not make any effort to fix alignments. The protein2genome data is polished; est2genome is the equivalent for EST input.

You say don't pass the BLAST results as GFF. If I pass in all other info via GFF3 and remove any evidence given as FASTA inputs, BLAST won't be called again, right? That would ensure the shortest possible rerun of MAKER to roll back to an uncorrupted state.
Right. BLAST will not be called as long as you remove or comment out the paths to the FASTA files in the est= and protein= lines.

I noticed that the only unique source field types in my MAKER GFF are as follows: 
augustus_masked 
blastx
maker
protein2genome
repeatmasker
repeatrunner
That looks right for the run you described.
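
A list like the one above can be produced with a short helper along these lines. It is a sketch, with the function name `gff_sources` invented for illustration; it counts how many features each source-column value has in a GFF3 file.

```shell
#!/usr/bin/env bash
# gff_sources GFF_FILE
# Prints "<source> TAB <feature count>" for each distinct value in column 2.
gff_sources() {
    awk -F'\t' '
        /^##FASTA/ { exit }             # ignore any embedded sequence section
        /^#/ || NF < 9 { next }         # skip comments and malformed lines
        { count[$2]++ }
        END { for (s in count) print s "\t" count[s] }
    ' "$1" | LC_ALL=C sort
}
```

Running it on the merged GFF3 before splitting shows exactly which source values are present, which makes it easier to decide what to pass back in and what to leave out.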
I read on the dev group that passing EST evidence as GFF won't actually call Exonerate, and that the est2genome option just tells MAKER to try to turn polished EST alignments directly into genes. So if I pass this info in again as GFF, will it simply use the same info as it did originally and not have to recompute anything?

Based on the above fields contained in my MAKER gff, which of the following options should I select to re-annotate based on this older run ? I suspect all the options below in green should be set to 1, and the others in red set to 0. 

#-----Re-annotation Using MAKER Derived GFF3
.....
est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=1 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=1 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=1 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no 

You don't need model_pass or pred_pass if you plan on running gene finders.
I don't think I will pass back anything under augustus_masked, as I didn't set that up correctly initially; I passed in a precomputed Augustus GFF instead, which I'm told isn't the best way to run MAKER. So if I can get back to a state of not failing all contigs, I will run Augustus inside MAKER itself on the second pass. I am aware of the usual order of things, but in this instance I will continue with what I have done successfully before.
Yeah, when I have issues with failing contigs I'll pull stuff out until it starts running without error, then I add things back until something breaks.

Lastly, as this next run will be updating based on previously generated MAKER GFF data, what should est2genome and protein2genome be set to: 1 or 0?
0. Those options are just for generating gene models directly from evidence when you don't have any gene finders trained. When you say updating, do you mean reusing evidence from previous runs and generating new gene annotations, or are you taking existing gene models and adding new evidence to see if they can be improved?

Apologies for the lengthy email reply Michael. Much appreciated again, thank you !! 
No Worries, hope it helps.

L

 

Re: Unwarranted error: Skipping the contig because it is too short

Carson Holt-2
Just one note I want to add here. When you use GFF3 to pass in results as opposed to letting MAKER use the raw alignments, you lose the ability of MAKER to base some decisions on reading frame match since you lose both the alignment sequence and cigar string of the alignment. So MAKER just assumes correct ORF and sequence match rather than evaluating it (this will make AED scores artificially better for some models).

—Carson



On Nov 15, 2017, at 2:50 PM, Michael Campbell <[hidden email]> wrote:

Hi Lahcen,

I put some answers below.
On Nov 15, 2017, at 11:32 AM, lahcen campbell <[hidden email]> wrote:

Hi Michael and Carson

Thank you both for your helpful input, I really appreciate it. 

See below for my comments...

Best
Lahcen


On Tue, Nov 14, 2017 at 5:04 PM, Michael Campbell <[hidden email]> wrote:
Hi Lancen,

Thanks, the name has served me well for a number of years now :)

Its a good name, I wouldn't change it haha :) 
 

So I started a run with your 11 scaffolds. I gave it the protein file that you sent and used all of repbase for masking. All of the scaffolds finished without error. I was hoping it would be something simple that just needed another set of eyes to see, looks like it's not the case for this one.

To further rule out a data issue I would try running it with the dpp test data that is bundled with MAKER to see if you can get the same error. This data set will run in about a minute. If you are on a cluster I would try running it with and without submitting it you the nodes and with and without mpi.

One thing that I have done in the past is to make a new directory and run maker there (this doesn't make a lot of sense but when the error doesn't make sense either it seems reasonable). 

First off, I can report good news regards the 0 lengths contigs I was getting back. Carson, your thoughts on Bioperl conflict issues seemed to be the main issue. Out cluster software environment had gone through some changes of late, so working off the basis of that I was able to load the right bash config which resulted in no more 0 length contig errors. Huzzah !! 
Great

As far as rerunning MAKER there are a couple of approaches. If you want it to stop complaining about trying to  many times on failed contigs you can increase the number of tries in the opts file. The line looks like this:

tries=2 #number of times to try a contig if there is a failure for some reason

If you want to run it elsewhere, but you don't want to have to redo all of the repeat masking and blasting you can use the gff3 output from an earlier run. If you used gff3_merge after the first run finished you got a big gff3 file with all of the gene models and evidence. If you break up that file by the source column you can selectively pass the evidence back to MAKER. If you put all of the repeatmasker and repeatrunner entries into one file and pass it in on this line:

Can I ask, because I can't seem to find any concrete info on best practices for parsing MAKER gffs to partition the various source column fields as you described Michael. 

Is there a commonly used way to partition MAKER gffs based on source column? Or will I need to code it up, I ask because I feel this must have been needed before many times by other users.  
 I've got a script that will do it if you want it. Since you don't need all of the entries grep is probably as easy as anyting. grep -P '\tsource\t'

rm_gff= #pre-identified repeat elements from an external GFF3 file

I will remove links to fasta files for both 'rmlib=' and 'repeat_protein='
Yep

you can turn off model_org= and repeat_protein=. This will speed up the next run a lot. Then you can pass in the protein2genome gff3 data on this line:

protein_gff=  #aligned protein homology evidence from an external GFF3 file

Don't pass the blast gff3 data in. If you pass in gff3 data to maker is assumes that it is polished and will not make any effort to fix alignments. the protein2genome data is polished. est2genome is the equivalent for EST input.

You say don't pass the blast as gff. As I pass in all other info via GFF3 and remove any evidence as fasta inputs... BLAST won't be called again right ? Ensuring the shortest possible rerun of MAKER to roll back to a uncorrupted state.  
Right. blast will not be called as long as you remove or comment out the paths to the fastas in the est= and protein= lines.

I noticed that the only unique source field types in my MAKER GFF are as follows: 
augustus_masked 
blastx
maker
protein2genome
repeatmasker
repeatrunner
That look right for the run you described
I read on the dev group that passing est evidence as GFF won't actually call Exonerate, est2genome option just tells MAKER to try and turn polished EST alignments directly into genes.... so If I pass this info again as GFF it will simply use the same info as it did originally and not have to recompute anything ? 

Based on the above fields contained in my MAKER gff, which of the following options should I select to re-annotate based on this older run ? I suspect all the options below in green should be set to 1, and the others in red set to 0. 

#-----Re-annotation Using MAKER Derived GFF3
.....
est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=1 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=1 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=1 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no 

You don't need model_pass or pred_pass if you plan on running gene finders
I don't think I will pass back anything under augustus_masked as I didn't set that up correctly initially, instead passing in a precomputed augustus gff which Im told isn't the best way to run MAKER. So if I can get back to a state of not failing all contigs, I will run Augustus inside maker itself on the 2nd pass. Note though, I am aware of the order of things normally, but for this instance I will continue with what I have done with success previously. 
Yeah, when I have issues with failing contigs I'll pull stuff out until it starts running without error, then I add things back until something breaks.

Lastly, as this next run will be updating based on previous generated MAKER gff data.... what states should est2genome and protein2genome be ? 1 or 0 ? 
0 those options are just for generating gene models directly from evidence when you don't have any gene finders trained. When you say updating do you mean reusing evidence from previous runs and generating new gene annotations or are you taking existing gene models and adding new evidence to see if they can be improved?

Apologies for the lengthy email reply Michael. Much appreciated again, thank you !! 
No Worries, hope it helps.

L

 
Clean_up is useful if you are running on a file system that limits the number of files that you can write. It removes all of the intermediate files used in the annotation. This takes away the advantage of rerunning in the same directory. clean_try deletes everything first, and starts again. clean_try is the one that deletes everything and pretends that the first run never happened. 

I ccd the list on this response just Incas anyone else has any ideas or is facing the same error.

Let me know if any of this helps,
Mike

On Nov 14, 2017, at 10:48 AM, lahcen campbell <[hidden email]> wrote:

Hi Michael 

Nice name btw I have a Michael in my name too :) Lahcen Michael Campbell to be exact haha...anyway... thanks for the reply and offer to help. 

I have attached the file in question below. Its so strange, I had to just leave it alone cause it was making me quite frustrated. Those bugs which there are now common sense solutions are the worst cause very easily you reach a wall. 

Might it have anything at all to do with the protein homology file I passed in? Note, though, that the same protein files have been used in another MAKER run without issue, so I had more or less ruled that out already. I'm just spitballing at this stage.


Might I be so cheeky as to ask you one more MAKER-related question, Michael? Feel free to ignore it; I hate to push, but I'm desperate to figure it out and have little time to do so.

I have an issue with a different MAKER analysis that affects any new run I attempt on its datastore. The first round completed successfully, with 25,000-odd genes and double that number of transcripts. I then attempted the second round with a SNAP-trained HMM (the first time passing in a SNAP HMM, following the first-round EST/protein evidence). Because we had obtained so many genes, I thought I would be more stringent in this attempt by changing the AED from 1.0 to 0.7. I now see that I didn't approach this in the right way; too late now, sadly.

MAKER finishes fine, but it now views all previous scaffolds as FAILED. Nothing seems to change this, and the datastore is for all intents and purposes locked in a failed state. It keeps mentioning changes to the opts file (which there were) and that the previous runs didn't finish, so it must delete them. The results from round 1 are still there, though, I'm pretty sure of that; all the BLAST files etc. are still there and populated.
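
One way to see how many scaffolds are actually recorded as failed is to tally the last status logged for each sequence in the datastore index log. The sketch below assumes that log is tab-separated with three columns (sequence ID, datastore path, status); that layout, the function name and the example lines are assumptions for illustration and should be checked against the real file.

```python
import io
from collections import Counter

def last_status_per_contig(handle):
    """Map each sequence ID to the last status recorded for it.

    Assumes tab-separated lines of the form: seq_id, path, status.
    """
    status = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        seq_id, state = fields[0], fields[2]
        status[seq_id] = state  # later lines overwrite earlier ones
    return status

# Made-up example lines; with a real run, pass an open file handle instead
example_log = io.StringIO(
    "tigA\tdatastore/AA/tigA/\tSTARTED\n"
    "tigA\tdatastore/AA/tigA/\tFINISHED\n"
    "tigB\tdatastore/BB/tigB/\tSTARTED\n"
    "tigB\tdatastore/BB/tigB/\tFAILED\n"
)
statuses = last_status_per_contig(example_log)
summary = Counter(statuses.values())
print(dict(summary))
```

Taking only the last entry per sequence matters because the same scaffold can appear several times in the log as it is started, retried and finished.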

Can you tell me the main differences between clean_up and clean_try, and which one will completely and irreversibly wipe the first run? That is something I don't want to repeat; I just want to progress to the next round. I'm hesitant to run them, but I've backed up the datastore just in case. My next attempt will be to pass the exact same maker_opts file from the round 1 run, with the only change being to clean_try/clean_up. Is this approach misguided?

Your help is very much appreciated Michael so thank you, 
Best
L




On Tue, Nov 14, 2017 at 3:08 PM, Michael Campbell <[hidden email]> wrote:
Hi Lahcen,

Nothing comes right to mind for what could be causing this error. If you want to compress your FASTA and send it to me, I can try to recreate the error and debug it.

Thanks,
Mike

On Nov 14, 2017, at 7:15 AM, lahcen campbell <[hidden email]> wrote:

Hi MAKER community,

I was hoping someone could help me. I have a very unusual error with two different versions of MAKER I have tested so far. This error shouldn't be happening, but it occurs time and again no matter what I try. I have tried using 2.31.6_mpich3_icc and 2.31_mpich3.

Note that version 2.31.6_mpich3_icc is one I have used countless times to produce final MAKER annotations without issue, so it's not that this version has had issues to date.

Basically, this is a brand new MAKER analysis, I am only trying to train SNAP in this first round. I am following the MakerTutorial as documented this time around and I can't get past the initial SNAP train stage. 

I have a single genome file with 10 long scaffolds making up just under 11 MB of sequence data (subsampled from my original full-length assembly) on which to train SNAP. The FASTA file is not corrupted, and it has been generated in various ways in order to test for formatting issues etc.

I have only edited the maker_opts file and changed:

genome=
protein=
protein2genome=1

But see attached my maker CTL files. 

The error consistently returned to me:

Skipping the contig because it is too short!!
SeqID: contig_WHATEVER
Length: 0

The sequences are nowhere near too short. This was verified independently outside MAKER to be sure.

The headers are as follows:

>tig00000458 len=2889428 reads=4143 covStat=1793.77 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00000159 len=3515005 reads=5100 covStat=2143.94 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00006117 len=1009519 reads=1168 covStat=804.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00000419 len=2633986 reads=3938 covStat=1519.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00027677 len=108573 reads=86 covStat=86.05 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00021790 len=202251 reads=158 covStat=184.12 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00316948 len=280333 reads=237 covStat=253.23 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00019606 len=149709 reads=82 covStat=150.02 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00023852 len=189461 reads=115 covStat=192.28 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00316994 len=19742 reads=1 covStat=0.00 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
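
An independent length check of the kind mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not what was actually run: the function name `fasta_lengths` and the in-memory example records are hypothetical, and the sequence ID is taken to be the first whitespace-delimited token of each header.

```python
import io

def fasta_lengths(handle):
    """Return a list of (seq_id, length) for each record in a FASTA stream."""
    records = []
    seq_id, length = None, 0
    for line in handle:
        line = line.rstrip("\r\n")  # tolerate Unix and Windows line endings
        if line.startswith(">"):
            if seq_id is not None:
                records.append((seq_id, length))
            # the ID is the first whitespace-delimited token after ">"
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            length = 0
        elif seq_id is not None:
            length += len(line.strip())
    if seq_id is not None:
        records.append((seq_id, length))
    return records

# Tiny made-up example; with a real file, pass an open file handle instead
example = io.StringIO(">tigA len=8 reads=1\nACGT\nACGT\n>tigB len=0\n")
lengths = fasta_lengths(example)
for seq_id, length in lengths:
    flag = "  <-- empty" if length == 0 else ""
    print(f"{seq_id}\t{length}{flag}")
```

Comparing the computed lengths with the `len=` values in the headers shows whether every record really carries the sequence it claims to.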

I have just about given up. I have no idea why it's happening; it makes zero sense.

Any help or information as to why this might be happening would be amazing. 

Thank you in advance. 
Lahcen

--
==========================================
> Dr. Lahcen Campbell                                                  <
> Contact: [hidden email]                        <
==========================================
<maker_bopts.ctl><maker_exe.ctl><maker_opts.ctl>
_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org



