MAKER RepeatRunner error on long scaffolds only


Daren C. Card
Hi all,

I’ve been having an issue with MAKER (v2.31.8) that I haven’t been able to overcome, and none of the previous questions on the list have addressed or helped fix it. I’ve run MAKER on a vertebrate genome, and it finishes every scaffold except the 8 longest. These are all above 65 Mb (the remaining scaffolds, apart from the one noted below, are under 5 Mb), and most are around 20% Ns (one is 35%). The 9th-longest scaffold, just above 60 Mb with 27% Ns, finished fine, which is strange because it is the only really long scaffold to run to completion. The fact that MAKER works on all but a few scaffolds suggests to me that the problem lies with those scaffolds and not with MAKER or my settings, yet the only difference I can see is sequence length. Is there an upper limit on scaffold size?
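
To compare scaffolds on the two properties mentioned here (length and N content), a small script can tabulate them straight from the assembly FASTA. This is a minimal Python sketch, not a MAKER utility; the script and file names are hypothetical. It holds one record at a time in memory, so a 65 Mb scaffold costs roughly 65 MB.

```python
import sys


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # The record name is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def scaffold_stats(lines):
    """Return [(name, length, fraction_N)] sorted from longest to shortest."""
    stats = []
    for name, seq in read_fasta(lines):
        length = len(seq)
        n_count = seq.upper().count("N")
        stats.append((name, length, n_count / length if length else 0.0))
    stats.sort(key=lambda row: row[1], reverse=True)
    return stats


if __name__ == "__main__":
    # Usage (hypothetical file name): python scaffold_stats.py genome.fasta
    with open(sys.argv[1]) as fh:
        for name, length, frac_n in scaffold_stats(fh):
            print(f"{name}\t{length / 1e6:.1f} Mb\t{100 * frac_n:.1f}% N")
```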

I originally ran the whole genome with MPI, but have since tried rerunning individual scaffolds on a single core and still run into the same problem. The error is below, but I can’t find any additional information in the program-specific logs to help figure this out. MAKER actually runs a little longer after this error before stalling and retrying. It seems to have something to do with RepeatRunner. For repeats, I’m providing a GFF of complex repeats obtained from custom RepeatMasker annotations (via the rm_gff option), and letting MAKER handle simple repeats (model_org=simple) and protein-based masking with RepeatRunner (using the default library).
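
For reference, the repeat-masking setup described above would correspond roughly to the following lines in maker_opts.ctl. This is a sketch assuming MAKER 2.31's control-file key names; both paths are placeholders (the te_proteins.fasta location is only inferred from the /opt/maker install path in the log), not the poster's actual files.

```
#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=simple #RepeatMasker masks simple repeats only
rmlib= #no custom RepeatMasker library; complex repeats come from rm_gff instead
repeat_protein=/opt/maker/data/te_proteins.fasta #TE protein library searched by RepeatRunner (placeholder path)
rm_gff=/path/to/complex_repeats.gff3 #pre-identified repeats from a custom RepeatMasker run (placeholder path)
```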

Any help would be greatly appreciated.
Daren Card

University of Texas at Arlington

###################################################
doing blastx repeats
running  blast search.
#--------- command -------------#
Widget::blastx:
/usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner
#-------------------------------#
deleted:0 hits
collecting blastx repeatmasking
processing all repeats
in cluster::shadow_cluster...
Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
--> rank=3, hostname=moonunit0
ERROR: Failed while processing all repeats
ERROR: Chunk failed at level:3, tier_type:1
FAILED CONTIG:scaffold-1

doing blastx repeats
running  blast search.
#--------- command -------------#
Widget::blastx:
/usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner
#-------------------------------#
ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:scaffold-1

deleted:0 hits
deleted:0 hits
###################################################


_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: MAKER RepeatRunner error on long scaffolds only

Carson Holt-2
It dies at that point because one of the alignments has no start/end coordinate. The problem is either in the GFF3 you supplied or in a truncated BLAST report. Recently there have been a number of odd BLAST+ issues involving truncated reports, and updating to 2.6+ seems to solve it for most people. There is also a 2.6 update of RMBlast for RepeatMasker. I submitted a bug report and an example set to the BLAST team a few months ago.
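
The GFF3 side of this diagnosis can be checked mechanically. As a minimal sketch (plain Python, not a MAKER or GenomeTools utility; the function name is made up), the scan below flags feature rows whose start/end columns (4 and 5) are missing, non-numeric, below 1, or reversed. It only inspects the input file, so it cannot detect the truncated-BLAST-report case.

```python
import sys


def find_bad_coordinates(lines):
    """Yield (line_number, reason) for GFF3 feature rows whose start/end
    columns (4 and 5) cannot give a usable coordinate."""
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            yield lineno, f"expected 9 tab-separated columns, found {len(cols)}"
            continue
        try:
            start, end = int(cols[3]), int(cols[4])
        except ValueError:
            yield lineno, f"non-integer start/end: {cols[3]!r}, {cols[4]!r}"
            continue
        if start < 1:
            yield lineno, f"start {start} is below 1"
        if start > end:
            yield lineno, f"start {start} is greater than end {end}"


if __name__ == "__main__":
    # Usage (hypothetical file name): python check_gff.py repeats.gff3
    with open(sys.argv[1]) as fh:
        for lineno, reason in find_bad_coordinates(fh):
            print(f"line {lineno}: {reason}")
```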

—Carson


> On Oct 4, 2017, at 9:53 AM, Daren C. Card <[hidden email]> wrote:

Re: MAKER RepeatRunner error on long scaffolds only

Daren C. Card
Dear Carson,

Thanks so much for the quick reply. I updated BLAST to v2.6 and reran the configure script for RepeatMasker. It looks like MAKER should natively use whichever BLAST is available in $PATH.

Unfortunately, I’m still getting the same error at what appears to be roughly the same spot (~child 226). I’ve copied the stderr below. I checked my GFF file and don’t see any problems with the coordinates. I’m going to try running without the GFF of repeat annotations to see what happens, but in the meantime I wanted to send an update and ask whether there is anything else I should look into.

Thank you,
Daren Card


################################################
doing repeat masking
re reading repeat masker report.
/home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.simple.rb.out
doing blastx repeats
re reading blast report.
/home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.te_proteins%2Efasta.repeatrunner
deleted:2 hits
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
collecting blastx repeatmasking
processing all repeats
in cluster::shadow_cluster...
Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
--> rank=NA, hostname=moonunit0
ERROR: Failed while processing all repeats
ERROR: Chunk failed at level:3, tier_type:1
FAILED CONTIG:scaffold-1

ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:scaffold-1

examining contents of the fasta file and run log
################################################



> On Oct 4, 2017, at 11:03 AM, Carson Holt <[hidden email]> wrote:

Re: MAKER RepeatRunner error on long scaffolds only

Carson Holt-2
MAKER will use whichever BLAST is specified in maker_exe.ctl, so make sure the new installation is the one listed there. RepeatRunner is not part of RepeatMasker; it is a separate step that is essentially a modified BLASTX search against a protein database, so it uses the standard NCBI BLAST+ installation (not RMBlast).
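
To confirm which blastx MAKER will actually call, as opposed to the one on $PATH, the two can be compared directly. This is a minimal Python sketch, assuming maker_exe.ctl uses the `key=value #comment` layout that MAKER writes with `maker -CTL`; the function names are hypothetical, and a path containing `#` would be cut short by the comment stripping.

```python
import os
import shutil
import sys


def read_ctl(lines):
    """Parse MAKER-style control lines of the form 'key=value #comment'."""
    settings = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def blastx_report(lines):
    """Compare the blastx named in the control file with the one found on $PATH."""
    configured = read_ctl(lines).get("blastx", "")
    return {
        "configured": configured,
        "configured_exists": bool(configured) and os.path.isfile(configured),
        "on_path": shutil.which("blastx"),
    }


if __name__ == "__main__":
    ctl_path = sys.argv[1] if len(sys.argv) > 1 else "maker_exe.ctl"
    with open(ctl_path) as fh:
        for key, value in blastx_report(fh).items():
            print(f"{key}: {value}")
```

Running the configured path with BLAST+'s `-version` option then shows which release MAKER is really using.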

The error you get is caused by a truncated BLAST report. At the top of a BLAST report there is a summary of hits, followed by a detailed alignment section for each hit. What is happening is that some hits listed in the top summary are not found in the detail section below. If updating to BLAST+ 2.6 does not fix it for you, you may need to drop back to legacy NCBI BLAST (i.e., the original NCBI BLAST, not the BLAST+ rewrite), available here: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy/2.2.26/

—Carson





On Oct 6, 2017, at 6:23 AM, Daren C. Card <[hidden email]> wrote:


Re: MAKER RepeatRunner error on long scaffolds only

Daren C. Card
Hi Carson,

Thanks for the help. The issue is still lingering. I’ve tried my full ‘ideal’ run with both legacy BLAST 2.2.26 and BLAST+ 2.6 and get the same error, so it doesn’t seem to be a BLAST issue, or if it is, it’s not one that will be easy to overcome.

Using BLAST+ 2.6, I tried a few more runs with RepeatRunner turned off or with the complex-repeat GFF excluded. It seems to run fine without my GFF, which suggests the problem is this file and not BLAST. Disclaimer: I didn’t run the entire scaffold since it is quite large, but the run went well past the point where it was otherwise failing, which leads me to believe it would finish.

I validated the GFF at http://genometools.org/cgi-bin/gff3validator.cgi. It previously had fewer than 10 negative start coordinates for repeat positions in the attributes field, which I simply set to 1 to get a clean GFF. That cleaned file is what I used for the runs described above, so whatever is wrong with this GFF is a mystery to me.
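
Because the negative values were in the attributes column rather than in columns 4 and 5, a coordinate check on those columns alone would not see them. A minimal Python sketch that parses the GFF3 `Target` attribute (`target_id start end [strand]`, per the GFF3 specification) and flags coordinates below 1, malformed values, or a start greater than the end. Treating start > end as suspicious is an assumption of this sketch, not a documented MAKER rule, and the function names are hypothetical.

```python
import sys


def target_coordinates(attributes):
    """Return (start, end) from a GFF3 Target attribute, or None if absent.

    Per the GFF3 specification, Target holds 'target_id start end [strand]',
    space-separated (spaces inside target_id must be escaped as %20).
    Raises ValueError if the Target value is malformed.
    """
    for pair in attributes.strip().split(";"):
        key, _, value = pair.partition("=")
        if key.strip() == "Target":
            fields = value.split(" ")
            if len(fields) < 3:
                raise ValueError(f"malformed Target value: {value!r}")
            return int(fields[1]), int(fields[2])  # ValueError if not integers
    return None


def bad_targets(lines):
    """Yield (line_number, reason) for feature rows with unusable Target coordinates."""
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # column-count problems are a separate check
        try:
            coords = target_coordinates(cols[8])
        except ValueError as err:
            yield lineno, str(err)
            continue
        if coords is None:
            continue
        start, end = coords
        if start < 1 or end < 1:
            yield lineno, f"Target coordinates below 1: {start}..{end}"
        elif start > end:
            yield lineno, f"Target start {start} is greater than end {end}"


if __name__ == "__main__":
    # Usage (hypothetical file name): python check_targets.py repeats.gff3
    with open(sys.argv[1]) as fh:
        for lineno, reason in bad_targets(fh):
            print(f"line {lineno}: {reason}")
```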

What advice do you have for further troubleshooting, to determine which part of the GFF is causing the issue? I don’t see any obvious information in the output files about how the sequence or the GFF is partitioned for annotation, so any help you can provide would be great.
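
One generic way to narrow down the offending record (a bisection approach, not a MAKER feature) is to rerun the failing scaffold with only the repeat features from one region, then keep halving whichever region still triggers the error. A minimal Python sketch of the subsetting step, with hypothetical function and argument names. It assumes columns 4 and 5 are already valid integers, and it does not shift coordinates, so it fits runs on the whole scaffold rather than on a trimmed subsequence.

```python
import sys


def features_in_window(lines, seqid, start, end):
    """Return GFF3 lines for `seqid` that overlap [start, end] (1-based, inclusive),
    keeping '##' directives such as ##gff-version so the output stays valid."""
    kept = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # stop before any embedded sequence
        if line.startswith("##"):
            kept.append(line)
            continue
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[0] != seqid:
            continue
        f_start, f_end = int(cols[3]), int(cols[4])
        if f_start <= end and f_end >= start:  # closed-interval overlap test
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Usage (hypothetical): python gff_window.py repeats.gff3 scaffold-1 1 30000000
    gff_path, seqid = sys.argv[1], sys.argv[2]
    start, end = int(sys.argv[3]), int(sys.argv[4])
    with open(gff_path) as fh:
        print("\n".join(features_in_window(fh, seqid, start, end)))
```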

I’m hoping I can resolve this, as it may be useful to others. It’s odd that I’m getting this error, since I’ve annotated several other genomes in a similar way and never had this issue. Those assemblies were less contiguous, but I can’t imagine that really matters.

Thanks,
Daren


> On Oct 8, 2017, at 7:37 PM, Carson Holt <[hidden email]> wrote:

Re: MAKER RepeatRunner error on long scaffolds only

Carson Holt-2
So you have an input GFF3 file? Could you send it to me along with the problem contig? If you prefer, you can upload the MAKER control files and evidence sets, and I can just recreate the run for that contig.
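
To pull the problem contig out of the assembly for sending, `samtools faidx` does the job if it is installed. Otherwise, a small streaming extractor avoids loading the whole genome. A minimal Python sketch (names hypothetical) that matches the first word of the header exactly, so `scaffold-1` does not also match `scaffold-10`:

```python
import sys


def extract_record(lines, wanted):
    """Yield the lines (header included) of the FASTA record named `wanted`."""
    keep = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            if keep:  # the wanted record has ended
                return
            fields = line[1:].split()
            keep = (fields[0] if fields else "") == wanted
        if keep:
            yield line


if __name__ == "__main__":
    # Usage (hypothetical file names):
    #   python extract_record.py genome.fasta scaffold-1 > scaffold-1.fasta
    with open(sys.argv[1]) as fh:
        for line in extract_record(fh, sys.argv[2]):
            print(line)
```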


—Carson



On Oct 12, 2017, at 8:22 PM, Daren C. Card <[hidden email]> wrote:

Hi Carson,

Thanks for the help. Issue is still lingering. I’ve tried my full ‘ideal’ run using both the BLAST legacy 2.2.26 and also 2.6 and get the same error, so doesn’t seem to be a BLAST issue. Or is one that won’t be easy to overcome.

Using BLAST v. 2.6, I tried some more runs turning off RepeatRunner or excluding the complex repeat GFF I’m trying to supply. Seems to be running fine without my GFF, which indicates to me that the issue is this file and not BLAST. Disclaimer: I didn’t run the entire scaffold since it is quite large, but it went well past the point at which it was otherwise failing which leads me to believe it would finish okay.

I validated the GFF at http://genometools.org/cgi-bin/gff3validator.cgi. I had previously had <10 negative start coordinates for the repeat coordinates in the attributes field of the GFF, which I just set to 1 to give a clean GFF. This was what I used for the runs I described above, so whatever issue there is with this GFF is a mystery to me.

What advice do you have for further troubleshooting to try to determine what part of the GFF is causing the issue? I don’t see any obvious way info about how the sequence or the GFF is partitioned up for the annotation among the output files produced, so any help you can provide would be great.

I’m hoping to resolve this, since it may be useful to others. It’s strange that I’m getting this error, as I’ve annotated several other genomes in a similar manner and never had this issue. They were less contiguous, but I can’t imagine that really matters.

Thanks,
Daren


On Oct 8, 2017, at 7:37 PM, Carson Holt <[hidden email]> wrote:

MAKER will use whichever BLAST is specified in maker_exe.ctl, so make sure the new installation is the one listed there. RepeatRunner is not part of RepeatMasker; it is a separate step that is essentially a modified BLASTX search against a protein database. So the standard NCBI BLAST+ installation is what gets used for it (not RMBlast).
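
To confirm which blastx MAKER will actually call, it can help to compare the path in maker_exe.ctl with the first one on `$PATH`. A small sketch, assuming the control file uses `key=value #comment` lines and stores the BLAST+ blastx path under the key `blastx` (as in the template MAKER generates; check your own file):

```python
import shutil
import subprocess
import sys


def read_ctl(lines):
    """Parse 'key=value #comment' lines from a MAKER .ctl file into a dict."""
    opts = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        opts[key.strip()] = value.strip()
    return opts


def blastx_candidates(ctl_lines):
    """Return the blastx path from maker_exe.ctl and the first one on $PATH."""
    return {
        "maker_exe.ctl": read_ctl(ctl_lines).get("blastx", ""),
        "$PATH": shutil.which("blastx") or "",
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        candidates = blastx_candidates(handle)
    for source, path in candidates.items():
        if not path:
            print(f"{source}: (not set)")
            continue
        try:
            # BLAST+ programs report their version with -version.
            result = subprocess.run([path, "-version"], capture_output=True,
                                    text=True, check=False)
            print(f"{source}: {path}\n  {result.stdout.strip()}")
        except OSError as err:
            print(f"{source}: {path} could not be run ({err})")
```

If the two paths differ, the one in maker_exe.ctl is the one MAKER uses, regardless of what is first on `$PATH`.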

The error you get is because the BLAST report is truncated. At the top of a BLAST report there is a summary of results, and below that are the details for each result. What is happening is that some results in the top summary are not found in the bottom detail section. If updating to BLAST+ 2.6 does not fix it for you, you may need to drop back to legacy NCBI BLAST (i.e. the one that predates the BLAST+ rewrite), available here: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy/2.2.26/

—Carson





On Oct 6, 2017, at 6:23 AM, Daren C. Card <[hidden email]> wrote:

Dear Carson,

Thanks so much for the quick reply. I updated BLAST to v2.6 and reran the configure script for RepeatMasker. It looks like MAKER should natively use whichever BLAST is available in $PATH.

Unfortunately, I’m still getting the same error at what appears to be roughly the same spot (~child 226). I’ve copied the stderr below. I checked my GFF file and don’t see any issues with coordinates. I’m going to try running without a GFF of repeat annotations to see what happens, but in the meantime I wanted to send an update and see if there is anything else I should look into.

Thank you,
Daren Card


################################################
doing repeat masking
re reading repeat masker report.
/home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.simple.rb.out
doing blastx repeats
re reading blast report.
/home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.te_proteins%2Efasta.repeatrunner
deleted:2 hits
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
collecting blastx repeatmasking
processing all repeats
in cluster::shadow_cluster...
Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
--> rank=NA, hostname=moonunit0
ERROR: Failed while processing all repeats
ERROR: Chunk failed at level:3, tier_type:1
FAILED CONTIG:scaffold-1

ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:scaffold-1

examining contents of the fasta file and run log
################################################



On Oct 4, 2017, at 11:03 AM, Carson Holt <[hidden email]> wrote:

It dies at that point because there is no start/end coordinate for one of the alignments. The problem is either in the GFF3 you gave it or a truncated BLAST report. Recently there have been a number of odd BLAST+ issues related to truncated reports, and updating to 2.6+ seems to solve it for most people. There is also a 2.6 update for RMBlast inside RepeatMasker. I submitted a bug report and example set to the BLAST developers a few months ago.

—Carson


On Oct 4, 2017, at 9:53 AM, Daren C. Card <[hidden email]> wrote:

Hi all,

I’ve been having an issue with MAKER (v. 2.31.8) that I haven’t been able to overcome, and no previous questions on the list have addressed or helped fix the problem. I’ve run MAKER on a vertebrate genome, and it runs fine and finishes all but the 8 longest scaffolds. These are all above 65 Mb (the others are below 5 Mb), and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60 Mb and 27% Ns, finished fine too, which is strange because it is the only really long scaffold to run to completion. The fact that MAKER works fine on all but a few scaffolds suggests to me that the problem lies with those scaffolds and not with MAKER or my settings, but the only difference I can see is sequence length. Is there an upper limit on scaffold size?
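
Because the failing scaffolds are described mainly by length and N content, it can help to tabulate both for every sequence before deciding which ones to rerun. A minimal sketch that reads a FASTA file and reports length and N fraction per record; soft-masked lowercase `n` is counted as N, and the record name is assumed to be the first word of the header:

```python
import sys


def scaffold_stats(lines):
    """Return {name: (length, n_fraction)} for each record of a FASTA file."""
    counts = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            name = (line[1:].split() or [""])[0]  # first word of the header
            counts[name] = [0, 0]
        elif name is not None and line:
            counts[name][0] += len(line)
            counts[name][1] += line.upper().count("N")
    return {k: (length, n / length if length else 0.0)
            for k, (length, n) in counts.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        stats = scaffold_stats(handle)
    # Longest first, to line up with a "longest scaffolds" comparison.
    for name, (length, frac_n) in sorted(stats.items(), key=lambda kv: -kv[1][0]):
        print(f"{name}\t{length / 1e6:.1f} Mb\t{100 * frac_n:.1f}% N")
```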

I originally ran the whole genome with MPI, but I have since tried rerunning individual scaffolds on a single core and still get the error. The error I get is below, but I can’t find any additional info in the program-specific logs to help figure this out. MAKER actually runs a little longer after this error before stalling and trying again. It seems to have something to do with RepeatRunner. For repeats, I’m providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using the rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based repeat annotation with RepeatRunner (with the default library).

Any help would be greatly appreciated.
Daren Card

University of Texas Arlington

###################################################
doing blastx repeats
running  blast search.
#--------- command -------------#
Widget::blastx:
/usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner
#-------------------------------#
deleted:0 hits
collecting blastx repeatmasking
processing all repeats
in cluster::shadow_cluster...
Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
--> rank=3, hostname=moonunit0
ERROR: Failed while processing all repeats
ERROR: Chunk failed at level:3, tier_type:1
FAILED CONTIG:scaffold-1

doing blastx repeats
running  blast search.
#--------- command -------------#
Widget::blastx:
/usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner
#-------------------------------#
ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:scaffold-1

deleted:0 hits
deleted:0 hits
###################################################

