Some errors reported by Maker2


Some errors reported by Maker2

Quanwei Zhang
Hello:

We are doing genome annotation for a new rodent species. We have finished training the ab initio gene predictors successfully with the following parameters: split_hit=40000, max_dna_len=1000000, and 99k mammalian Swiss-Prot protein sequences as evidence.

But when I used the trained models to do the genome annotation, I got the following kinds of errors (shown below). I used the same parameters as for training, except that I added 340k rodent TrEMBL protein sequences as protein evidence (i.e., I used both the 99k mammalian Swiss-Prot sequences and the 340k rodent TrEMBL sequences).

I am doing the annotation on a cluster and started multiple MAKER processes in the same directory (I had tried to use MPI but ran into some problems).

Do you have any suggestions? Many thanks
#some kinds of errors
open3: fork failed: Cannot allocate memory at /gs/gsfs0/hpc01/apps/MAKER/2.31.9/bin/../lib/Widget/blastx.pm line 40.
--> rank=NA, hostname=n520
ERROR: Failed while doing blastx of proteins
ERROR: Chunk failed at level:8, tier_type:3
FAILED CONTIG:Contig2


setting up GFF3 output and fasta chunks
doing repeat masking
Can't kill a non-numeric process ID at /gs/gsfs0/hpc01/apps/MAKER/2.31.9/bin/../lib/File/NFSLock.pm line 1050.
--> rank=NA, hostname=n513
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:Contig12378


Best
Quanwei

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: Some errors reported by Maker2

Carson Holt-2
You ran out of memory. You probably set max_dna_len too high for the machines you are using. There is a note in the maker_opts.ctl file that tells you that this value affects memory usage.

So you can either set it lower, or if running under MPI, use fewer CPUs per node (how you do this is MPI flavor dependent, but some flavors let you do this by setting process count lower combined with the round robin option).
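For illustration only (the numbers are placeholders, and the round-robin flag shown is the OpenMPI spelling; other MPI flavors use different options), the two alternatives might look something like this:

# in maker_opts.ctl: a smaller analysis window means less RAM per process
max_dna_len=300000

# or, under OpenMPI: start fewer ranks than cores and place them round-robin across nodes
mpiexec -n 8 --map-by node maker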

—Carson



Re: Some errors reported by Maker2

Quanwei Zhang
Dear Carson:

Thanks. I wonder whether a smaller "max_dna_len" will split longer scaffolds. I set max_dna_len to 1 Mb because there are quite a few long scaffolds (e.g., the longest one is about 100 Mb). Would you explain whether a smaller "max_dna_len" will decrease the quality of the annotation (e.g., split some genes in the same scaffold)?


Best
Quanwei  


Re: Some errors reported by Maker2

Carson Holt-2
max_dna_len is the window size for keeping data in RAM. Smaller values do not split genes. But values lower than 100kb can create issues (if a single gene model spans 3 or more windows, it creates a weird failure).
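For example, with max_dna_len=300000 a 100 Mb scaffold would be processed in roughly 100,000,000 / 300,000 ≈ 334 windows within a single contig-level job, without splitting gene models at window boundaries.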

—Carson




Re: Some errors reported by Maker2

Quanwei Zhang
Dear Carson:

(1) Thank you for your explanation. I will try setting max_dna_len to 400kb for our rodent species, which is a little higher than the suggested value for large vertebrate genomes (the MAKER manual mentions that "300,000 is a good max_dna_len on large vertebrate genomes if memory is not a limiting factor").

(2) From reading some of your replies in the MAKER Google group, I noticed that setting depth_blast to a fixed number can reduce memory use and save time during annotation. So I changed the following parameters. But I wonder whether this will decrease the quality of the annotation. If it won't affect the quality, can I use an even smaller number (e.g., 20) to save more memory and time?

depth_blastn=30 #Blastn depth cutoff (0 to disable cutoff)
depth_blastx=30 #Blastx depth cutoff (0 to disable cutoff)
depth_tblastx=30 #tBlastx depth cutoff (0 to disable cutoff)
bit_rm_blastx=30 #Blastx bit cutoff for transposable element masking

(3) I also have some concerns about speed, especially for the long scaffolds (around 100Mb). I wonder which part of the genome annotation is the most time consuming (repeat masking, BLAST, or polishing?). In particular, I wonder whether the BLASTX of the protein evidence will take the majority of the time. I have prepared 99k mammalian Swiss-Prot protein sequences and 340k rodent TrEMBL protein sequences as protein evidence, and I am considering whether I could save much time by using only the 99k mammalian Swiss-Prot sequences as evidence.

(4) For some reasons, I cannot run MAKER through MPI on our cluster, so I can only start multiple MAKER processes. I wonder whether it is possible to let multiple MAKER processes annotate the same long scaffold (i.e., start several MAKER processes on a single sequence without splitting the long sequence into shorter ones).

(5) Still about the speed issue. I read some of your comments about the "cpus" parameter in the maker_opts file (http://gmod.827538.n3.nabble.com/open3-fork-failed-Cannot-allocate-memory-td4025117.html), and I understand it indicates the number of CPUs used for a single chunk. So if I set "cpus=2" in the maker_opts file, I can use the following command to submit the job, right?

**************** the bash file used to submit the maker job
#!/bin/bash

#$ -cwd
#$ -S /bin/bash
#$ -j y
#$ -N makerT2
#$ -l h_vmem=8g
#$ -pe smp 2

module load MAKER/2.31.9/perl.5.22.1

maker --q 2> maker_test.error



Many thanks

Best
Quanwei



Re: Some errors reported by Maker2

Carson Holt-2

(2) From reading some of your replies in the MAKER Google group, I noticed that setting depth_blast to a fixed number can reduce memory use and save time during annotation. So I changed the following parameters. But I wonder whether this will decrease the quality of the annotation. If it won't affect the quality, can I use an even smaller number (e.g., 20) to save more memory and time?

depth_blastn=30 #Blastn depth cutoff (0 to disable cutoff)
depth_blastx=30 #Blastx depth cutoff (0 to disable cutoff)
depth_tblastx=30 #tBlastx depth cutoff (0 to disable cutoff)
bit_rm_blastx=30 #Blastx bit cutoff for transposable element masking

These values really only affect the final evidence kept in the GFF3 when you look at it in a browser. They have no effect on the annotation. This is because internally MAKER already collapses evidence down to the 10 best non-redundant features per evidence set per locus. The rest are put in the GFF3 just for reference. By setting the cutoff lower, you are just letting MAKER know it can throw things away even sooner, since you don't want them in the GFF3. It provides a minor improvement in memory use, but max_dna_len is the big one that has the greatest effect.


(3) I also have some concerns about speed, especially for the long scaffolds (around 100Mb). I wonder which part of the genome annotation is the most time consuming (repeat masking, BLAST, or polishing?). In particular, I wonder whether the BLASTX of the protein evidence will take the majority of the time. I have prepared 99k mammalian Swiss-Prot protein sequences and 340k rodent TrEMBL protein sequences as protein evidence, and I am considering whether I could save much time by using only the 99k mammalian Swiss-Prot sequences as evidence.

BLASTN (ESTs) -> fastest as it is searching nucleotide space
BLASTX (proteins) -> must search 6 reading frames so will be at least 6 times slower than BLASTN
TBLASTX (alt-ESTs) -> must search 12 reading frames so will be at least 12 times slower than BLASTN and twice as slow as BLASTX

Also, double the dataset size, double the runtime. Larger window sizes via max_dna_len will also increase runtimes.
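As a rough estimate, assuming runtime scales approximately linearly with database size as described above: going from 99k Swiss-Prot sequences to 99k + 340k = 439k protein sequences makes the BLASTX database about 4.4 times larger, so expect roughly 4.4x the protein BLASTX runtime.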


(4) For some reasons, I cannot run MAKER through MPI on our cluster, so I can only start multiple MAKER processes. I wonder whether it is possible to let multiple MAKER processes annotate the same long scaffold (i.e., start several MAKER processes on a single sequence without splitting the long sequence into shorter ones).

Without MPI you won’t be able to split up large contigs. At the very least you can try running on a single node and setting MPI to use all CPUs on that node. It’s less difficult to set up than cross-node jobs via MPI.
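A minimal single-node sketch, assuming an MPI-enabled MAKER build and a node with 20 cores (adjust -n to whatever the node provides; the error-log name is just a placeholder):

mpiexec -n 20 maker 2> maker_run.error
# all 20 ranks cooperate on the same run, so even a single large scaffold is split across CPUs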


(5) Still about the speed issue. I read some of your comments about the "cpus" parameter in the maker_opts file (http://gmod.827538.n3.nabble.com/open3-fork-failed-Cannot-allocate-memory-td4025117.html), and I understand it indicates the number of CPUs used for a single chunk. So if I set "cpus=2" in the maker_opts file, I can use the following command to submit the job, right?

The cpus parameter only affects how many CPUs are given to the BLAST command line, so only the BLAST step will speed up. I recommend using MPI to get all steps to speed up. Even if you are only running on a single node, you can give all CPUs to the mpiexec command.


—Carson


Re: Some errors reported by Maker2

Quanwei Zhang
Dear Carson:

I got the following error again. Is this still related to memory issues, or could there be other reasons for this error? This time I got the error during training of the SNAP model. Previously, even with max_dna_len=1Mb, I could train the model successfully. In the current training (where I get the following error), I have decreased max_dna_len to 300kb and requested the same amount of memory as before. The only difference is that I am now using both the mammalian repeat library and a species-specific repeat library, whereas previously I only used the mammalian repeat library. Does using both repeat libraries greatly increase the memory requirement (even when I decrease max_dna_len from 1Mb to 300kb)? I have also set the depth_blast values to 30 in the current training.

Thank you! Have a nice weekend! 



#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Contig10
Length: 18773588
#---------------------------------------------------------------------


setting up GFF3 output and fasta chunks
doing repeat masking
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
collecting blastx repeatmasking
processing all repeats
doing repeat masking
Can't kill a non-numeric process ID at /gs/gsfs0/hpc01/apps/MAKER/2.31.9/bin/../lib/File/NFSLock.pm line 1050.
--> rank=NA, hostname=n224
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:Contig10

ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:Contig10

Best
Quanwei


Re: Some errors reported by Maker2

Quanwei Zhang
Dear Carson:

Regarding the error in my email above: I found the contig was annotated correctly on the second RETRY, so please ignore my last email. But now, for a small number of scaffolds, I am having problems processing the repeats (shown below). I used both the Mammalia repeat library and a species-specific repeat library (generated with your pipeline: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic). There were no such problems when I only used the Mammalia repeat library. Do you have any ideas about this? What could be the reason, or do you have any suggestions for how to track it down? Many thanks

Here are some parameters I used

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=Mammalia #select a model organism for RepBase masking in RepeatMasker
rmlib=../consensi.fa.classifiednoProtFinal #provide an organism specific repeat library in fasta format for Repe

max_dna_len=300000
split_hit=40000
depth_blastn=30 #Blastn depth cutoff (0 to disable cutoff)
depth_blastx=30 #Blastx depth cutoff (0 to disable cutoff)
depth_tblastx=30 #tBlastx depth cutoff (0 to disable cutoff)
bit_rm_blastx=30 #Blastx bit cutoff for transposable element masking


Died at /gs/gsfs0/hpc01/apps/MAKER/2.31.9/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
--> rank=NA, hostname=n409
ERROR: Failed while processing all repeats
ERROR: Chunk failed at level:3, tier_type:1
FAILED CONTIG:Contig31



Best
Quanwei


Re: Some errors reported by Maker2

Carson Holt-2
In reply to this post by Quanwei Zhang
It may be a memory issue or an IO issue. Some resource is being taxed and creating a non-responsive bottleneck. If you are running MAKER multiple times in the same directory, you may have to run fewer processes. Also, if you are running without MPI, run with MPI instead, as it will better manage the parallelization and use fewer resources than multiple individual processes.

—Carson



Re: Some errors reported by Maker2

Carson Holt-2
In reply to this post by Quanwei Zhang
I think the cause of the error may have been a little further upstream from what you pasted in the e-mail. One thing that may be happening is that you are taxing resources (like IO) by running MAKER multiple times or on too many CPUs. That can lead to failures because of truncated BLAST reports etc. In that case you can just retry, and that will get around those types of IO-derived errors. MAKER can generate a lot of IO, and if you are working on network-mounted locations (i.e. the storage being used is actually across the network), they can be less robust than local storage (under heavy load NFS can falsely report success on read/write operations that actually failed). It’s the reason we built the retry capabilities into MAKER.

For contigs that continuously fail, you may need to set clean_try=1. That will cause failures to start from scratch (i.e. delete all old reports on failure rather than just those suspected of being truncated).
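In maker_opts.ctl that would look something like the following (the values are examples, not recommendations):

tries=5 #number of times to try a contig if there is a failure for some reason
clean_try=1 #start failed contigs from scratch instead of keeping old reports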

—Carson



Re: Some errors reported by Maker2

Quanwei Zhang
Dear Carson:

I only run 5 MAKER instances in each directory (and set cpus=2). If it is related to a memory issue or an IO issue, I am not sure why scaffolds much longer than the failed ones were all annotated successfully while the relatively shorter ones failed.

I have set "tries=5" (#number of times to try a contig if there is a failure for some reason). I will try "clean_try=1" and test on the failed scaffolds individually with larger memory to see whether they can be annotated.
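As a sketch of how that test could be set up (assuming samtools is available and the scaffold IDs match the FASTA headers; genome.fa is a placeholder for the assembly file):

samtools faidx genome.fa Contig31 > Contig31.fa   # extract one failed scaffold
# then point genome= in a copy of maker_opts.ctl at Contig31.fa and rerun MAKER with more memory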

Thank you!

Best
Quanwei


Re: Some errors reported by Maker2

Carson Holt-2
It could be either. Please use MPI instead of starting multiple instances. It will greatly reduce both IO and RAM usage.

—Carson



On Sep 11, 2017, at 11:12 AM, Quanwei Zhang <[hidden email]> wrote:

Dear Carson:

I only run 5 Maker instances in each directory (and set cpus=2). If it is related to memory issue or an IO issue, I am not sure why the much longer scaffolds (than the failed ones) were all annotated successfully, but the relatively shorter ones failed. 

I have set "tries=5" (#number of times to try a contig if there is a failure for some reason). I will try "clean_try=1" and test on the failed scaffolds individually with larger memory to see whether they can be annotated.

Thank you!

Best
Quanwei

2017-09-11 13:07 GMT-04:00 Carson Holt <[hidden email]>:
I think the cause of the error may have been a little further upstream from what you pasted in the e-mail. One thing that may be happening is that you are taxing resources (like IO) if running MAKER multiple times or on too many CPUs. That can lead to failures because of truncated BLAST reports etc. In which case you can just retry and that will get around those types of IO derived errors. MAKER can generate a lot of IO, and if you are working on network mounted locations (i.e. the storage being used is actually across the network), then they can be lest robust than local storage (when under heavy load NFS can falsely report success on read/write operations that actually failed). It’s the reason we built in the retry capabilities of MAKER.

For contigs that continuously fail, you may need to set clean_try=1. That will cause failures to start from scratch (i.e. delete all old reports on failure rather than just those suspected of being truncated).

—Carson



On Sep 11, 2017, at 10:19 AM, Quanwei Zhang <[hidden email]> wrote:

Dear Carson:

About the error in my above email, I found the contig was correctly annotated at the second time RETRY. So please ignore my last email. But now, for a few number of scaffolds, I met problems to process the repeats (as shown below in red). I used both Mammalia repeat library and species specific repeat library (which is generated by your pipeline "http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic"). There were no such problems when I only used Mammalia repeat library. Do you have any ideas about this? What could be the reason? Or do you have any suggestions for me to find the reason? Many thanks 

Here are some parameters I used

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=Mammalia #select a model organism for RepBase masking in RepeatMasker
rmlib=../consensi.fa.classifiednoProtFinal #provide an organism specific repeat library in fasta format for Repe

max_dna_len=300000
split_hit=40000
depth_blastn=30 #Blastn depth cutoff (0 to disable cutoff)
depth_blastx=30 #Blastx depth cutoff (0 to disable cutoff)
depth_tblastx=30 #tBlastx depth cutoff (0 to disable cutoff)
bit_rm_blastx=30 #Blastx bit cutoff for transposable element masking


Died at /gs/gsfs0/hpc01/apps/MAKER/2.31.9/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
33708 --> rank=NA, hostname=n409
33709 ERROR: Failed while processing all repeats
33710 ERROR: Chunk failed at level:3, tier_type:1
33711 FAILED CONTIG:Contig31



Best
Quanwei

2017-09-08 23:25 GMT-04:00 Quanwei Zhang <[hidden email]>:
Dear Carson:

I got the following error again. Is this still related to memory issues? I wonder whether there can be other reasons lead to this error? This time, I got this error during training of the SNAP model. Before, even I set  max_dna_len=1Mb, I can train the model successfully.  And in the current training (where I get the following error),  I have decreased the max_dna_len to 300kb. I required the same amount memory as before. The only difference is that I am using both mammalian repeat library and species specific repeat library, while previously I only use the mammalian repeat library. Will it greatly increases the requirement of memory to use both repeat libraries (even when I decrease max_dna_len from 1Mb to 300kb)? I have also set the depth_blast as 30 in current training.

Thank you! Have a nice weekend! 



#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Contig10
Length: 18773588
#---------------------------------------------------------------------


setting up GFF3 output and fasta chunks
doing repeat masking
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
collecting blastx repeatmasking
processing all repeats
doing repeat masking
Can't kill a non-numeric process ID at /gs/gsfs0/hpc01/apps/MAKER/2.31.9/bin/../lib/File/NFSLock.pm line 1050.
--> rank=NA, hostname=n224
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:Contig10

ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:Contig10

Best
Quanwei

2017-09-06 12:06 GMT-04:00 Carson Holt <[hidden email]>:

(2) Reading some of your replies in the MAKER Google group, I noticed that setting depth_blast to a fixed number can reduce memory use and save annotation time, so I changed the following parameters. But will it decrease the quality of the annotation? If it won't affect the quality, can I use an even smaller number (e.g., 20) to save more memory and time?

depth_blastn=30 #Blastn depth cutoff (0 to disable cutoff)
depth_blastx=30 #Blastx depth cutoff (0 to disable cutoff)
depth_tblastx=30 #tBlastx depth cutoff (0 to disable cutoff)
bit_rm_blastx=30 #Blastx bit cutoff for transposable element masking

These values really only affect the final evidence kept in the GFF3 when you look at it in a browser. They have no effect on the annotation itself, because internally MAKER already collapses evidence down to the 10 best non-redundant features per evidence set per locus. The rest are put in the GFF3 just for reference. By setting the cutoff lower, you are just letting MAKER know it can throw things away even sooner since you don't want them in the GFF3. It provides a minor improvement in memory use, but max_dna_len is the big one with the greatest effect.


(3) I also have some concerns about speed, especially for the long scaffolds (around 100 Mb). Which part of the annotation is the most time consuming (repeat masking, BLAST, or polishing)? In particular, will the BLASTX of the protein evidence take the majority of the time? I have prepared 99k mammalian Swiss-Prot sequences and 340k rodent TrEMBL sequences as protein evidence, and I am considering whether I would save much time by using only the 99k mammalian Swiss-Prot sequences.

BLASTN (ESTs) -> fastest as it is searching nucleotide space
BLASTX (proteins) -> must search 6 reading frames so will be at least 6 times slower than BLASTN
TBLASTX (alt-ESTs) -> must search 12 reading frames so will be at least 12 times slower than BLASTN and twice as slow as BLASTX

Also double the dataset size, double the runtime. Larger window sizes via max_dna_length will also increase runtimes.


(4) For some reason, I cannot run MAKER through MPI on our cluster, so I can only start multiple MAKER instances. Is it possible to have several MAKER instances annotate the same long scaffold (i.e., start multiple MAKER processes on a single sequence without splitting the long sequence into shorter ones)?

Without MPI you won’t be able to split up large contigs. At the very least you can try and run on a single node and set MPI to use all CPUs on that node. It’s less difficult to set up compared to cross node jobs via MPI.


(5) Still about the speed issue. I read some of your comments about the "cpus" parameter in the maker_opts file (http://gmod.827538.n3.nabble.com/open3-fork-failed-Cannot-allocate-memory-td4025117.html), and I understand it indicates the number of CPUs used for a single chunk. So if I set "cpus=2" in the maker_opts file, I can use the following command to submit the job, right?

The cpus parameter only affects how many CPUs are given to the BLAST command line, so only the BLAST step will speed up. I recommend using MPI so that all steps speed up. Even if you are only running on a single node, you can give all CPUs to the mpiexec command.
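To make the distinction concrete, a minimal sketch (the CPU counts are just examples, not values from this thread):

# maker_opts.ctl: CPUs handed to each individual BLAST command
cpus=2

# single-node MPI run: parallelizes all MAKER steps across the node's CPUs
mpiexec -n 20 maker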


—Carson







Re: Some errors reported by Maker2

Quanwei Zhang
Dear Carson:

I ran into some problems using MPI, but I will give it another try.
Thank you!

Best
Quanwei

2017-09-11 13:14 GMT-04:00 Carson Holt <[hidden email]>:
It could be either. Please use MPI instead of starting multiple instances. It will greatly reduce both IO and RAM usage.

—Carson



On Sep 11, 2017, at 11:12 AM, Quanwei Zhang <[hidden email]> wrote:

Dear Carson:

I only run 5 Maker instances in each directory (and set cpus=2). If it is related to memory issue or an IO issue, I am not sure why the much longer scaffolds (than the failed ones) were all annotated successfully, but the relatively shorter ones failed. 

I have set "tries=5" (#number of times to try a contig if there is a failure for some reason). I will try "clean_try=1" and test on the failed scaffolds individually with larger memory to see whether they can be annotated.

Thank you!

Best
Quanwei


Re: Some errors reported by Maker2

Carson Holt-2
If you are just using a single machine (and not cross machine MPI), use MPICH3 —> https://www.mpich.org

It’s easy to install yourself, and tends to be very robust to failure.

—Carson
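As a rough sketch of a user-space MPICH3 install (the version number and install prefix are assumptions; adjust to your site):

# download, build, and install MPICH3 under your home directory
wget https://www.mpich.org/static/downloads/3.2.1/mpich-3.2.1.tar.gz
tar xzf mpich-3.2.1.tar.gz && cd mpich-3.2.1
./configure --prefix=$HOME/mpich3
make && make install
export PATH=$HOME/mpich3/bin:$PATH   # MAKER then needs to be configured against this MPI when it is installed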




Re: Some errors reported by Maker2

Quanwei Zhang
Dear Carson:

Would you please explain what you mean by "a single machine"? I am running MAKER2 on our high-performance cluster, which has more than 1,620 compute cores across nodes with 128 GB RAM each. Univa Grid Engine is used as the scheduler. Can I use MPICH3?

Thanks

Best
Quanwei


Re: Some errors reported by Maker2

Carson Holt-2
Each node is a single machine. Because you currently run without MPI, each MAKER job you submit runs on a single machine. So you are either running multiple times on the same node, or you submitted 5 separate batch jobs in which case you may have a single maker process on each of 5 nodes.

MPI can parallelize on the same node or across nodes. If you request 10 nodes, then it can communicate across nodes to run the job on all hardware. Or you can run MPI on a single node and ask for all CPUs on that node. In that case it will split up work within a single node and use all resources just on that node. So if you can’t get MPI to work across nodes, you can just submit a job that goes to a single node and ask for all CPUs on that node (multinode jobs may be hard to configure, but single node jobs are very easy). Just set the -n parameter of mpiexec to the CPU count of that node, and it will parallelize within the node.

Example command for a 20 CPU node —>  mpiexec -n 20 maker

—Carson
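For a UGE-style scheduler, a single-node submission script could look roughly like the sketch below (the parallel environment name, slot count, and memory request are site-specific assumptions, not values from this thread):

#!/bin/bash
#$ -N maker_mpi
#$ -cwd
#$ -pe smp 20          # 20 slots on one node; the PE name varies by site
#$ -l h_vmem=6G        # per-slot memory; keep slots x h_vmem within the node's RAM
mpiexec -n 20 maker    # all MAKER steps parallelized across the node's 20 CPUs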






Re: Some errors reported by Maker2

Quanwei Zhang
Dear Carson:

I see. Thank you. I will try it.

Best
Quanwei


Re: Some errors reported by Maker2

Quanwei Zhang
Dear Carson:

I did more tests on one of the contigs (863 kb long) that failed during repeat masking. It only fails when I add the species-specific repeat library; it is annotated successfully when only the mammalian repeat library is used. For the test I picked only this contig and ran MAKER with 64 GB of memory, so I do not think the failure is a memory or IO problem, because even contigs of 98 Mb can be annotated with 32 GB.

I also ran RepeatMasker on this contig with the mammalian and the species-specific repeat library separately. With the mammalian library about 35% of the contig was masked as repeats, while with the species-specific library it is 65% (output shown below). I wonder whether this high repeat content could cause the failure on this contig. Do you have any ideas about this? Thanks
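For reference, the two RepeatMasker runs could be reproduced roughly as follows (a sketch; the -pa value and output directory names are arbitrary, and the library path is the one from maker_opts above):

# mammalian RepBase library
RepeatMasker -species mammalia -pa 4 -dir rm_mammalia test_scaffold31.fasta
# species-specific library from the repeat-library construction pipeline
RepeatMasker -lib ../consensi.fa.classifiednoProtFinal -pa 4 -dir rm_specieslib test_scaffold31.fasta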



file name: test_scaffold31.fasta   
sequences:             1
total length:     863590 bp  (858757 bp excl N/X-runs)
GC level:         37.02 %
bases masked:     562909 bp ( 65.18 %)
==================================================
               number of      length   percentage
               elements*    occupied  of sequence
--------------------------------------------------
SINEs:              113        16134 bp    1.87 %
      ALUs           71        12479 bp    1.45 %
      MIRs            1          133 bp    0.02 %

LINEs:              251       380142 bp   44.02 %
      LINE1         211       210623 bp   24.39 %
      LINE2           1           86 bp    0.01 %
      L3/CR1          0            0 bp    0.00 %

LTR elements:       246       101221 bp   11.72 %
      ERVL            5         1037 bp    0.12 %
      ERVL-MaLRs     18         2744 bp    0.32 %
      ERV_classI    201        90942 bp   10.53 %
      ERV_classII    18         5964 bp    0.69 %

DNA elements:        39        14177 bp    1.64 %
     hAT-Charlie      7         3864 bp    0.45 %
     TcMar-Tigger     7         1706 bp    0.20 %

Unclassified:       196        45831 bp    5.31 %

Total interspersed repeats:   557505 bp   64.56 %


Small RNA:            3          823 bp    0.10 %

Satellites:           2          237 bp    0.03 %
Simple repeats:      94         4472 bp    0.52 %
Low complexity:      18          766 bp    0.09 %
==================================================

* most repeats fragmented by insertions or deletions
  have been counted as one element
                                                     

The query species was assumed to be homo         
RepeatMasker Combined Database: Dfam_Consensus-20170127, RepBase-20170127
       
run with rmblastn version 2.2.27+
The query was compared to classified sequences in ".../consensi.fa.classifiednoProtFinal" 


Best
Quanwei


Re: Some errors reported by Maker2

Carson Holt-2
These are the 3 errors you have shown in your e-mails —>
open3: fork failed: Cannot allocate memory at /gs/gsfs0/hpc01/apps/MAKER/2.31.9/bin/../lib/Widget/blastx.pm line 40.
Can't kill a non-numeric process ID at /gs/gsfs0/hpc01/apps/MAKER/2.31.9/bin/../lib/File/NFSLock.pm line 1050.
Died at /gs/gsfs0/hpc01/apps/MAKER/2.31.9/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.

The first two are memory related; the second occurs because MAKER cannot kill a lock-maintainer thread that it was never able to start in the first place because of the lack of memory.

The third one is IO related. It comes from a truncated file, and according to the e-mail you sent it succeeded on the second try.


IO errors are quite common with NFS (network-mounted file systems); they are one of the most frequent issues submitted to the devel list. MAKER can hit IO limits long before it hits CPU limits. One of the most frequent causes of these issues is that the user sets TMP= in the control files to a location that is not suitable for high IO (note that TMP= defaults to /tmp). The location should always be a true locally mounted disk. Sometimes /tmp is a virtual location (not really a local disk, but a network-mounted disk or an in-memory location). With the former you will get frequent IO failures, and with the latter you will also get out-of-memory issues.

Note that when you supply more data files you will also use more memory (to hold analysis results). According to your e-mail, the last error you got was 'Can't kill a non-numeric process ID'. Correct? So getting the error with two input files but not when you supply a single input file further suggests you are running low on RAM.

Some things to check:
1. Make sure TMP= is not being set to a network-mounted location (a control-file example is shown below).
2. Make sure your temporary directory is not a virtual in-memory directory on the node being used.
3. If nodes are shared, you may run out of memory because of other users or because you failed to request enough RAM during job submission.
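
As a sketch, TMP= can be pointed at a true local scratch directory in the control file; the path below is only an example and must match whatever local disk actually exists on your nodes:

#-----MAKER Behavior Options (maker_opts.ctl excerpt)
TMP=/scratch/local #temporary files go here; use a real local disk, not NFS or an in-memory tmpfs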

Finally, try running interactively so you can see what the memory and directory locations look like on the node you are assigned for the job (check space and mount points; is /tmp, or wherever you set TMP=, in fact a local disk?). Also run with MPI rather than starting multiple MAKER instances; it uses resources better.
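
For example, from an interactive session (e.g. qlogin under Grid Engine) commands like these show whether the temporary location is a real local disk and how much RAM the node has free; substitute your TMP= path for /tmp if you set one:

df -hT /tmp           # filesystem type should be local (e.g. ext4/xfs), not nfs or tmpfs
mount | grep ' /tmp ' # shows how /tmp is mounted
free -g               # total, used, and free memory on the node, in gigabytes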

Thanks,
Carson







Re: Some errors reported by Maker2

Quanwei Zhang
Dear Carson:

Thank you for your explanation. Sorry for not describing my problem clearly. The first two errors went away after I changed the parameters you suggested (e.g., max_dna_len, depth_blast). Now I only get the following error, and only for two contigs out of thousands. One of the two failed contigs is 863 kb long, and I have done more tests on it individually. Running RepeatMasker on this contig, 65% was masked when using the species-specific repeat library, while only 35% was masked when using the mammalian repeat library. Since much longer contigs (even 98 Mb) can all be annotated, I doubt that this much shorter one failed because of IO.

I did not set TMP, and I am running on a high-performance cluster. I am not sure whether the temporary directory there is an in-memory (virtual) location or not; I will check that later. Many thanks

Died at /gs/gsfs0/hpc01/apps/MAKER/2.31.9/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
33708 --> rank=NA, hostname=n409
33709 ERROR: Failed while processing all repeats
33710 ERROR: Chunk failed at level:3, tier_type:1
33711 FAILED CONTIG:Contig31

Best
Quanwei
