Does MAKER support multi-processing for a single long FASTA sequence using OpenMPI?

Xu, taosheng
Dear Maker Development Team,
I would like to know whether MAKER supports parallel processing of a single long genome sequence.

When I submit a MAKER job through OpenMPI with multiple CPUs (for example, mpiexec -n 40 maker) to annotate a single long genome sequence, only one maker process ever runs, together with 4 rmblast processes. The other CPUs sit idle.

I want to use MAKER's parallel processing with OpenMPI to speed up the annotation of a single ultra-long genome sequence.

Best regards,
Taosheng

_______________________________________________
maker-devel mailing list
[hidden email]
http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
Re: Does MAKER support multi-processing for a single long FASTA sequence using OpenMPI?

Carson Holt-2
Yes. It will divide contigs into chunks the same size as the max_dna_len parameter.

—Carson
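
For a sense of scale, the chunking described in the reply above can be sketched as simple arithmetic. This is only an illustration under stated assumptions: the function name estimate_chunk_count is hypothetical, the scaffold length (73580997) and max_dna_len (100000) are the values reported later in this thread, and the sketch assumes plain fixed-size chunks without whatever overlap or boundary handling MAKER actually applies.

```python
def estimate_chunk_count(contig_length: int, max_dna_len: int) -> int:
    """Estimate how many chunks a contig is divided into.

    Assumption: chunks are plain fixed-size windows of max_dna_len bases,
    with a shorter final chunk. MAKER's real chunking may differ in detail.
    """
    if contig_length <= 0 or max_dna_len <= 0:
        raise ValueError("contig_length and max_dna_len must be positive")
    # Integer ceiling division avoids floating-point rounding on large genomes.
    return (contig_length + max_dna_len - 1) // max_dna_len


# Values taken from this thread: scaffold#1 length and the max_dna_len setting.
chunks = estimate_chunk_count(73580997, 100000)
print(f"approximate number of chunks: {chunks}")
```

Under these assumptions the single scaffold yields several hundred chunks, so in principle there is far more work available than the 40 requested processes could exhaust.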


Re: Does MAKER support multi-processing for a single long FASTA sequence using OpenMPI?

Xu, taosheng
Thank you very much, Carson, for your timely response.
Yes, I agree that MAKER with MPI should support multi-processing of a single ultra-long sequence, but I cannot get it to run successfully for a single sequence.
First, I made sure that OpenMPI and MAKER were installed properly; the setup works well in parallel for multiple DNA sequences.
When I submit a MAKER job for a single ultra-long sequence (mpiexec -mca btl ^openib -n 40 maker -g scaffold1.fasta -fix_nucleotides, with max_dna_len set to 100000), only one maker process keeps running. The other maker processes exit and report that they have finished in the output. The output is shown below. Could you please help me check it? Thank you for your help and your time.

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

Best regards,
Taosheng



OUTPUT Information
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
examining contents of the fasta file and run log
A data structure will be created for you at:
/data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_datastore

To access files for individual sequences use the datastore index:
/data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_master_datastore_index.log

STATUS: Now running MAKER...



--Next Contig--

Processing run.log file...
examining contents of the fasta file and run log



--Next Contig--

#---------------------------------------------------------------------
Another instance of maker is processing this contig!!
SeqID: scaffold#1
Length: 73580997
#---------------------------------------------------------------------


examining contents of the fasta file and run log



--Next Contig--

#---------------------------------------------------------------------
Another instance of maker is processing this contig!!
SeqID: scaffold#1
Length: 73580997
#---------------------------------------------------------------------




Maker is now finished!!!



Start_time: 1603000030
End_time:   1603000033
Elapsed:    3
A data structure will be created for you at:
/data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_datastore

To access files for individual sequences use the datastore index:
/data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_master_datastore_index.log

STATUS: Now running MAKER...
MAKER WARNING: The file plant.maker.output/plant_datastore/B5/CD/scaffold#1//theVoid.scaffold#1/28/scaffold#1.289.plant_repeatFinal%2Elib%2Empi%2E10%2E2.specific.out
did not finish on the last run and must be erased
WARNING: Multiple MAKER processes have been started in the
same directory.

STATUS: Processing and indexing input FASTA files...
examining contents of the fasta file and run log



--Next Contig--

#---------------------------------------------------------------------
Another instance of maker is processing this contig!!
SeqID: scaffold#1
Length: 73580997
#---------------------------------------------------------------------


WARNING: Multiple MAKER processes have been started in the
same directory.

STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_datastore

To access files for individual sequences use the datastore index:
/data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_master_datastore_index.log

STATUS: Now running MAKER...
examining contents of the fasta file and run log



--Next Contig--

#---------------------------------------------------------------------
Another instance of maker is processing this contig!!
SeqID: scaffold#1
Length: 73580997
#---------------------------------------------------------------------




Maker is now finished!!!



Start_time: 1603000030
End_time:   1603000034
Elapsed:    4
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: scaffold#1
Length: 73580997
#---------------------------------------------------------------------


examining contents of the fasta file and run log



--Next Contig--

#---------------------------------------------------------------------
Another instance of maker is processing this contig!!
SeqID: scaffold#1
Length: 73580997
#---------------------------------------------------------------------


examining contents of the fasta file and run log



--Next Contig--

#---------------------------------------------------------------------
Another instance of maker is processing this contig!!
SeqID: scaffold#1
Length: 73580997
#---------------------------------------------------------------------




Start_time: 1603000030
End_time:   1603000034
Elapsed:    4


Maker is now finished!!!

A data structure will be created for you at:
/data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_datastore

To access files for individual sequences use the datastore index:
/data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_master_datastore_index.log

STATUS: Now running MAKER...
examining contents of the fasta file and run log
..........

Maker is now finished!!!



Start_time: 1602920407
End_time:   1602920580
Elapsed:    173
running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_VflRYH; ./maker3.01/exe/RepeatMasker/RepeatMasker /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_datastore/B5/CD/scaffold#1//theVoid.scaffold#1/0/scaffold#1.0.plant_repeatFinal%2Elib%2Empi%2E10%2E0.specific -dir /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_datastore/B5/CD/scaffold#1//theVoid.scaffold#1/0 -pa 1 -lib /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/mpi_blastdb/plant_repeatFinal%2Elib.mpi.10/plant_repeatFinal%2Elib.mpi.10.0
#-------------------------------#
running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_VflRYH; ./maker3.01/exe/RepeatMasker/RepeatMasker /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_datastore/B5/CD/scaffold#1//theVoid.scaffold#1/0/scaffold#1.0.plant_repeatFinal%2Elib%2Empi%2E10%2E1.specific -dir /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_datastore/B5/CD/scaffold#1//theVoid.scaffold#1/0 -pa 1 -lib /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/mpi_blastdb/plant_repeatFinal%2Elib.mpi.10/plant_repeatFinal%2Elib.mpi.10.1
#-------------------------------#
running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_VflRYH; ./maker3.01/exe/RepeatMasker/RepeatMasker /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_datastore/B5/CD/scaffold#1//theVoid.scaffold#1/0/scaffold#1.0.plant_repeatFinal%2Elib%2Empi%2E10%2E2.specific -dir /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_datastore/B5/CD/scaffold#1//theVoid.scaffold#1/0 -pa 1 -lib /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/mpi_blastdb/plant_repeatFinal%2Elib.mpi.10/plant_repeatFinal%2Elib.mpi.10.2
#-------------------------------#
running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_VflRYH; ./maker3.01/exe/RepeatMasker/RepeatMasker /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_datastore/B5/CD/scaffold#1//theVoid.scaffold#1/0/scaffold#1.0.plant_repeatFinal%2Elib%2Empi%2E10%2E3.specific -dir /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_datastore/B5/CD/scaffold#1//theVoid.scaffold#1/0 -pa 1 -lib /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/mpi_blastdb/plant_repeatFinal%2Elib.mpi.10/plant_repeatFinal%2Elib.mpi.10.3
#-------------------------------#
running  repeat masker.
#--------- command -------------#

.....
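
The pattern in the log above (many "Another instance of maker is processing this contig!!" blocks alongside "Multiple MAKER processes have been started" warnings) can be tallied with a small script. This is a minimal sketch, not part of MAKER: the function name summarize_maker_log and the heuristic flag are hypothetical, and the only strings it searches for are the ones quoted verbatim in the output above.

```python
def summarize_maker_log(log_text: str) -> dict:
    """Count the status messages that appear in the captured MAKER output.

    The search strings are copied from the log in this thread; the
    'suspect_independent_runs' flag is an assumed heuristic, not a MAKER feature.
    """
    counts = {
        "collisions": log_text.count(
            "Another instance of maker is processing this contig!!"
        ),
        "contigs_started": log_text.count("Now starting the contig!!"),
        "multi_process_warnings": log_text.count(
            "Multiple MAKER processes have been started"
        ),
        "finished": log_text.count("Maker is now finished!!!"),
    }
    # Heuristic: collisions plus the same-directory warning suggest that the
    # processes were launched as separate runs rather than one coordinated job.
    counts["suspect_independent_runs"] = (
        counts["collisions"] > 0 and counts["multi_process_warnings"] > 0
    )
    return counts
```

One way to use it is to redirect the job's standard output and standard error to a file, read that file into a string, and pass the string to summarize_maker_log.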

Re: Does MAKER support multi-processing for a single long FASTA sequence using OpenMPI?

Carson Holt-2
Your MPI processes may not be seeing each other, so you are getting multiple independent MAKER runs that all collide. You need to reinstall MAKER and answer 'yes' to the question about compiling for MPI. You may also have to reinstall OpenMPI if reinstalling MAKER alone does not work. You can test MAKER's MPI support by running the following: mpiexec -mca btl ^openib -n 40 maker -help

If you get a single help message, everything is fine. If you get 40 help messages, MPI is not communicating correctly.

—Carson
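
The help-message test described above can be automated roughly as follows. This is a hedged sketch rather than an official MAKER check: the function names are hypothetical, and because the exact wording of the help text is not shown in this thread, the script takes the output of a plain maker -help run as its reference and counts how many copies of that reference appear in the mpiexec output. It assumes output lines from different processes are not split mid-line.

```python
import subprocess


def count_help_copies(single_help: str, mpi_help: str) -> int:
    """Estimate how many copies of the help message appear in mpi_help.

    The first non-empty line of the single-process help output is the marker.
    """
    marker = next(
        (line.strip() for line in single_help.splitlines() if line.strip()), None
    )
    if marker is None:
        raise ValueError("reference help output is empty")
    # Guard against a marker line that repeats within one help message.
    per_copy = sum(1 for line in single_help.splitlines() if line.strip() == marker)
    total = sum(1 for line in mpi_help.splitlines() if line.strip() == marker)
    return total // per_copy


def capture(cmd: list) -> str:
    """Run a command and return stdout and stderr combined."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout + result.stderr


def check_maker_mpi(n_procs: int = 40) -> int:
    """Run the test from the reply above; requires maker and mpiexec on PATH."""
    single = capture(["maker", "-help"])
    mpi = capture(
        ["mpiexec", "-mca", "btl", "^openib", "-n", str(n_procs), "maker", "-help"]
    )
    return count_help_copies(single, mpi)
```

Calling check_maker_mpi() on the cluster should return 1 if MPI is communicating correctly; a result equal to the number of processes would match the failure mode described in the reply.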

