Maker MPI across nodes


Maker MPI across nodes

James Cross (ITCS - Staff)

Hi Maker Developers,

 

We are trying to run Maker with OpenMPI on our HPC cluster across two nodes (each node has 28 cores, so 56 cores in total). While Maker seems to be running correctly, it goes slower when split across two nodes (56 cores) than when run on a single node (28 cores). We are trying to reduce the time Maker takes to complete its run. Do you know of any reason why Maker might slow down when split across two nodes?

 

Our cluster OS is CentOS 6.7 and the HPC scheduler is LSF. We are running OpenMPI over a Mellanox InfiniBand network.

 

The genome we wish to annotate comprises 1,948 scaffolds with an average length of 324,890 bp (longest scaffold: 6,948,830 bp).

 

The command we are using in batch mode is: mpiexec -mca btl ^openib -n 56 maker
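
For context, the LSF submission looks roughly like the sketch below; the job name and output file names are placeholders, and only the core request (56 slots, 28 per node) and the mpiexec line reflect what we actually run:

#!/bin/bash
#BSUB -J maker_mpi               # job name (placeholder)
#BSUB -n 56                      # total MPI slots
#BSUB -R "span[ptile=28]"        # 28 slots per node, i.e. two 28-core nodes
#BSUB -o maker.%J.out            # stdout file (placeholder)
#BSUB -e maker.%J.err            # stderr file (placeholder)

mpiexec -mca btl ^openib -n 56 maker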

 

Any help or advice you could give would be greatly appreciated.

 

Best Wishes

Jimmy

----------------------------------------------------------------------

Mr James Cross

HPC Systems Developer

University of East Anglia

Norwich Research Park

ITCS

Norwich, Norfolk

NR4 7TJ

 

Information Services

 


_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: Maker MPI across nodes

Carson Holt-2
The "-mca btl ^openib” flag has the side affect of bypassing infiniband and using ethernet. But if alternate communicators are too slow, you can switch back to indirect infiniband by using '--mca btl vader,tcp,self --mca btl_tcp_if_include ib0’. That option will force IP over infiniband whichb instead of direct infiniband. OpenFabrics libraries used by infiniband has a know issue because of how it uses registered memory (it generates seg faults whenever a program does system calls - i.e. MAKER calling BLAST). So you can’t use direct infinband with MAKER. So try this instead —>  '--mca btl vader,tcp,self --mca btl_tcp_if_include ib0’

Also, if it stays slow, it likely means you are hitting I/O limits. If that is the case, make sure you are not setting TMP= to a network-mounted disk location, and that whatever temp space exists on your cluster is real, locally mounted, per-node disk rather than a network mount.
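
For example, the TMP= option in maker_opts.ctl can point at node-local scratch; the path below is only an illustration, so substitute whatever per-node local disk your cluster actually provides:

TMP=/local/scratch/maker_tmp  #node-local disk, not an NFS/Lustre mount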

—Carson






_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org