maker MPI problem


maker MPI problem

zl c

Hello,

 

I ran MAKER 3.0 with OpenMPI 2.0.2 and it crashed the computer cluster. I have attached the log file. Could you help me solve the problem?

 

CMD:

export LD_PRELOAD=/usr/local/OpenMPI/2.0.2/gcc-6.3.0/lib/libmpi.so

export OMPI_MCA_mpi_warn_on_fork=0

mpiexec -mca btl ^openib -n $SLURM_NTASKS maker -c 1 -base genome -g genome.fasta

 

Thanks,

Zelin Chen

 

--------------------------------------------
Zelin Chen [[hidden email]]  Ph.D.

NIH/NHGRI
Building 50, Room 5531
50 SOUTH DR, MSC 8004 
BETHESDA, MD 20892-8004

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Attachment: run05.mpi.o47346077 (81K)

Re: maker MPI problem

Carson Holt-2
This is rather vague —> “crashed the computer cluster”

Do you have a specific error?

—Carson




Re: maker MPI problem

Fields, Christopher J

Carson,

 

It was attached to the initial message (named ‘run05.mpi.o47346077’). It looks like a Perl issue with threads, though I don’t see why this would crash a cluster. The fact that there is a log file would suggest the job simply ended.

 

chris

 


Re: maker MPI problem

zl c
When I installed with './Build install', I got the following messages:

Configuring MAKER with MPI support
Installing MAKER...
Configuring MAKER with MPI support
Subroutine dl_load_flags redefined at (eval 125) line 8.
Subroutine Parallel::Application::MPI::C_MPI_ANY_SOURCE redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::C_MPI_ANY_TAG redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::C_MPI_SUCCESS redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::C_MPI_Init redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::C_MPI_Finalize redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::C_MPI_Comm_rank redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::C_MPI_Comm_size redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::C_MPI_Send redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::C_MPI_Recv redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::_comment redefined at (eval 125) line 9.

I'm not sure whether it is installed correctly.

Thanks,

--------------------------------------------
Zelin Chen [[hidden email]]

NIH/NHGRI
Building 50, Room 5531
50 SOUTH DR, MSC 8004 
BETHESDA, MD 20892-8004


Re: maker MPI problem

Carson Holt-2
You may need to delete the .../maker/perl directory before doing the reinstall if not doing a brand new installation. Otherwise you can ignore the subroutine redefined warnings during compile.

Have you been able to test the alternate flags on the command line for MPI? How about an alternate perl without threads?

--Carson

Sent from my iPhone
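
A minimal sketch of that clean reinstall, assuming the MAKER source tree lives at /path/to/maker (a placeholder) and that a Perl built without thread support is used to configure it, might look like this:

#placeholder paths; substitute your own MAKER directory and non-threaded Perl
cd /path/to/maker/src
rm -rf ../perl                               #clear the previously compiled Perl modules
/path/to/nonthreaded-perl/bin/perl Build.PL  #reconfigure with the non-threaded Perl
./Build install                              #reinstall MAKER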


Re: maker MPI problem

zl c
The other option doesn't work. I reinstalled it with a Perl built without threads. Now it's building the BLAST database.

--------------------------------------------
Zelin Chen [[hidden email]]



Re: maker MPI problem

zl c
In reply to this post by Carson Holt-2
Here are the latest messages:

[cn3360:57176] 1 more process has sent help message help-opal-runtime.txt / opal_init:warn-fork

[cn3360:57176] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages


--------------------------------------------
Zelin Chen [[hidden email]]




Re: maker MPI problem

Carson Holt-2
Did it die or did you just get a warning?

Here is a list of flags to add that suppress warnings and other issues with OpenMPI. You can add them all or one at a time depending on issues you get.

#add if MPI not using all CPU given
--oversubscribe --bind-to none

#workaround for infiniband (use instead of '--mca btl ^openib')
--mca btl vader,tcp,self --mca btl_tcp_if_include ib0

#add to stop certain other warnings
--mca orte_base_help_aggregate 0

#stop fork warnings
--mca btl_openib_want_fork_support 1 --mca mpi_warn_on_fork 0

—Carson
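
As a sketch, the flags above combined with the earlier command line would look something like this (pick only the flags your site actually needs):

mpiexec --oversubscribe --bind-to none \
        --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 \
        --mca orte_base_help_aggregate 0 \
        --mca btl_openib_want_fork_support 1 --mca mpi_warn_on_fork 0 \
        -n $SLURM_NTASKS maker -c 1 -base genome -g genome.fasta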




Re: maker MPI problem

zl c
Hi Carson,  Christopher, Daniel,

Thank you for your kind help.

Now it works without any other options on one node with 4 CPUs. I set the number of tasks to 2, but only one contig is running. Should two contigs be running at the same time?

Zelin

--------------------------------------------
Zelin Chen [[hidden email]]


NIH/NHGRI
Building 50, Room 5531
50 SOUTH DR, MSC 8004 
BETHESDA, MD 20892-8004


Re: maker MPI problem

Carson Holt-2
What is your command line? Are you running interactively or as a submitted batch? If it's a batch job, what options did you give it?

--Carson

Sent from my iPhone


Re: maker MPI problem

zl c
I submitted a job:

sbatch --gres=lscratch:100 --time=8:00:00 --mem-per-cpu=8g -N 1-1 --ntasks=2 --ntasks-per-core=1 --job-name run06.mpi -o log/run06.mpi.o%A run06.maker.mpi.sh


mpiexec -n $SLURM_NTASKS maker -c 1 -base genome -g genome.fasta


Another question:
How much temporary space and memory should I use for ~10 Mb of sequence and large databases like nr and UniRef90?

Thanks,
zelin

--------------------------------------------
Zelin Chen [[hidden email]]



Re: maker MPI problem

Carson Holt-2
Some notes:

First, the mpiexec command still needs the --mca parameters (either '--mca btl ^openib' or '--mca btl vader,tcp,self --mca btl_tcp_if_include ib0'). Otherwise, if you have infiniband on the nodes, it will try to use OpenFabrics-compatible libraries, which will kill code that does system calls (like MAKER does).

Second, try using a higher count than 2 in your batch. One process is always sacrificed by MAKER to act only as the message manager for the other processes, so with -n 2 you have one process working and one managing data, and only one contig will run at a time. If you set it to a higher number the issue will go away. The message manager process starts to get saturated at ~200 CPUs, so anything above that processor count becomes less beneficial to the job.

Thanks,
Carson
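
A sketch of a resubmission that reflects both notes; the --ntasks value of 8 and the choice of the '^openib' variant are illustrative only:

#request more tasks so that several contigs can run at once (one task acts as the message manager)
sbatch --gres=lscratch:100 --time=8:00:00 --mem-per-cpu=8g -N 1-1 --ntasks=8 --ntasks-per-core=1 --job-name run06.mpi -o log/run06.mpi.o%A run06.maker.mpi.sh

#inside run06.maker.mpi.sh, keep the btl workaround on the mpiexec line
mpiexec --mca btl ^openib -n $SLURM_NTASKS maker -c 1 -base genome -g genome.fasta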





Re: maker MPI problem

zl c
This is a test run, so I used only 2 tasks. I'll try more tasks and your options.

Thanks,
Zelin




Re: maker MPI problem

zl c
In reply to this post by Carson Holt-2
I used '--mca btl ^openib' and it runs on multiple nodes. It works, and I see that some sequences are done for the test run.

Then I made another run using the large nr database and the local scratch space on the cluster, which fails.
Submit CMD:

sbatch --gres=lscratch:100 --time=168:00:00 --partition=multinode --constraint=x2680 --mem-per-cpu=64g --ntasks=8 --ntasks-per-core=1 --job-name run05.mpi -o log.mpi.00/run05.mpi.o%A run05.mpi.sh

Error message:

#--------- command -------------#

Widget::tblastx:

/usr/local/apps/blast/ncbi-blast-2.5.0+/bin/tblastx -db /lscratch/47455932/maker_BLLXNq/rna%2Efasta.mpi.10.21 -query /lscratch/47455932/maker_BLLXNq/50/tig00017383_arrow.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-10 -dbsize 1000 -searchsp 500000000 -num_threads 1 -lcase_masking -seg yes -soft_masking true -show_gis -out /gpfs/gsfs6/users/chenz11/goldfish/11549472/sergey_canu70x/arrow/maker5/goldfish.arrow.renamed.maker.output/goldfish.arrow.renamed_datastore/5A/65/tig00017383_arrow//theVoid.tig00017383_arrow/0/tig00017383_arrow.0.rna%2Efasta.tblastx.temp_dir/rna%2Efasta.mpi.10.21.tblastx

#-------------------------------#

Thread 1 terminated abnormally: can't open /lscratch/47455932/mpiavG_z: No such file or directory at /home/chenz11/program/maker_mpi/bin/maker line 1460 thread 1.

--> rank=37, hostname=cn4120

FATAL: Thread terminated, causing all processes to fail

--> rank=37, hostname=cn4120

-------------------------------------------------------

Primary job  terminated normally, but 1 process returned

a non-zero exit code.. Per user-direction, the job has been aborted.

-------------------------------------------------------

SIGTERM received

SIGTERM received

SIGTERM received

running  blast search.

#--------- command -------------#

Widget::tblastx:

/usr/local/apps/blast/ncbi-blast-2.5.0+/bin/tblastx -db /lscratch/47455932/maker_Zsg_Gg/rna%2Efasta.mpi.10.8 -query /lscratch/47455932/maker_Zsg_Gg/62/tig00001111_arrow.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-10 -dbsize 1000 -searchsp 500000000 -num_threads 1 -lcase_masking -seg yes -soft_masking true -show_gis -out /gpfs/gsfs6/users/chenz11/goldfish/11549472/sergey_canu70x/arrow/maker5/goldfish.arrow.renamed.maker.output/goldfish.arrow.renamed_datastore/B8/A5/tig00001111_arrow//theVoid.tig00001111_arrow/0/tig00001111_arrow.0.rna%2Efasta.tblastx.temp_dir/rna%2Efasta.mpi.10.8.tblastx

#-------------------------------#

Perl exited with active threads:

    1 running and unjoined

    0 finished and unjoined

    0 running and detached

Perl exited with active threads:

    1 running and unjoined

    0 finished and unjoined

    0 running and detached

--------------------------------------------------------------------------

An MPI communication peer process has unexpectedly disconnected.  This

usually indicates a failure in the peer process (e.g., a crash or

otherwise exiting without calling MPI_FINALIZE first).


Although this local MPI process will likely now behave unpredictably

(it may even hang or crash), the root cause of this problem is the

failure of the peer -- that is what you need to investigate.  For

example, there may be a core file that you can examine.  More

generally: such peer hangups are frequently caused by application bugs

or other external events.


  Local host: cn4130

  Local PID:  18831

  Peer host:  cn3683

--------------------------------------------------------------------------

SIGTERM received

SIGTERM received

SIGTERM received

SIGTERM received

SIGTERM received

SIGTERM received

SIGTERM received

SIGTERM received

SIGTERM received

SIGTERM received

SIGTERM received

SIGTERM received

SIGTERM received

SIGTERM received

SIGTERM received

SIGTERM received

SIGTERM received

SIGTERM received

SIGTERM received

formating database...

#--------- command -------------#

Widget::formater:

/usr/local/apps/blast/ncbi-blast-2.5.0+/bin/makeblastdb -dbtype prot -in /lscratch/47455932/maker_rNzO3X/27/blastprep/protein2%2Efasta.mpi.10.25

#-------------------------------#

SIGTERM received

SIGTERM received

SIGTERM received

SIGTERM received

...

SIGTERM received

SIGTERM received

SIGTERM received

Perl exited with active threads:

    1 running and unjoined

    0 finished and unjoined

    0 running and detached

Perl exited with active threads:

    1 running and unjoined

    0 finished and unjoined

    0 running and detached

Perl exited with active threads:

    1 running and unjoined

    0 finished and unjoined

    0 running and detached


...

[cn3683:36010] 59 more processes have sent help message help-mpi-btl-tcp.txt / peer hung up

[cn3683:36010] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

--------------------------------------------------------------------------

mpiexec detected that one or more processes exited with non-zero status, thus causing

the job to be terminated. The first process to do so was:


  Process name: [[352,1],37]

  Exit code:    255

--------------------------------------------------------------------------

 

I rebuilt the mpi_blast and reran it, and got the error again:

#--------- command -------------#

Widget::formater:

/usr/local/apps/blast/ncbi-blast-2.5.0+/bin/makeblastdb -dbtype nucl -in /lscratch/47559740/maker_k6a7Hy/32/blastprep/rna%2Efasta.mpi.10.3

#-------------------------------#

Thread 1 terminated abnormally: can't open /lscratch/47559740/mpiS84Ju: No such file or directory at /home/chenz11/program/maker_mpi/bin/maker line 1460 thread 1.

--> rank=27, hostname=cn4115

FATAL: Thread terminated, causing all processes to fail

--> rank=27, hostname=cn4115

deleted:276 hits

doing tblastx of alt-ESTs

formating database...

#--------- command -------------#

Widget::formater:

/usr/local/apps/blast/ncbi-blast-2.5.0+/bin/makeblastdb -dbtype nucl -in /lscratch/47559740/maker_nCKTgE/2/blastprep/rna%2Efasta.mpi.10.11

#-------------------------------#

running  blast search.

#--------- command -------------#

Widget::tblastx:

/usr/local/apps/blast/ncbi-blast-2.5.0+/bin/tblastx -db /lscratch/47559740/maker_0kWZTA/rna%2Efasta.mpi.10.20 -query /lscratch/47559740/maker_0kWZTA/35/tig00027947_arrow.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-10 -dbsize 1000 -searchsp 500000000 -num_threads 1 -lcase_masking -seg yes -soft_masking true -show_gis -out /gpfs/gsfs6/users/chenz11/goldfish/11549472/sergey_canu70x/arrow/maker5/goldfish.arrow.renamed.maker.output/goldfish.arrow.renamed_datastore/86/7F/tig00027947_arrow//theVoid.tig00027947_arrow/0/tig00027947_arrow.0.rna%2Efasta.tblastx.temp_dir/rna%2Efasta.mpi.10.20.tblastx

#-------------------------------#

-------------------------------------------------------

Primary job  terminated normally, but 1 process returned

a non-zero exit code.. Per user-direction, the job has been aborted.

-------------------------------------------------------

SIGTERM received

SIGTERM received

SIGTERM received

Perl exited with active threads:

    1 running and unjoined

    0 finished and unjoined

    0 running and detached

Perl exited with active threads:

    1 running and unjoined

    0 finished and unjoined

    0 running and detached


Thanks,
Zelin

--------------------------------------------
Zelin Chen [[hidden email]]

NIH/NHGRI
Building 50, Room 5531
50 SOUTH DR, MSC 8004 
BETHESDA, MD 20892-8004

On Tue, Aug 15, 2017 at 5:13 PM, Carson Holt <[hidden email]> wrote:
Some notes:

First, the mpiexec command still needs the --mca parameters (either '--mca btl ^openib' or '--mca btl vader,tcp,self --mca btl_tcp_if_include ib0'). Otherwise, if you have InfiniBand on the nodes, it will try to use OpenFabrics-compatible libraries, which will kill code that makes system calls (like MAKER does).

Second, try using a higher count than 2 in your batch. One process is always sacrificed by MAKER to act only as the message manager among processes, so with -n 2 you have one process working and one managing data, and only one contig will run at a time. If you set it to a higher number the issue will go away. The message manager process starts to get saturated at ~200 CPUs, so anything above that processor count becomes less beneficial to the job.

Thanks,
Carson
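
Putting those two notes together, one possible shape for the submission is sketched below (illustrative only; the resource values are placeholders, and the flags are the ones already suggested in this thread):

#!/bin/bash
#SBATCH --ntasks=32          # well above 2: one rank only manages messages, the rest work on contigs
#SBATCH --ntasks-per-core=1
#SBATCH --mem-per-cpu=8g
#SBATCH --gres=lscratch:100
#SBATCH --time=24:00:00

# skip the OpenFabrics BTL so the system calls MAKER makes are not affected
mpiexec --mca btl ^openib -n $SLURM_NTASKS maker -c 1 -base genome -g genome.fasta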




On Aug 15, 2017, at 3:05 PM, zl c <[hidden email]> wrote:

I submit a job:
sbatch --gres=lscratch:100 --time=8:00:00 --mem-per-cpu=8g -N 1-1 --ntasks=2 --ntasks-per-core=1 --job-name run06.mpi -o log/run06.mpi.o%A run06.maker.mpi.sh

mpiexec -n $SLURM_NTASKS maker -c 1 -base genome -g genome.fasta

Another question:
How much temporary space and memory should I use for ~10 Mb of sequence and large databases like nr and uniref90?

Thanks,
zelin

--------------------------------------------
Zelin Chen [[hidden email]]


On Tue, Aug 15, 2017 at 4:50 PM, Carson Holt <[hidden email]> wrote:
What is your command line? Are you running interactively or as a submitted batch? If it's a batch job what options did you give it?

--Carson

Sent from my iPhone

On Aug 15, 2017, at 2:47 PM, zl c <[hidden email]> wrote:

Hi Carson,  Christopher, Daniel,

Thank you for your kind help.

Now it works without any other options on one node with 4 CPUs. I set the number of tasks to 2, but there is only one contig running. Shouldn't two contigs be running at the same time?

Zelin

--------------------------------------------
Zelin Chen [[hidden email]]


NIH/NHGRI
Building 50, Room 5531
50 SOUTH DR, MSC 8004 
BETHESDA, MD 20892-8004

On Tue, Aug 15, 2017 at 11:47 AM, Carson Holt <[hidden email]> wrote:
Did it die or did you just get a warning?

Here is a list of flags to add that suppress warnings and other issues with OpenMPI. You can add them all or one at a time depending on the issues you get (a combined example follows the list).

#add if MPI not using all CPU given
--oversubscribe --bind-to none

#workaround for infiniband (use instead of --mca btl ^openib)
--mca btl vader,tcp,self --mca btl_tcp_if_include ib0

#add to stop certain other warnings
--mca orte_base_help_aggregate 0

#stop fork warnings
--mca btl_openib_want_fork_support 1 --mca mpi_warn_on_fork 0
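
Combined into one command, that looks roughly like the following (a sketch assuming OpenMPI 2.x and that ib0 is the InfiniBand interface name, as in the flags above):

mpiexec --oversubscribe --bind-to none \
    --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 \
    --mca orte_base_help_aggregate 0 \
    --mca btl_openib_want_fork_support 1 --mca mpi_warn_on_fork 0 \
    -n $SLURM_NTASKS maker -c 1 -base genome -g genome.fasta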

—Carson



On Aug 15, 2017, at 9:34 AM, zl c <[hidden email]> wrote:

Here are the latest messages:

[cn3360:57176] 1 more process has sent help message help-opal-runtime.txt / opal_init:warn-fork
[cn3360:57176] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

--------------------------------------------
Zelin Chen [[hidden email]]



On Tue, Aug 15, 2017 at 10:39 AM, Carson Holt <[hidden email]> wrote:
You may need to delete the .../maker/perl directory before doing the reinstall if you are not doing a brand-new installation. Otherwise, you can ignore the subroutine redefined warnings during compile.

Have you been able to test the alternate flags on the command line for MPI? How about an alternate perl without threads?

--Carson

Sent from my iPhone

On Aug 15, 2017, at 8:27 AM, zl c <[hidden email]> wrote:

When I installed with './Build install', I got the following messages:
Configuring MAKER with MPI support
Installing MAKER...
Configuring MAKER with MPI support
Subroutine dl_load_flags redefined at (eval 125) line 8.
Subroutine Parallel::Application::MPI::C_MPI_ANY_SOURCE redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::C_MPI_ANY_TAG redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::C_MPI_SUCCESS redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::C_MPI_Init redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::C_MPI_Finalize redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::C_MPI_Comm_rank redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::C_MPI_Comm_size redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::C_MPI_Send redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::C_MPI_Recv redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::_comment redefined at (eval 125) line 9.

I'm not sure whether it's correctly installed.

Thanks,

--------------------------------------------
Zelin Chen [[hidden email]]

NIH/NHGRI
Building 50, Room 5531
50 SOUTH DR, MSC 8004 
BETHESDA, MD 20892-8004

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
Reply | Threaded
Open this post in threaded view
|

Re: maker MPI problem

Carson Holt-2
This is the causal error —> can't open /lscratch/47455932/mpiavG_z

It kills one process and causes everything else to die in an ugly way.

There are several possible causes:

1.  /lscratch/47455932/ is not actually locally mounted. It may be a virtual directory created at run time that exists on the network but not as a true locally mounted disk. If this is the case, there can be a slight IO delay under heavy IO load (common on NFS) that can cause directories and files to appear not to exist. This is one of the reasons TMP= must be set to a true locally mounted disk. The IO load MAKER can produce can swamp network-mounted disks, creating strange errors.

2.  /lscratch/47455932/ may only exist on the head node and not on the other nodes for the job. True local temporary storage is not available across nodes; it is only available on the node it is attached to. So if you are creating the location as part of your job, it may only exist on the head node and not on the other nodes. Usually this value is set to /tmp because each machine should have its own independent /tmp location.

3. /lscratch/47455932/ exists on all nodes, but is full on one of them.
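
A quick way to test causes 2 and 3 before rerunning is to launch one task on each allocated node and confirm the path exists and has free space (a hypothetical check, assuming the /lscratch/<jobid> convention seen in the log paths and standard Slurm environment variables):

srun --ntasks=$SLURM_NNODES --ntasks-per-node=1 bash -c 'hostname; df -h /lscratch/$SLURM_JOB_ID'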

—Carson

On Aug 17, 2017, at 7:39 AM, zl c <[hidden email]> wrote:

I used '--mca btl ^openib' and it runs on multiple nodes. It works, and I see that some sequences are done for the test run.

Then I made another run using the large nr database and local scratch space on the computer cluster, which failed.
Submit CMD:
sbatch --gres=lscratch:100 --time=168:00:00 --partition=multinode --constraint=x2680 --mem-per-cpu=64g --ntasks=8 --ntasks-per-core=1 --job-name run05.mpi -o log.mpi.00/run05.mpi.o%A run05.mpi.sh
Error message:
#--------- command -------------#
Widget::tblastx:
/usr/local/apps/blast/ncbi-blast-2.5.0+/bin/tblastx -db /lscratch/47455932/maker_BLLXNq/rna%2Efasta.mpi.10.21 -query /lscratch/47455932/maker_BLLXNq/50/tig00017383_arrow.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-10 -dbsize 1000 -searchsp 500000000 -num_threads 1 -lcase_masking -seg yes -soft_masking true -show_gis -out /gpfs/gsfs6/users/chenz11/goldfish/11549472/sergey_canu70x/arrow/maker5/goldfish.arrow.renamed.maker.output/goldfish.arrow.renamed_datastore/5A/65/tig00017383_arrow//theVoid.tig00017383_arrow/0/tig00017383_arrow.0.rna%2Efasta.tblastx.temp_dir/rna%2Efasta.mpi.10.21.tblastx
#-------------------------------#
Thread 1 terminated abnormally: can't open /lscratch/47455932/mpiavG_z: No such file or directory at /home/chenz11/program/maker_mpi/bin/maker line 1460 thread 1.
--> rank=37, hostname=cn4120
FATAL: Thread terminated, causing all processes to fail
--> rank=37, hostname=cn4120
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
SIGTERM received
SIGTERM received
SIGTERM received
running  blast search.
#--------- command -------------#
Widget::tblastx:
/usr/local/apps/blast/ncbi-blast-2.5.0+/bin/tblastx -db /lscratch/47455932/maker_Zsg_Gg/rna%2Efasta.mpi.10.8 -query /lscratch/47455932/maker_Zsg_Gg/62/tig00001111_arrow.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-10 -dbsize 1000 -searchsp 500000000 -num_threads 1 -lcase_masking -seg yes -soft_masking true -show_gis -out /gpfs/gsfs6/users/chenz11/goldfish/11549472/sergey_canu70x/arrow/maker5/goldfish.arrow.renamed.maker.output/goldfish.arrow.renamed_datastore/B8/A5/tig00001111_arrow//theVoid.tig00001111_arrow/0/tig00001111_arrow.0.rna%2Efasta.tblastx.temp_dir/rna%2Efasta.mpi.10.8.tblastx
#-------------------------------#
Perl exited with active threads:
    1 running and unjoined
    0 finished and unjoined
    0 running and detached
Perl exited with active threads:
    1 running and unjoined
    0 finished and unjoined
    0 running and detached
_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org