Segfault with OpenMPI

Anthony Bretaudeau-2

Hi,

I've worked on the Bioconda recipe for Maker (https://github.com/bioconda/bioconda-recipes/tree/master/recipes/maker/). It works well, except when using it in MPI mode. I get this segfault error:

STATUS: Processing and indexing input FASTA files...
[cl1n022:06306] *** Process received signal ***
[cl1n022:06306] Signal: Segmentation fault (11)
[cl1n022:06306] Signal code: Address not mapped (1)
[cl1n022:06306] Failing at address: 0x514
[cl1n022:06306] [ 0] /lib64/libpthread.so.0(+0xf6d0)[0x2b9ce51026d0]
[cl1n022:06306] [ 1] /local/miniconda3/envs/maker-2.31.10/bin/perl(Perl_csighandler+0x1e)[0x4aad4e]
[cl1n022:06306] [ 2] /lib64/libpthread.so.0(+0xf6d0)[0x2b9ce51026d0]
[cl1n022:06306] [ 3] /lib64/libc.so.6(__poll+0x2d)[0x2b9ce5f5cf0d]
[cl1n022:06306] [ 4] /local/miniconda3/envs/maker-2.31.10/perl/lib/auto/Parallel/Application/MPI/../../../../../../lib/./libopen-pal.so.40(+0x869e5)[0x2b9cf05859e5]
[cl1n022:06306] [ 5] /local/miniconda3/envs/maker-2.31.10/perl/lib/auto/Parallel/Application/MPI/../../../../../../lib/./libopen-pal.so.40(opal_libevent2022_event_base_loop+0x242)[0x2b9cf057a73a]
[cl1n022:06306] [ 6] /local/miniconda3/envs/maker-2.31.10/perl/lib/auto/Parallel/Application/MPI/../../../../../../lib/./libopen-pal.so.40(+0x384de)[0x2b9cf05374de]
[cl1n022:06306] [ 7] /lib64/libpthread.so.0(+0x7e25)[0x2b9ce50fae25]
[cl1n022:06306] [ 8] /lib64/libc.so.6(clone+0x6d)[0x2b9ce5f67bad]
[cl1n022:06306] *** End of error message ***
SIGTERM received
SIGTERM received


As mentioned in older posts, I've tried setting the LD_PRELOAD variable and running mpirun with the "-mca btl ^openib" option, but neither helped.

As this happens with the Bioconda package, I guess it should be pretty reproducible on other setups.

Bioconda's Maker package uses version 5.26.2 of Perl and version 3.1.2 of OpenMPI, and the OpenMPI recipe is on https://github.com/conda-forge/openmpi-feedstock/tree/master/recipe

Any help would be highly appreciated!

Anthony Bretaudeau


_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: Segfault with OpenMPI

Carson Holt-2
I tried setting this up, but I ran into a number of issues.

First, RepeatMasker is not being installed correctly. The ./configure script that runs during RepeatMasker setup should create these files:
RepeatMasker.lib
RepeatMasker.lib.nhr
RepeatMasker.lib.nin
RepeatMasker.lib.nsq
RepeatMaskerLib.embl

But they do not exist in the share directory.
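
A quick way to check an install for the files listed above is sketched here. It is only an illustration: the function name missing_rm_files and the throwaway demo directory are made up for this example, and the directory to check has to be supplied by the caller, since the thread does not give its full path.

```shell
# Print the name of every expected RepeatMasker library file that is
# missing from the given directory (prints nothing if all are present).
# Usage: missing_rm_files DIR
missing_rm_files() {
    for f in RepeatMasker.lib RepeatMasker.lib.nhr RepeatMasker.lib.nin \
             RepeatMasker.lib.nsq RepeatMaskerLib.embl; do
        [ -e "$1/$f" ] || echo "$f"
    done
}

# Demo against a throwaway directory that holds only one of the files.
rm_demo_dir=$(mktemp -d)
touch "$rm_demo_dir/RepeatMaskerLib.embl"
missing_rm_files "$rm_demo_dir"
```

Pointing the function at the real library directory of a conda environment should show which of the five files the recipe failed to generate.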

Also, MAKER needs access to the te_proteins file in …/maker/data, and because you have rearranged MAKER’s structure, it can’t find it.


As for the segmentation fault, I have seen this a handful of times in the past, when users combine their own build of Perl (rather than the system Perl) with their own install of OpenMPI. The cause is some combination of build flags in either OpenMPI or Perl (I’m not sure which). One way around it is to disable the interpreter threads option when compiling and installing Perl yourself. Most system Perl installs have interpreter threads enabled, so I’m not sure why some self-installs trigger this segfault while the system Perl never does. Interestingly, interpreter threads are turned off by default when you build Perl manually, as they are “officially discouraged”. You actually have to enable them during the build, and conda enables them in its build to match most system Perls.

Another workaround is to not use OpenMPI. Try MPICH3 instead.
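
To see which case a given Perl falls into, its build configuration can be queried with "perl -V:useithreads", which prints useithreads='define'; when interpreter threads are enabled and useithreads='undef'; when they are not. A minimal sketch follows; the helper name perl_has_ithreads is hypothetical.

```shell
# Succeed if the given perl binary was built with interpreter threads.
# Usage: perl_has_ithreads [PERL]   (defaults to the perl found on PATH)
perl_has_ithreads() {
    "${1:-perl}" -V:useithreads 2>/dev/null | grep -q "='define'"
}

if ! command -v perl >/dev/null 2>&1; then
    echo "perl not found on PATH"
elif perl_has_ithreads; then
    echo "interpreter threads: enabled"
else
    echo "interpreter threads: disabled"
fi
```

Running it once with the system Perl and once with the Perl inside the conda environment (passing its path as the argument) shows whether the two builds differ on this option.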


—Carson





In reply to Anthony Bretaudeau's message of Sep 25, 2018, 6:10 AM.


Re: Segfault with OpenMPI

Anthony Bretaudeau-2

Hi,

I think I finally found a solution for this segfault. In short: run "export THREADS_DAEMON_MODEL=1" before running maker.

After looking at the debug log, I noticed that the segfault happened the first time the perl system() function was called (usually to launch a "mv" command).

Together with the backtrace, this suggests that the problem is related to signal handling when child processes are launched from threads.

After a lot of trial and error modifying the code, I found this page describing the env var: https://metacpan.org/pod/forks#Co-existance-with-fork-aware-modules-and-environments

Setting it seems to be enough to avoid the segfault. I have no idea whether it has any downside, but maker seems to give the same results as in non-MPI mode.
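
As a sketch, the workaround could be wrapped in a small launch script like the one below. Several details are assumptions that do not come from the thread: the rank count of 4, running maker with no arguments (so it reads control files from the current directory), the RUN_MAKER switch, and the use of OpenMPI's "-x" option to forward the variable to every rank.

```shell
# Workaround reported in this thread: set the variable before MAKER starts.
export THREADS_DAEMON_MODEL=1

# Illustrative launch line (assumption): 4 ranks, control files in the
# current directory. -x asks OpenMPI's mpirun to export the named
# variable to all ranks.
maker_cmd="mpirun -x THREADS_DAEMON_MODEL -n 4 maker"

# Dry run by default; set RUN_MAKER=1 to actually launch.
if [ "${RUN_MAKER:-0}" = "1" ]; then
    $maker_cmd   # left unquoted on purpose so the string splits into words
else
    echo "THREADS_DAEMON_MODEL=$THREADS_DAEMON_MODEL; would run: $maker_cmd"
fi
```

Comparing the output of a run launched this way with a non-MPI run is a reasonable sanity check that the variable has no visible side effect.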


Concerning RepeatMasker not being installed correctly, this seems to be intentional, judging by the RepeatMasker conda recipe: https://github.com/bioconda/bioconda-recipes/blob/master/recipes/repeatmasker/build.sh#L16

I use the REPEATMASKER_LIB_DIR env var, so it's not really a problem for me, and the Galaxy tool does the same (https://github.com/galaxyproject/tools-iuc/blob/master/tools/maker/maker.xml#L11).
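
For illustration, setting the variable could look like the sketch below. The helper name use_rm_lib_dir and the temporary directory are hypothetical stand-ins; the thread does not say what the directory has to contain.

```shell
# Export REPEATMASKER_LIB_DIR only if the given directory exists.
# Usage: use_rm_lib_dir DIR
use_rm_lib_dir() {
    if [ -d "$1" ]; then
        export REPEATMASKER_LIB_DIR="$1"
        echo "REPEATMASKER_LIB_DIR=$REPEATMASKER_LIB_DIR"
    else
        echo "not a directory: $1" >&2
        return 1
    fi
}

# Demo with a throwaway directory standing in for a real library location.
demo_libs=$(mktemp -d)
use_rm_lib_dir "$demo_libs"
```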

I'm not a RepeatMasker expert, so I don't know if providing the old database would make more sense...

I guess it's the same question for te_proteins.


Cheers

Anthony



In reply to Carson Holt's message of 5 Oct 2018, 22:37.


Re: Segfault with OpenMPI

Carson Holt-2
RepeatMasker does some data prep during installation (creating new files in the process), and that does not happen with the Bioconda RepeatMasker recipe. So it’s broken.

To fix it, look at the Homebrew recipe for RepeatMasker. It does a good job, and it also preconfigures RepeatMasker for the free Dfam database rather than RepBase light:

https://github.com/brewsci/homebrew-bio/blob/master/Formula/repeatmasker.rb

te_proteins is not a RepeatMasker file. It’s a RepeatRunner file that has been integrated into MAKER. MAKER just needs to be able to find it. By default, it looks in the …/maker/data/ directory and puts that location in te_protein=.
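
One way to confirm that a packaged install points te_protein= at a file that actually exists is sketched below. The helper name check_te_protein, the demo file names, and the assumed "key=value #comment" layout of the control file are illustrative assumptions, not details taken from this thread.

```shell
# Read the path on the te_protein= line of a control file and report
# whether that file exists. Returns non-zero if it is unset or missing.
# Usage: check_te_protein CTL_FILE
check_te_protein() {
    te_path=$(sed -n 's/^te_protein=\([^[:space:]#]*\).*/\1/p' "$1" | head -n 1)
    if [ -n "$te_path" ] && [ -e "$te_path" ]; then
        echo "found: $te_path"
    else
        echo "missing: ${te_path:-(unset)}"
        return 1
    fi
}

# Demo: a throwaway control file pointing at a stand-in protein file.
te_demo=$(mktemp -d)
touch "$te_demo/te_proteins"
printf 'te_protein=%s #comment\n' "$te_demo/te_proteins" > "$te_demo/opts.ctl"
check_te_protein "$te_demo/opts.ctl"
```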

—Carson




In reply to Anthony Bretaudeau's message of Oct 18, 2018, 7:52 AM.