Maker-Error when started with OpenMPI


Maker-Error when started with OpenMPI

Rainer Rutka
Hi everybody.

My name is Rainer. I am an administrator for the HPC systems at our
university in Konstanz, Baden-Wuerttemberg/Germany.
The project is called bwHPC-C5.

See: https://www.bwhpc-c5.de/en/index.php

I have been trying to get Maker running on our bwUniCluster for weeks. Unfortunately,
I keep getting errors while running a Maker job in the MPI environment.

BUILD STATUS

==============================================================================
STATUS MAKER v2.31.9
==============================================================================
PERL Dependencies: VERIFIED
External Programs: VERIFIED
External C Libraries: VERIFIED
MPI SUPPORT: ENABLED
MWAS Web Interface: DISABLED
MAKER PACKAGE: CONFIGURATION OK

MODULES / INCLUDES / COMPILERS

# knbw03 20170117 r.rutka Initial revision knbw02 of module version 2.31.9
#
##### (B) Dependencies:
#
# conflict: any other maker version
# module load compiler/gnu/5.2
# module load mpi/openmpi/2.0-gnu-5.2
[...]

MPI/MOAB SUBMIT

[...]
### Queues ###
#MSUB -q fat
#MSUB -l nodes=1:ppn=16
#MSUB -l mem=20gb
#MSUB -l walltime=50:00:00
#
[...]
echo " "
echo "### Loading MAKER module:"
echo " "
module load bio/maker/2.31.9
[ "$MAKER_VERSION" ] || { echo "ERROR: Failed to load module
'bio/maker/2.31.9'."; exit 1; }
echo "MAKER_VERSION = $MAKER_VERSION"
module list
[...]
echo " "
echo "### Runing Maker example"
echo " "
export LD_PRELOAD=${MPI_LIB_DIR}/libmpi.so
export OMPI_MCA_mpi_warn_on_fork=0

echo "LD_PRELOAD=${LD_PRELOAD}"
#
# "STATUS: Processing and indexing input FASTA files..."
#
mpiexec -mca btl ^openib -n 16 maker
[...]


E R R O R S
=======
[...]
LD_PRELOAD=/opt/bwhpc/common/mpi/openmpi/2.0.1-gnu-5.2/lib/libmpi.so
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
[uc1n338:113607] *** Process received signal ***
[uc1n338:113607] Signal: Segmentation fault (11)
[uc1n338:113607] Signal code: Address not mapped (1)
[uc1n338:113607] Failing at address: 0x4b0
[uc1n338:113608] *** Process received signal ***
[uc1n338:113608] Signal: Segmentation fault (11)
[uc1n338:113608] Signal code: Address not mapped (1)
[uc1n338:113608] Failing at address: 0x4b0
[uc1n338:113621] *** Process received signal ***
[uc1n338:113621] Signal: Segmentation fault (11)
[uc1n338:113621] Signal code: Address not mapped (1)
[uc1n338:113621] Failing at address: 0x4b0
--------------------------------------------------------------------------
mpiexec noticed that process rank 2 with PID 113608 on node uc1n338 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[...]

WHAT'S WRONG HERE!?

Thank you for your help!

All the best,

Rainer

--
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* High-Performance-Computing (HPC)
* KIM-Support and -Base-Services
Room: V511
78457 Konstanz, Germany
+49 7531 88-5413



Re: Maker-Error when started with OpenMPI

Carson Holt-2
Try adding one of the following to your mpiexec command:

1. --mca btl ^openib
2. --mca btl vader,tcp,self --mca btl_tcp_if_include ib0
3. --mca btl vader,tcp,self --mca btl_tcp_if_include eth0

One or the other may fix your issue (see the sketch below). The first tells OpenMPI not to use the InfiniBand transport (the InfiniBand libraries use registered memory in a way that causes system calls to generate segfaults); it will usually force communication to go over another adapter. The second still uses the InfiniBand adapter, but runs TCP over InfiniBand (a way to indirectly bypass the problem-causing libraries). The third explicitly forces the use of the Ethernet adapter instead of the InfiniBand adapter.
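
For concreteness, here is how the three variants would slot into the mpiexec line from the submit script above. This is a sketch only: the --mca flags are standard OpenMPI options, while the process count and interface names simply mirror the original script and may differ on your cluster.

# Variant 1: disable the openib BTL entirely
mpiexec --mca btl ^openib -n 16 maker

# Variant 2: shared memory + TCP, routed over IP-over-InfiniBand (ib0)
mpiexec --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 -n 16 maker

# Variant 3: shared memory + TCP, routed over the Ethernet adapter (eth0)
mpiexec --mca btl vader,tcp,self --mca btl_tcp_if_include eth0 -n 16 maker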

--Carson



Re: Maker-Error when started with OpenMPI

Rainer Rutka
Hi!

Unfortunately, all of the options failed on our cluster.

See below:

Most recent Maker test with
--mca btl vader,tcp,self --mca btl_tcp_if_include eth0
Error:
--> rank=2, hostname=uc1n518.localdomain
[uc1n518:67009] *** Process received signal ***
[uc1n518:67009] Signal: Segmentation fault (11)
[uc1n518:67009] Signal code: Address not mapped (1)
[uc1n518:67009] Failing at address: 0x4b0
--------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 67009 on node uc1n518 exited on signal 11 (Segmentation fault).


With:
--mca btl ^openib
and also with:
--mca btl vader,tcp,self --mca btl_tcp_if_include ib0
Error:
### Running Maker example

STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
[uc1n514:59985] *** Process received signal ***
[uc1n514:59985] Signal: Segmentation fault (11)
[uc1n514:59985] Signal code: Address not mapped (1)
[uc1n514:59985] Failing at address: 0x4b0
--------------------------------------------------------------------------
mpiexec noticed that process rank 10 with PID 59985 on node uc1n514 exited on signal 11 (Segmentation fault).


--
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
Room: V511, Tel: 54 13



Re: Maker-Error when started with OpenMPI

Carson Holt-2
Try running just on a single node (not across nodes). If it still fails, you might need to install an updated OpenMPI version and then reinstall and run MAKER with that new version. You can install it in your home directory and test from there; just make sure to add it to your PATH.
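
A minimal sketch of such a home-directory install follows. The version number and paths are placeholders, and the MAKER rebuild step refers to the usual perl Build.PL / ./Build install procedure from MAKER's install instructions; adjust everything to your site.

# 1. Build a recent OpenMPI release into your home directory
tar xzf openmpi-<version>.tar.gz && cd openmpi-<version>
./configure --prefix=$HOME/opt/openmpi
make -j4 && make install

# 2. Put the new OpenMPI first in your environment
export PATH=$HOME/opt/openmpi/bin:$PATH
export LD_LIBRARY_PATH=$HOME/opt/openmpi/lib:$LD_LIBRARY_PATH

# 3. Rebuild MAKER against this OpenMPI on the node you will run it on, then point
#    LD_PRELOAD at the matching library in the job script:
export LD_PRELOAD=$HOME/opt/openmpi/lib/libmpi.so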

Alternatively, MPICH3 or Intel MPI (with some extra configuration for Intel MPI) can be used. If you decide to try Intel MPI, let me know and I can provide you with the configuration details.

—Carson




Re: Maker-Error when started with OpenMPI

Rainer Rutka
@Robert Kraus: FYI

On 20.02.2017 at 06:43, Carson Holt wrote:
> Try running just on a single node (not across nodes).
THAT'S WHAT I DID.


> If it still fails, you might need to install an updated OpenMPI version and then reinstall and
> run MAKER with that new version. You can install it in your home directory and test from there;
> just make sure to add it to your PATH.
Sure, it is.

> Alternatively, MPICH3 or Intel MPI (with some extra configuration for Intel MPI) can be used.
> If you decide to try Intel MPI, let me know and I can provide you with the configuration details.

OK, please send the info.

Thank you very much!

--
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
Room: V511, Tel: 54 13



Re: Maker-Error when started with OpenMPI

Carson Holt-2
If OpenMPI fails even on a single node, it means you have a compilation issue, which indicates a problem with your installation. This sometimes happens if you compile on one node and run on another (it could be either MAKER or OpenMPI itself that was compiled on a different node).
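
A few quick sanity checks can help confirm such a mismatch (a sketch, assuming the MAKER and OpenMPI modules from the submit script are loaded on the compute node):

module list                                 # same modules as at build time?
ompi_info --version                         # which OpenMPI is actually in use?
which mpiexec; which maker                  # do both resolve to the expected installs?
ldd "$(which mpiexec)" | grep "not found"   # any missing shared libraries?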


A few options you will need if trying Intel MPI:
-binding pin=disable        # disables processor affinity (required; otherwise MAKER's calls to BLAST and other programs that are parallelized independently of MPI may not work)

Environment variables to set:
export I_MPI_PIN_DOMAIN=node     # otherwise MAKER's calls to BLAST and other programs that are parallelized independently of MPI may not work
export I_MPI_FABRICS='shm:tcp'   # avoids potential complications with the OpenFabrics libraries (they block system calls because of how they use registered memory, i.e. MAKER calling BLAST would fail)
export I_MPI_HYDRA_IFACE=ib0     # set to eth0 if you don't have an IP-over-InfiniBand interface (required because of the I_MPI_FABRICS change above)



Also make sure to compile on the node where you will run. You can try expanding to other nodes after that.
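
Put together, an Intel MPI run of the same job might look roughly like this (a sketch only; the process count mirrors the original submit script, and ib0/eth0 depend on your cluster's interfaces):

export I_MPI_PIN_DOMAIN=node
export I_MPI_FABRICS='shm:tcp'
export I_MPI_HYDRA_IFACE=ib0      # or eth0 if there is no IP-over-InfiniBand interface
mpiexec -binding pin=disable -n 16 maker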

—Carson
