Processing contig output - MPI job fail

lahcen campbell

REPOSTED UNDER CORRECTED SUBSCRIBED MEMBER EMAIL ACCOUNT.

Hello folks, 

Can anyone tell me whether MAKER is able to restart from a checkpoint after annotation processing has completed?

I had an MPI MAKER job running successfully for 6 weeks on a de novo fly genome I am working on. It was running with mpich3-3.1-icc on an LSF batch system using 96 CPUs and 140 GB of RAM. MAKER had processed 91% of the overall assembly length of my genome, counted as MAKER "Finished" contigs. The number of "Finished" contigs hadn't changed for ~10 days when the job died; I assume MAKER was collecting annotated gene stats, collecting contig statistics, clustering transcripts into FASTA files, etc., as follows:

............

clustering transcripts into genes for annotations
Processing transcripts into genes
adding statistics to annotations
Calculating annotation quality statistics
choosing best annotation set
Choosing best annotations
processing chunk output
processing contig output

However, this job exited after processing 26,767 of the 42,207 "MAKER Finished" contigs. The job died with exit code 255, which I suspect means someone in our systems team may have killed the job to maintain system stability or something similar.
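For anyone wanting to reproduce counts like these, here is a minimal sketch of tallying contig statuses from a MAKER datastore index. It rests on assumptions that are not shown in this post: that the run wrote a tab-separated index log (typically named `<base>_master_datastore_index.log`) with three columns (contig ID, output directory, status), and that a contig may appear more than once, with the last entry being its current state. The function name `tally_contig_status` and the sample lines are purely illustrative.

```python
from collections import Counter


def tally_contig_status(lines):
    """Count the most recent status recorded for each contig.

    Assumes each line is tab-separated: contig_id, output_dir, status.
    This layout is an assumption about the index log, not taken from the post.
    """
    last_status = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        contig_id, status = fields[0], fields[2]
        # Later entries for the same contig override earlier ones.
        last_status[contig_id] = status
    return Counter(last_status.values())


# Hypothetical sample lines standing in for a real index log.
sample = [
    "contig_1\tdatastore/00/01/contig_1/\tSTARTED\n",
    "contig_1\tdatastore/00/01/contig_1/\tFINISHED\n",
    "contig_2\tdatastore/00/02/contig_2/\tSTARTED\n",
]

counts = tally_contig_status(sample)
for status, n in sorted(counts.items()):
    print(f"{status}\t{n}")
```

To use it on a real run, pass an open file handle for the index log instead of `sample`; contigs whose last status is not a finished state are the ones a resubmitted job would still need to handle.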

The following error output was captured:

Calculating annotation quality statistics
choosing best annotation set
Choosing best annotations
processing chunk output
processing contig output

[[hidden email]] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[[hidden email]] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[[hidden email]] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:1@loom15] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:5@loom14] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:5@loom14] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[[hidden email]] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[[hidden email]] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[[hidden email]] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:5@loom14] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:1@loom15] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:1@loom15] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[[hidden email]] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[[hidden email]] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[[hidden email]] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for completion
[[hidden email]] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion

Unfortunately, since this process died, I have been unable to get the job rescheduled on our system due to resource limitations and job queuing. Can anyone tell me whether MAKER will be able to finish processing contig stats and information to completion following this early exit? I really can't afford another 6 weeks of computation, so I'm worried, as you might expect. Would you recommend I submit this MAKER job again with the same amount of resources to finalize contig information, produce FASTA files, etc., or might I be able to request fewer resources without too much of a penalty in compute time?

Any hints or insight on this would be greatly appreciated.

Thank you in advance,

Lahcen

EBI-Hinxton, UK. 


--
==========================================
> Dr. Lahcen Campbell                                                  <
> Contact: [hidden email]                        <
==========================================
