pipeline freezes in running state whereas no job is running

Gérald Salin
Hi all,
We have observed strange behaviour on our Ergatis instance: a component shown as running in the pipeline view (http://genomique.genotoul.fr/tmp/ergatis_general.jpg) is in fact not running at all. In the detail view (http://genomique.genotoul.fr/tmp/ergatis_detail.jpg), all the jobs have finished except the last one, which is marked incomplete, yet nothing is running. I checked our cluster: no SGE job corresponding to this pipeline is running. One step has finished, but the next one never starts (this component normally completes in less than an hour).
The pipeline.xml.log file keeps growing (with no ERROR or FATAL entries in it). The pipeline.xml.run.out file contains warnings about the event.log.monitoring file (see below).

We see this behaviour with different pipelines, working on different data, at different steps.

Could the event.log.monitoring file be the cause of our problems?
What could be causing this "freeze"?
Workflow IDs are MySQL-based and pipeline IDs are file-based.

Thank you for your help.

Gérald

pipeline.xml.run.out
WARN 15:14:04:512 [Thread: (0) Monitor Command 6197392] SGERunner monitorForCompletion:1205 Failed creating file event.log.monitoring.
WARN 15:14:14:953 [Thread: (0) Monitor Command 6197392] SGERunner monitorForCompletion:1205 Failed creating file event.log.monitoring.
WARN 15:14:18:341 [Thread: (0) Monitor Command 6197392] SGERunner monitorForCompletion:1332 Could not delete the file event.log.monitoring.  This may halt the event log file monitoring. Delete it manually.
WARN 15:14:31:477 [Thread: (0) Monitor Command 6197392] SGERunner monitorForCompletion:1205 Failed creating file event.log.monitoring.
WARN 15:14:34:745 [Thread: (0) Monitor Command 6197392] SGERunner monitorForCompletion:1205 Failed creating file event.log.monitoring.
WARN 15:15:04:534 [Thread: (0) Monitor Command 6197392] SGERunner monitorForCompletion:1205 Failed creating file event.log.monitoring.
WARN 15:15:05:699 [Thread: (0) Monitor Command 6197392] SGERunner monitorForCompletion:1332 Could not delete the file event.log.monitoring.  This may halt the event log file monitoring. Delete it manually.
WARN 15:15:12:181 [Thread: (0) Monitor Command 6197392] SGERunner monitorForCompletion:1205 Failed creating file event.log.monitoring.
WARN 15:15:23:797 [Thread: (0) Monitor Command 6197392] SGERunner monitorForCompletion:1332 Could not delete the file event.log.monitoring.  This may halt the event log file monitoring. Delete it manually.
WARN 15:15:33:683 [Thread: (0) Monitor Command 6197392] SGERunner monitorForCompletion:1205 Failed creating file event.log.monitoring.
WARN 15:15:53:349 [Thread: (0) Monitor Command 6197392] SGERunner monitorForCompletion:1205 Failed creating file event.log.monitoring.
WARN 15:16:02:300 [Thread: (0) Monitor Command 6197392] SGERunner monitorForCompletion:1205 Failed creating file event.log.monitoring.
WARN 15:16:38:273 [Thread: (0) Monitor Command 6197392] SGERunner monitorForCompletion:1332 Could not delete the file event.log.monitoring.  This may halt the event log file monitoring. Delete it manually.
WARN 15:16:49:322 [Thread: (0) Monitor Command 6197392] SGERunner monitorForCompletion:1205 Failed creating file event.log.monitoring.
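
To see how often each kind of warning occurs per command, the WARN lines above can be tallied with a short script. This is only a sketch: the regular expression is written against the lines quoted in this post, and other Workflow versions may format them differently. The function name and the category labels are made up for illustration.

```python
import re
from collections import Counter

# Pattern written against the WARN lines quoted in this thread (an assumption
# about the general format, not a documented Workflow log specification).
WARN_RE = re.compile(
    r"^WARN (?P<time>\d{2}:\d{2}:\d{2}:\d{3}) "
    r"\[Thread: \(\d+\) Monitor Command (?P<command>\d+)\] "
    r"SGERunner monitorForCompletion:\d+ "
    r"(?P<message>.*)$"
)


def summarize_monitor_warnings(lines):
    """Count event.log.monitoring warnings per (command id, kind)."""
    counts = Counter()
    for line in lines:
        match = WARN_RE.match(line.strip())
        if not match:
            continue  # ignore anything that is not a monitor warning
        message = match.group("message")
        if message.startswith("Failed creating file"):
            kind = "create_failed"
        elif message.startswith("Could not delete the file"):
            kind = "delete_failed"
        else:
            kind = "other"
        counts[(match.group("command"), kind)] += 1
    return counts
```

Passing the lines of pipeline.xml.run.out (for example `summarize_monitor_warnings(open("pipeline.xml.run.out"))`) returns a Counter keyed by command id and warning kind, which makes it easier to tell whether a single command or many commands are affected.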



-- 
Gérald Salin
Informatique - Plateforme Génomique
Génopole Toulouse Midi-Pyrénées
Tél : 05.61.28.55.90
Fax : 05.61.28.55.93
web : http://genomique.genotoul.fr 

_______________________________________________
Ergatis-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/ergatis-users

Re: pipeline freezes in running state whereas no job is running

Mahurkar, Anup

Gerald,

 

What version of Workflow are you running? Judging from the log messages, there was a problem creating the event.log.monitoring file, which is how Workflow checks for job completion. If there is an occasional problem with this file, Workflow thinks that the job never finished. Could you send me the event.log file for that particular command? The XML file for that command should contain the path to the event.log file.
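
Since the warnings ask for the file to be deleted manually, it may help to first list any leftover event.log.monitoring files and check whether their directories are writable. The sketch below assumes the marker file sits in the same directory as the command's event.log, which the thread does not confirm. The function name is hypothetical, and nothing is deleted.

```python
import os


def find_monitoring_markers(root, marker_name="event.log.monitoring"):
    """Yield (marker_path, directory_is_writable) for each marker under root.

    Assumption: the marker is created next to the command's event.log.
    The function only reports what it finds; it never removes anything.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        if marker_name in filenames:
            marker_path = os.path.join(dirpath, marker_name)
            # os.access checks permissions for the user running this script,
            # who may differ from the user that runs Workflow.
            yield marker_path, os.access(dirpath, os.W_OK)
```

Run it as the same user that runs Workflow, with `root` set to the pipeline's working directory. A marker in a directory reported as not writable would be consistent with the create and delete failures in the log, though other causes (file system latency, load) are not ruled out.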

 


Re: pipeline freezes in running state whereas no job is running

Gérald Salin
Thank you for your help.
The Workflow version is wf-3.1.2.jar.
The log file for the command is http://vm-bioinfo.toulouse.inra.fr/ergatis/cgi/view_formatted_log_source.cgi?file=/work/tmp/working/6193152/6197392/event.log

One more piece of information: the server hosting the Ergatis instance runs the parent jobs of the pipelines locally. Yesterday almost 15 pipelines were running in parallel, which pushed the server's load average quite high.
Could this be the cause of our problem?

We will try configuring Ergatis (submit_pipelines_as_jobs = 1) so that it runs the parent jobs on the cluster.
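
To confirm the change took effect, the current value of submit_pipelines_as_jobs can be read back from the configuration file. This is a rough sketch that assumes the file is INI-style and parseable by Python's configparser. It scans every section rather than assuming which section holds the key, and the function name is made up for illustration.

```python
import configparser


def find_setting(ini_text, key="submit_pipelines_as_jobs"):
    """Return {section_name: value} for every section that defines key."""
    # interpolation=None keeps '%' characters in values from being
    # interpreted; strict=False tolerates duplicate sections or keys.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    return {
        section: parser[section][key]
        for section in parser.sections()
        if key in parser[section]
    }
```

For example, `find_setting(open("ergatis.ini").read())` returns the section and value where the key is set (the file name and location depend on the installation). An empty result means the key is not defined in any section of that file.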

Gérald