pbs/torque jobs

Briand, Sheldon (NRC/CNRC)

Hi,

I have upgraded to Galaxy 18.05 (from 17.05). I am running the torque job scheduler (version 6.02).

When I submit a job through Galaxy and look in the admin/manage jobs section, I see that the job has been submitted successfully: it shows as queued and waiting to run. On the cluster I can see that the job runs to completion. However, Galaxy continues to show the job as waiting to run, and the status never gets updated. I'm using Postgres as my database and I am running behind an nginx proxy.

I see no errors in my galaxy.log file.

I switched from galaxy.ini to galaxy.yml and from Paste to uWSGI. Is this a configuration problem on my end? Where should I be looking?

Thanks,

-Sheldon

Sheldon Briand

Computer Systems and Applications Analyst

National Research Council/Government of Canada

[hidden email] Tel: (902) 426-1677


___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/

Re: pbs/torque jobs

Marius van den Beek
Hi Sheldon,

is there anything particular about your job configuration? For example,
are you using the run-as-real-user option, or are you using
the drmaa_external_runjob_script option?
Are you using the DRMAA or the PBS runner?
 
Best,
Marius


Re: pbs/torque jobs

Briand, Sheldon (NRC/CNRC)

Hi Marius,

I'm using the PBS runner, and jobs run as the galaxy user. I do not use the run-as-real-user option, and I haven't been using drmaa_external_runjob_script. This setup worked with my old 17.05 and earlier versions of Galaxy.

Thanks,

-Sheldon

Re: pbs/torque jobs

Marius van den Beek
Hi Sheldon,

I'm not sure what the issue could be. The PBS runner hasn't been updated in ~3 years,
but of course many things around it have been.
Could you set Galaxy's logging level to debug, if it isn't already, and check the logs?
When you submit a job in Galaxy, what messages do you see in the logs?

Best,
Marius
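With debug logging on, a busy galaxy.log interleaves many jobs. The helper below is a minimal, hypothetical Python sketch (not part of Galaxy) for pulling out the lines that belong to one job. It assumes per-job messages carry the Galaxy job id in parentheses, either as `(ID)` or `(ID/EXTERNAL_ID)`, which is the pattern visible in the DEBUG lines quoted in this thread; the function name is made up for illustration.

```python
import re


def job_log_lines(lines, job_id):
    """Return the log lines that mention Galaxy job `job_id` (an int).

    Matches the "(ID)" and "(ID/EXTERNAL_ID)" prefixes that appear in the
    galaxy.jobs and galaxy.jobs.runners.pbs DEBUG messages. Lines that name
    the job differently, such as "job 2 ended", are not matched.
    """
    pattern = re.compile(r"\(%d(?:/[^)]*)?\)" % job_id)
    return [line.rstrip("\n") for line in lines if pattern.search(line)]
```

For example, `with open("galaxy.log") as log: print("\n".join(job_log_lines(log, 16117)))` prints only the lines tagged with that job id; the file name and id are placeholders.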


Re: pbs/torque jobs

Marius van den Beek
So I was able to simulate a torque environment with https://github.com/aiidateam/torquessh_base-docker
and that seemed to work fine.

I did have to install the torque headers, activate galaxy's virtualenv and install https://github.com/ehiggs/pbs-python.
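One way to confirm those steps took effect is to run a small script with the interpreter from Galaxy's virtualenv. This is a hedged sketch, not part of Galaxy: it reports whether the interpreter is inside a virtualenv and which file a module named `pbs` is imported from. The module name `pbs` for pbs-python is an assumption here. The code avoids f-strings so it also runs under Python 2.7.

```python
from __future__ import print_function

import importlib
import sys


def in_virtualenv():
    """Return True if this interpreter runs inside a virtualenv or venv."""
    # The classic `virtualenv` tool sets sys.real_prefix; the stdlib `venv`
    # module (Python 3.3+) instead makes sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


def module_location(name):
    """Return the file `name` is imported from, or None if the import fails."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__file__", None)


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtualenv:", in_virtualenv())
    # Assumption: pbs-python provides a top-level module named "pbs".
    print("pbs module:", module_location("pbs"))
```

If `pbs module:` prints `None`, the library is not importable from that interpreter.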

You should be seeing something like:

```
galaxy.jobs.runners.pbs DEBUG 2018-06-14 09:23:31,549 [p:1944,w:1,m:0] [PBSRunner.work_thread-2] (2) submitting file /home/app/galaxy/database/pbs/2.sh
galaxy.jobs.runners.pbs DEBUG 2018-06-14 09:23:31,551 [p:1944,w:1,m:0] [PBSRunner.work_thread-2] (2) queued in default queue as 3.605046c8289c
galaxy.jobs DEBUG 2018-06-14 09:23:31,552 [p:1944,w:1,m:0] [PBSRunner.work_thread-2] (2) Persisting job destination (destination id: local)
galaxy.jobs.runners.pbs DEBUG 2018-06-14 09:23:31,855 [p:1944,w:1,m:0] [Dummy-5] (2/3.605046c8289c) PBS job state changed from N to R
galaxy.jobs.runners.pbs DEBUG 2018-06-14 09:23:33,994 [p:1944,w:1,m:0] [Dummy-5] (2/3.605046c8289c) PBS job has left queue
galaxy.model.metadata DEBUG 2018-06-14 09:23:34,110 [p:1944,w:1,m:0] [PBSRunner.work_thread-3] loading metadata from file for: HistoryDatasetAssociation 2
galaxy.jobs INFO 2018-06-14 09:23:34,218 [p:1944,w:1,m:0] [PBSRunner.work_thread-3] Collecting metrics for Job 2
galaxy.jobs DEBUG 2018-06-14 09:23:34,234 [p:1944,w:1,m:0] [PBSRunner.work_thread-3] job 2 ended (finish() executed in (189.200 ms))
```

in your logs.
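The expected sequence above can be turned into a rough checklist. The sketch below is hypothetical and not a Galaxy tool: it looks for the message fragments from that example output, in order, and reports the first one that never appears. The fragments are copied from the example, so other Galaxy versions may word them differently, and the input should be the lines for a single job.

```python
# Message fragments copied from the example DEBUG output, in the order
# they appeared for a job that completed normally.
EXPECTED_STAGES = [
    ("submitted", "submitting file"),
    ("queued", "queued in"),
    ("state change", "PBS job state changed from"),
    ("left queue", "PBS job has left queue"),
    ("metrics", "Collecting metrics for Job"),
    ("finished", "ended (finish() executed"),
]


def stages_reached(lines):
    """Return the names of the expected stages whose fragment appears in `lines`."""
    lines = list(lines)  # allow a file object, which can only be read once
    return [name for name, fragment in EXPECTED_STAGES
            if any(fragment in line for line in lines)]


def first_missing_stage(lines):
    """Return the first expected stage not seen in `lines`, or None if all were seen."""
    reached = set(stages_reached(lines))
    for name, _fragment in EXPECTED_STAGES:
        if name not in reached:
            return name
    return None
```

If the first missing stage is "state change", the submission itself worked but Galaxy never logged a state update from the scheduler, which would point at the runner's monitoring rather than at submission.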

Let me know how this goes.

Best,
Marius


Re: pbs/torque jobs

Briand, Sheldon (NRC/CNRC)

Marius,

Thanks for the help with this!

Below is what I am seeing in the logs. It only gets as far as saying the job is queued; the status of the job doesn't change, even though torque actually finishes the job successfully. I have something configured wrong:

```
Dispatching to pbs runner
galaxy.jobs DEBUG 2018-06-14 11:12:22,356 [p:63562,w:2,m:0] [JobHandlerQueue.monitor_thread] (16117) Persisting job destination (destination id: pbs_default)
galaxy.tools.deps DEBUG 2018-06-14 11:12:22,775 [p:63562,w:2,m:0] [PBSRunner.work_thread-1] Using dependency htseq version 0.9.1 of type conda
galaxy.tools.deps DEBUG 2018-06-14 11:12:22,775 [p:63562,w:2,m:0] [PBSRunner.work_thread-1] Using dependency samtools version 1.7 of type conda
galaxy.jobs.command_factory INFO 2018-06-14 11:12:22,857 [p:63562,w:2,m:0] [PBSRunner.work_thread-1] Built script [/BigData/galaxy/galaxy-dist/database/jobs_directory/016/16117/tool_script.sh] for tool command [[ "$CONDA_DEFAULT_ENV" = "/BigData/galaxy/galaxy-dist/conda_deps/envs/mulled-v1-c9f488ec0e9a96bed61dcc2e074b26ce37ed596751861ff368fd824a2a5f11d4" ] ||
galaxy.jobs.runners DEBUG 2018-06-14 11:12:23,143 [p:63562,w:2,m:0] [PBSRunner.work_thread-1] (16117) command is: rm -rf working; mkdir -p working; cd working; /BigData/galaxy/galaxy-dist/database/jobs_directory/016/16117/tool_script.sh; return_code=$?; cd '/BigData/galaxy/galaxy-dist/database/jobs_directory/016/16117';
galaxy.jobs.runners.pbs DEBUG 2018-06-14 11:12:23,208 [p:63562,w:2,m:0] [PBSRunner.work_thread-1] (16117) submitting file /BigData/galaxy/galaxy-dist/database/pbs/16117.sh
galaxy.jobs.runners.pbs DEBUG 2018-06-14 11:12:23,214 [p:63562,w:2,m:0] [PBSRunner.work_thread-1] (16117) queued in default queue as 1434.locahost
galaxy.jobs DEBUG 2018-06-14 11:12:23,214 [p:63562,w:2,m:0] [PBSRunner.work_thread-1] (16117) Persisting job destination (destination id: pbs_default)
```
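To compare Galaxy's view with what torque itself reports for the same job, one option is to ask `qstat -f` directly. This is a minimal Python 3 sketch, not part of Galaxy. It assumes `qstat` is on the PATH of the machine it runs on and that its full output contains a `job_state = X` line. Depending on server settings, finished jobs may already have been dropped from `qstat`, so a `None` result is not by itself an error.

```python
import re
import subprocess

# torque's `qstat -f` prints one "attribute = value" pair per line,
# e.g. "    job_state = R".
JOB_STATE_RE = re.compile(r"^\s*job_state\s*=\s*(\S+)", re.MULTILINE)


def parse_job_state(qstat_output):
    """Extract the job_state value from `qstat -f` output, or None if absent."""
    match = JOB_STATE_RE.search(qstat_output)
    return match.group(1) if match else None


def torque_job_state(job_id):
    """Run `qstat -f JOB_ID` and return its job_state (for example "Q", "R" or "C").

    Returns None if qstat is not installed or exits with an error, which is
    typically what happens once the server has purged a finished job.
    """
    try:
        result = subprocess.run(
            ["qstat", "-f", job_id],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # return str rather than bytes
        )
    except OSError:  # e.g. qstat not on PATH
        return None
    if result.returncode != 0:
        return None
    return parse_job_state(result.stdout)
```

For example, `torque_job_state("1434.locahost")` would query the external id from the log above while the job is still known to the server.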



Re: pbs/torque jobs

Marius van den Beek
Could you try running a job again with the following commit?

It just logs some more information that might help diagnose what's going on.


Re: pbs/torque jobs

Briand, Sheldon (NRC/CNRC)

I put that code in the file and reran the job. None of those status messages show up. So I went looking, and I see a pbs.py in my .venv directory that is twice as large as the file I changed, and it doesn't look at all like the current pbs.py.

Probably the wrong pbs.py is being called. I'll need to check the docs on venv setup again.

 

Thanks,

-Sheldon
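The manual check described here, finding every pbs.py and comparing sizes, can be scripted. A minimal Python 3 sketch, not part of Galaxy: it walks a directory tree (including hidden directories such as .venv) and lists every file with a given name and its size in bytes. It does not tell you which of the listed files a given `import` actually resolves to. The root path in the usage block is taken from the job paths in the log earlier in this thread and is only an example.

```python
import os


def find_files(root, filename="pbs.py"):
    """Return sorted (path, size_in_bytes) pairs for every file named `filename` under `root`.

    os.walk does not follow symlinked directories by default, and a missing
    `root` simply yields no results.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.path.isfile(path):  # skip broken symlinks
                matches.append((path, os.path.getsize(path)))
    return sorted(matches)


if __name__ == "__main__":
    # Example root; adjust to your own Galaxy checkout.
    for path, size in find_files("/BigData/galaxy/galaxy-dist"):
        print("%10d  %s" % (size, path))
```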

 

From: Marius van den Beek [mailto:[hidden email]]
Sent: Thursday, June 14, 2018 12:27 PM
To: Briand, Sheldon (NRC/CNRC) <[hidden email]>
Cc: [hidden email]
Subject: Re: [galaxy-dev] pbs/torque jobs

 

Could you try running a job again with the following commit ?

 

This just logs some more things that might be helpful in diagnosing what's going on.

 

On 14 June 2018 at 16:27, Briand, Sheldon (NRC/CNRC) <[hidden email]> wrote:

Marius,

 

Thanks for the help with this!

 

Below is what I am seeing in the logs but it only gets as far as saying the job is queued (the status of the job doesn’t change, even though torque actually finishes the job successfully-I have something configured wrong):

Dispatching to pbs runner

galaxy.jobs DEBUG 2018-06-14 11:12:22,356 [p:63562,w:2,m:0] [JobHandlerQueue.monitor_thread] (16117) Persisting job destination (destination id: pbs_default)

galaxy.tools.deps DEBUG 2018-06-14 11:12:22,775 [p:63562,w:2,m:0] [PBSRunner.work_thread-1] Using dependency htseq version 0.9.1 of type conda

galaxy.tools.deps DEBUG 2018-06-14 11:12:22,775 [p:63562,w:2,m:0] [PBSRunner.work_thread-1] Using dependency samtools version 1.7 of type conda

galaxy.jobs.command_factory INFO 2018-06-14 11:12:22,857 [p:63562,w:2,m:0] [PBSRunner.work_thread-1] Built script [/BigData/galaxy/galaxy-dist/database/jobs_directory/016/16117/tool_script.sh] for tool command [[ "$CONDA_DEFAULT_ENV" = "/BigData/galaxy/galaxy-dist/conda_deps/envs/mulled-v1-c9f488ec0e9a96bed61dcc2e074b26ce37ed596751861ff368fd824a2a5f11d4" ] ||

galaxy.jobs.runners DEBUG 2018-06-14 11:12:23,143 [p:63562,w:2,m:0] [PBSRunner.work_thread-1] (16117) command is: rm -rf working; mkdir -p working; cd working; /BigData/galaxy/galaxy-dist/database/jobs_directory/016/16117/tool_script.sh; return_code=$?; cd '/BigData/galaxy/galaxy-dist/database/jobs_directory/016/16117';

galaxy.jobs.runners.pbs DEBUG 2018-06-14 11:12:23,208 [p:63562,w:2,m:0] [PBSRunner.work_thread-1] (16117) submitting file /BigData/galaxy/galaxy-dist/database/pbs/16117.sh

galaxy.jobs.runners.pbs DEBUG 2018-06-14 11:12:23,214 [p:63562,w:2,m:0] [PBSRunner.work_thread-1] (16117) queued in default queue as 1434.locahost

galaxy.jobs DEBUG 2018-06-14 11:12:23,214 [p:63562,w:2,m:0] [PBSRunner.work_thread-1] (16117) Persisting job destination (destination id: pbs_default)

 

 

 

 

 

From: Marius van den Beek [mailto:[hidden email]]
Sent: Thursday, June 14, 2018 6:29 AM


To: Briand, Sheldon (NRC/CNRC) <[hidden email]>
Cc: [hidden email]
Subject: Re: [galaxy-dev] pbs/torque jobs

 

So I was able to simulate a torque environment with https://github.com/aiidateam/torquessh_base-docker

and that seemed to have worked fine. 

 

I did have to install the torque headers, activate galaxy's virtualenv and install https://github.com/ehiggs/pbs-python.

 

You should be seeing something like:

 

```

galaxy.jobs.runners.pbs DEBUG 2018-06-14 09:23:31,549 [p:1944,w:1,m:0] [PBSRunner.work_thread-2] (2) submitting file /home/app/galaxy/database/pbs/2.sh

galaxy.jobs.runners.pbs DEBUG 2018-06-14 09:23:31,551 [p:1944,w:1,m:0] [PBSRunner.work_thread-2] (2) queued in default queue as 3.605046c8289c

galaxy.jobs DEBUG 2018-06-14 09:23:31,552 [p:1944,w:1,m:0] [PBSRunner.work_thread-2] (2) Persisting job destination (destination id: local)

galaxy.jobs.runners.pbs DEBUG 2018-06-14 09:23:31,855 [p:1944,w:1,m:0] [Dummy-5] (2/3.605046c8289c) PBS job state changed from N to R

galaxy.jobs.runners.pbs DEBUG 2018-06-14 09:23:33,994 [p:1944,w:1,m:0] [Dummy-5] (2/3.605046c8289c) PBS job has left queue

galaxy.model.metadata DEBUG 2018-06-14 09:23:34,110 [p:1944,w:1,m:0] [PBSRunner.work_thread-3] loading metadata from file for: HistoryDatasetAssociation 2

galaxy.jobs INFO 2018-06-14 09:23:34,218 [p:1944,w:1,m:0] [PBSRunner.work_thread-3] Collecting metrics for Job 2

galaxy.jobs DEBUG 2018-06-14 09:23:34,234 [p:1944,w:1,m:0] [PBSRunner.work_thread-3] job 2 ended (finish() executed in (189.200 ms))

```

 

in your logs.

 

Let me know how this goes.

 

Best,

Marius

 


Re: pbs/torque jobs

Marius van den Beek
Hi Sheldon,

It is normal to have a pbs.py file in your virtualenv. It is provided by pbs-python, which Galaxy makes use of.
This is not the same as the file you just modified, which contains the code Galaxy uses to submit PBS jobs and check their status
using this library.
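One way to tell the two files apart is to ask the interpreter where each module is loaded from. This is a minimal sketch, assuming pbs-python is imported as the top-level module `pbs` and Galaxy's runner is the module `galaxy.jobs.runners.pbs` (the logger name in the logs above), importable when Galaxy's `lib/` directory is on `sys.path`. `module_origin` is a hypothetical helper.

```python
import importlib


def module_origin(name):
    """Import a module by dotted name and return the file it was loaded from.

    Returns None if the module cannot be imported or has no __file__
    (e.g. modules built into the interpreter).
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__file__", None)


if __name__ == "__main__":
    # "pbs" is pbs-python (the file under .venv); "galaxy.jobs.runners.pbs" is
    # Galaxy's runner, e.g. run from the Galaxy root as:
    #   PYTHONPATH=lib .venv/bin/python which_pbs.py   (file name is hypothetical)
    for name in ("pbs", "galaxy.jobs.runners.pbs"):
        print("%-25s -> %s" % (name, module_origin(name)))
```

Edits to Galaxy's runner should show up under the second path, never under the virtualenv's `pbs` module.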

Assuming you restarted Galaxy in between (I forgot to mention that), this probably means
that your job handlers are somehow not active.
How are you starting Galaxy, and which setup are you trying to achieve?
Up-to-date and extensive documentation of the available options is here:

Marius


Re: pbs/torque jobs

Briand, Sheldon (NRC/CNRC)

Hi Marius,

 

I want to thank you for your help with this. The problem was that I was using the wrong deployment: I had left the settings at the default deployment, and switching to uWSGI + Mules fixed my problem.

 

Thanks!

-Sheldon

 


Re: pbs/torque jobs

Marius van den Beek
Great, good to know this is all working!
