Make Galaxy use multiple processors

Make Galaxy use multiple processors

Dennis Gascoigne-2
I know it must be possible to make Galaxy use more of the resources available to it, but I haven't been able to figure out how. When Galaxy calls Python scripts, it only seems to use a single processor to do so. We have an 8-core machine, and if other binaries are spawned they get other processors with no problems, but the Python operations all seem to run on the same processor no matter how many of these 'python based' jobs are running. That is, if I run 5 tools simultaneously, I would expect the allocation to make use of all available processing resources.

Is there some config to make this happen? I am sure I am missing something basic.

Cheers



_______________________________________________
galaxy-dev mailing list
[hidden email]
http://lists.bx.psu.edu/listinfo/galaxy-dev

Re: Make Galaxy use multiple processors

Ry4an Brase-4
On Mon, Jun 07, 2010 at 03:57:29PM +1000, Dennis Gascoigne wrote:

> I know it must be possible to make Galaxy use more of the resources
> available to it but I haven't been able to figure out how. If galaxy is
> calling python scripts, it only seems to use a single processor to do so. We
> have an 8 core machine and if other binaries are spawned then they get other
> processors no problems, but any of the python operations though seem to
> operate on the same processor no matter how many of these 'python based'
> jobs are running i.e. If i run 5 tools simultaneously, I would expect
> allocation to make use of all available processing resources.
>
> Is there some config to make this happen? I am sure I am missing something
> basic.

There was a great presentation about just how to do this at dev con
2010:

https://docs.google.com/viewer?url=http://bitbucket.org/galaxy/galaxy-central/wiki/DevConf2010/galaxy_devconf_2010_scalable.pdf

In short, you run multiple galaxy 'runner' instances that federate with
a single 'web' instance.  This is required as the local runner uses the
same python VM to run tasks and python VMs have a single-thread
chokepoint.


--
Ry4an Brase                                         612-626-6575
University of Minnesota Supercomputing Institute
for Advanced Computational Research                 http://www.msi.umn.edu

Re: Make Galaxy use multiple processors

Nate Coraor (nate@bx.psu.edu)
In reply to this post by Dennis Gascoigne-2
Dennis Gascoigne wrote:

> I know it must be possible to make Galaxy use more of the resources
> available to it but I haven't been able to figure out how. If galaxy is
> calling python scripts, it only seems to use a single processor to do
> so. We have an 8 core machine and if other binaries are spawned then
> they get other processors no problems, but any of the python operations
> though seem to operate on the same processor no matter how many of these
> 'python based' jobs are running i.e. If i run 5 tools simultaneously, I
> would expect allocation to make use of all available processing resources.
>
> Is there some config to make this happen? I am sure I am missing
> something basic.

Hi Dennis,

This is something I've referred to on the ProductionServer wiki page,
but have not yet documented.  I also covered it in my Developer
Conference talk (slides: http://usegalaxy.org/dev2010 ).

However, probably the best place to find it right now is the
conversation on the list from last month between Davide Cittaro and me:

http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-May/002634.html

I'd suggest copying universe_wsgi.ini in both instances rather than
moving in one, as I originally suggested.  Otherwise, egg-checking
routines will fail because of the missing config file.

--nate

Re: Make Galaxy use multiple processors

James Taylor
Nate, shouldn't python tools run as different processes scale out to  
multiple processors without any special configuration though? E.g.  
running 5 gops jobs in parallel should use multiple processes. Is  
there something that prevents this?

Re: Make Galaxy use multiple processors

Nate Coraor (nate@bx.psu.edu)
James Taylor wrote:
> Nate, shouldn't python tools run as different processes scale out to
> multiple processors without any special configuration though? E.g.
> running 5 gops jobs in parallel should use multiple processes. Is there
> something that prevents this?

Oh, yes, I suppose I failed to read this email properly.  It should
definitely be starting separate processes, which would avoid the GIL's
one-core limitation.  Dennis, are you seeing multiple Python processes
when you run these tools?

--nate

Re: Make Galaxy use multiple processors

Dennis Gascoigne-2
I am seeing two Python processes. One is my runner, maxing out at approximately 100%; the other is my web process, just ticking along at about 5%. We have a user who has kicked off 5 instances of SAM to Interval. I would expect to see 5 processes, preferably spread across multiple processors, but I am seeing 1.

Re: Make Galaxy use multiple processors

Dennis Gascoigne-2
In reply to this post by Nate Coraor (nate@bx.psu.edu)
Just a clarification

What determines how many cores are used? The number of web runners, the number of job runners, or something else? I understand that at the moment you can only have one job runner, so if it's the job runner, can I actually have more than 1 core utilized? I have 2 web runners and one job runner, but the demand has not yet warranted kicking in the 2nd web runner, though it will soon.

Re: Make Galaxy use multiple processors

Nate Coraor (nate@bx.psu.edu)
In reply to this post by Dennis Gascoigne-2
Dennis Gascoigne wrote:
> I am seeing two python processes. One is my runner - maxing at appr.
> 100%, the other is my web process just ticking along at about 5%. We
> have a user who has kick off  5 version of SAM to Interval. I would
> expect to see 5 processes, preferably taking multiple processors - I am
> seeing 1

Dennis,

I suspect the tool may have finished, but the process of setting
metadata is still running.  Try setting the following in your job
runner's config file:

set_metadata_externally = True

This will cause metadata generation to happen in its own process.

--nate
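
A quick way to confirm that the setting above is actually present in a given config file is to parse it. The following is only a sketch: the `[app:main]` section name and the sample text are assumptions made for illustration and are not confirmed anywhere in this thread, and the helper name `metadata_set_externally` is hypothetical.

```python
import configparser

# Hypothetical excerpt of a job runner config file.
# The [app:main] section name is an assumption, not taken from the thread.
SAMPLE = """
[app:main]
set_metadata_externally = True
"""


def metadata_set_externally(ini_text, section="app:main"):
    """Return True if the config text enables set_metadata_externally."""
    # interpolation=None keeps literal '%' characters in other options
    # from raising errors while parsing.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    # A missing section or option is treated as "not enabled".
    return parser.getboolean(section, "set_metadata_externally", fallback=False)


if __name__ == "__main__":
    print(metadata_set_externally(SAMPLE))
```

To check a real file, read its contents into a string and pass that string in place of `SAMPLE`.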

Re: Make Galaxy use multiple processors

Nate Coraor (nate@bx.psu.edu)
In reply to this post by Dennis Gascoigne-2
Dennis Gascoigne wrote:
> Just a clarification
>
> What designates how many cores are used? The number of web runners, or
> the number of job runners or something else? I understand that at the
> moment, you can only have one job runner so if it's the job runner can I
> actually have more than 1 core utilized? I have 2 web runners, and one
> job runner, but the demand has not yet warranted kicking in the 2nd web
> runner - but it will soon.

CPython has a thread lock (known formally as the Global Interpreter
Lock) which allows only one thread to execute Python bytecode at a
time.  This means that at most, only one core will ever be used by each
Python process.  The number of cores used is going to be the number of
active processes.

On a server which runs jobs on a cluster, that means just the web
process(es) and the job runner process.  On a server which runs jobs
locally, it's the number of web process(es), the job runner, and the
value of 'local_job_queue_workers' in the job runner's config file.

--nate
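
As a rough illustration of the point above, the sketch below starts several Python interpreters as separate processes, each with its own PID and its own interpreter lock, so the operating system is free to schedule them on different cores. It also adds up the process count described for a server that runs jobs locally. The names `TOOL_SNIPPET`, `run_tools_in_parallel`, and `max_busy_processes` are hypothetical, the snippet is only a stand-in for a real tool, and the worker count of 5 is an assumed value rather than a setting reported in this thread.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for a Python-based tool: a short CPU-bound
# computation that then prints the PID of the interpreter running it.
TOOL_SNIPPET = "import os; sum(i * i for i in range(200000)); print(os.getpid())"


def run_tools_in_parallel(n_jobs):
    """Start n_jobs separate Python interpreters and return their PIDs."""
    # Launch every child before waiting on any of them, so they overlap.
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", TOOL_SNIPPET],
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(n_jobs)
    ]
    pids = []
    for proc in procs:
        out, _ = proc.communicate()
        pids.append(int(out.strip()))
    return pids


def max_busy_processes(web_processes, job_runners, local_job_queue_workers):
    """Upper bound on busy processes when jobs run locally."""
    return web_processes + job_runners + local_job_queue_workers


if __name__ == "__main__":
    print("parent PID:", os.getpid())
    print("child PIDs:", run_tools_in_parallel(5))
    # 2 web processes and 1 job runner as described in the thread;
    # 5 local workers is an assumed value for illustration.
    print("max busy processes:", max_busy_processes(2, 1, 5))
```

Each child reports a different PID, which matches what a process listing should show when five Python tools are really running as separate processes.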
