The new hg based Galaxy Tool Shed

Peter Cock
Hi Greg et al,

I've just been looking over your slides from last week about the new
'Galaxy Tool Shed', which are posted online here:

http://wiki.g2.bx.psu.edu/GCC2011

http://wiki.g2.bx.psu.edu/GCC2011?action=AttachFile&do=get&target=GalaxyToolShed.pdf

They talk about how you will be tracking individual tools in hg repositories.

I can see two ways this might work:

(1) Each of these tool specific repositories (or branches if you just make one
repository for each tool owner) would be a full fork of the Galaxy code base.
This allows in principle tools to include changes to core functionality (but
that seems dangerous due to potential merge clashes), and any existing
tool contributor's pre-existing hg forks on bitbucket might be reused.

(2) Each of these tool specific repositories would ONLY track the tool specific
files you'd add to Galaxy to install the tool. So, typically there would be an
XML file, perhaps a wrapper script, maybe a sample loc file, and a plain
text readme file.

I'm guessing you've gone for something along the lines of idea (2), but I
would love to hear more about how this will all work. e.g. Where would
the tool shed repositories be hosted, and would tool authors use hg to
work with them, or something like the current web based tool upload?

Regards,

Peter
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: The new hg based Galaxy Tool Shed

Nate Coraor (nate@bx.psu.edu)
Hi Peter,

Greg will probably reply, but I'll throw in my $0.02 as well.

Peter Cock wrote:

> Hi Greg et al,
>
> I've just been looking over your slides from last week about the new
> 'Galaxy Tool Shed', which are posted online here:
>
> http://wiki.g2.bx.psu.edu/GCC2011
>
> http://wiki.g2.bx.psu.edu/GCC2011?action=AttachFile&do=get&target=GalaxyToolShed.pdf
>
> They talk about how you will be tracking individual tools in hg repositories.
>
> I can see two ways this might work:
>
> (1) Each of these tool specific repositories (or branches if you just make one
> repository for each tool owner) would be a full fork of the Galaxy code base.
> This allows in principle tools to include changes to core functionality (but
> that seems dangerous due to potential merge clashes), and any existing
> tool contributor's pre-existing hg forks on bitbucket might be reused.

The tool shed isn't really intended for framework changes - I would
suggest keeping these as bitbucket forks, although it would certainly be
good if we had a way to locate the list of such forks centrally.

> (2) Each of these tool specific repositories would ONLY track the tool specific
> files you'd add to Galaxy to install the tool. So, typically there would be an
> XML file, perhaps a wrapper script, maybe a sample loc file, and a plain
> text readme file.
>
> I'm guessing you've gone for something along the lines of idea (2), but I

Yep.

> would love to hear more about how this will all work. e.g. Where would
> the tool shed repositories be hosted, and would tool authors use hg to
> work with them, or something like the current web based tool upload?

They're hosted here, and you can check them out and work with them
locally as you do the Galaxy source itself, or use the new web-based
upload to upload individual files or tarballs.

Have a look at the test instance of the next-gen toolshed here if you'd
like to see how it works:

  http://testtoolshed.g2.bx.psu.edu/

Please feel free to use this as a sandbox and report any issues you find.

--nate

>
> Regards,
>
> Peter

Re: The new hg based Galaxy Tool Shed

Peter Cock
On Wed, Jun 1, 2011 at 3:22 PM, Nate Coraor <[hidden email]> wrote:
> Hi Peter,
>
> Greg will probably reply, but I'll throw in my $0.02 as well.

Great - but with your answers you've triggered more questions ;)

> Peter Cock wrote:
>> Hi Greg et al,
>>
>> I've just been looking over your slides from last week about the new
>> 'Galaxy Tool Shed', which are posted online here:
>>
>> http://wiki.g2.bx.psu.edu/GCC2011
>>
>> http://wiki.g2.bx.psu.edu/GCC2011?action=AttachFile&do=get&target=GalaxyToolShed.pdf
>>
>> They talk about how you will be tracking individual tools in hg repositories.
>>
>> I can see two ways this might work:
>>
>> (1) Each of these tool specific repositories (or branches if you just make one
>> repository for each tool owner) would be a full fork of the Galaxy code base.
>> This allows in principle tools to include changes to core functionality (but
>> that seems dangerous due to potential merge clashes), and any existing
>> tool contributor's pre-existing hg forks on bitbucket might be reused.
>
> The tool shed isn't really intended for framework changes - I would
> suggest keeping these as bitbucket forks, although it would certainly be
> good if we had a way to locate the list of such forks centrally.

Well, as long as the repository is created by forking on bitbucket, then
the link exists in the bitbucket web interface:
https://bitbucket.org/galaxy/galaxy-central/descendants

>> (2) Each of these tool specific repositories would ONLY track the tool specific
>> files you'd add to Galaxy to install the tool. So, typically there would be an
>> XML file, perhaps a wrapper script, maybe a sample loc file, and a plain
>> text readme file.
>>
>> I'm guessing you've gone for something along the lines of idea (2), but I
>
> Yep.

It did seem the most likely route.

>> would love to hear more about how this will all work. e.g. Where would
>> the tool shed repositories be hosted, and would tool authors use hg to
>> work with them, or something like the current web based tool upload?
>
> They're hosted here, and you can check them out and work with them
> locally as you do the Galaxy source itself, or use the new web-based
> upload to upload individual files or tarballs.
>
> Have a look at the test instance of the next-gen toolshed here if you'd
> like to see how it works:
>
>  http://testtoolshed.g2.bx.psu.edu/
>
> Please feel free to use this as a sandbox and report any issues you find.

I see the existing usernames and passwords from the old Tool Shed were
transferred - that makes life easier. And it lists the hg information, e.g.

hg clone http://peterjc@.../repos/peterjc/venn_list
hg clone http://peterjc@.../repos/peterjc/tmhmm_and_signalp

What happens with branches? Would the Tool Shed just show the
default branch? That seems best for a simple UI.

I have a query regarding the way the tools are shown in tables and the
"version" column, which shows a changeset and revision number. According
to Greg's slides (slide #10, titled "Simpler tool versioning" which seems ironic
to me), the old numerical version is still there in the XML - and I'd prefer to
see that. How about having both shown (two columns, perhaps call them
"Public version" and "hg version" or "hg revision").

With regards to the planned installation functionality, what happens when
a tool repository (aka Tool Suite in the old model) contains several XML
wrappers - would you be able to choose which are wanted? The use case
I have here is when several tools share some common dependency (which
should be tracked in a single repository), and were therefore useful to
bundle together as a suite, but where not all the tools will be of global
interest (e.g. my TMHMM, SignalP, etc. suite).

Peter

Re: The new hg based Galaxy Tool Shed

Peter Cock
On Wed, Jun 1, 2011 at 4:00 PM, Peter Cock <[hidden email]> wrote:

>> Peter Cock wrote:
>>> would love to hear more about how this will all work. e.g. Where would
>>> the tool shed repositories be hosted, and would tool authors use hg to
>>> work with them, or something like the current web based tool upload?
>>
>> They're hosted here, and you can check them out and work with them
>> locally as you do the Galaxy source itself, or use the new web-based
>> upload to upload individual files or tarballs.
>>
>> Have a look at the test instance of the next-gen toolshed here if you'd
>> like to see how it works:
>>
>>  http://testtoolshed.g2.bx.psu.edu/
>>
>> Please feel free to use this as a sandbox ...

Does that mean it will be cleared at some point before taking over,
so we can make deliberate test changes without the fear of them
being applied by other Galaxy administrators? If so, please stick
a big warning on the http://testtoolshed.g2.bx.psu.edu/ test server
(e.g. replace the top left link "Galaxy Tool Shed" with "Galaxy
TESTING Tool Shed"), and ideally some text telling people to
continue to use http://community.g2.bx.psu.edu/ for production
servers.

>>
>> ... and report any issues you find.

First bug report: https://bitbucket.org/galaxy/galaxy-central/issue/564/

It seems you're making a lot of work for yourselves by reimplementing
a web GUI for an hg repository. Isn't there an existing web server thing
you would have running on http://testtoolshed.g2.bx.psu.edu/ to take
care of this side of things? Ideally something you could theme and embed
within the frames of the current Tool Shed UI.

Peter

Re: The new hg based Galaxy Tool Shed

Greg Von Kuster
In reply to this post by Peter Cock
Hello Peter - I finally got a chance to jump in - see my inline comments...


On Jun 1, 2011, at 11:00 AM, Peter Cock wrote:

> On Wed, Jun 1, 2011 at 3:22 PM, Nate Coraor <[hidden email]> wrote:
>> Hi Peter,
>>
>> Greg will probably reply, but I'll throw in my $0.02 as well.
>
> Great - but with your answers you've triggered more questions ;)
>
>> Peter Cock wrote:
>>> Hi Greg et al,
>>>
>>> I've just been looking over your slides from last week about the new
>>> 'Galaxy Tool Shed', which are posted online here:
>>>
>>> http://wiki.g2.bx.psu.edu/GCC2011
>>>
>>> http://wiki.g2.bx.psu.edu/GCC2011?action=AttachFile&do=get&target=GalaxyToolShed.pdf
>>>
>>> They talk about how you will be tracking individual tools in hg repositories.
>>>
>>> I can see two ways this might work:
>>>
>>> (1) Each of these tool specific repositories (or branches if you just make one
>>> repository for each tool owner) would be a full fork of the Galaxy code base.
>>> This allows in principle tools to include changes to core functionality (but
>>> that seems dangerous due to potential merge clashes), and any existing
>>> tool contributor's pre-existing hg forks on bitbucket might be reused.
>>
>> The tool shed isn't really intended for framework changes - I would
>> suggest keeping these as bitbucket forks, although it would certainly be
>> good if we had a way to locate the list of such forks centrally.
>
> Well, as long as the repository is created by forking on bitbucket, then
> the link exists in the bitbucket web interface:
> https://bitbucket.org/galaxy/galaxy-central/descendants


What's important here is that each tool or set of tools is its own separate entity - see the future "big picture" highlights below for reasons.


>
>>> (2) Each of these tool specific repositories would ONLY track the tool specific
>>> files you'd add to Galaxy to install the tool. So, typically there would be an
>>> XML file, perhaps a wrapper script, maybe a sample loc file, and a plain
>>> text readme file.
>>>
>>> I'm guessing you've gone for something along the lines of idea (2), but I
>>
>> Yep.
>
> It did seem the most likely route.
>
>>> would love to hear more about how this will all work. e.g. Where would
>>> the tool shed repositories be hosted, and would tool authors use hg to
>>> work with them, or something like the current web based tool upload?
>>
>> They're hosted here, and you can check them out and work with them
>> locally as you do the Galaxy source itself, or use the new web-based
>> upload to upload individual files or tarballs.
>>
>> Have a look at the test instance of the next-gen toolshed here if you'd
>> like to see how it works:
>>
>>  http://testtoolshed.g2.bx.psu.edu/
>>
>> Please feel free to use this as a sandbox and report any issues you find.
>
> I see the existing usernames and passwords from the old Tool Shed were
> transferred - that makes life easier. And it lists the hg information, e.g.
>
> hg clone http://peterjc@.../repos/peterjc/venn_list
> hg clone http://peterjc@.../repos/peterjc/tmhmm_and_signalp
>
> What happens with branches? Would the Tool Shed just show the
> default branch? That seems best for a simple UI.

Some of the branching details are yet to be worked out, but forks are easy because repository urls include the unique username of the Galaxy user.
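For illustration only: a minimal Python sketch of why owner-scoped URLs keep forks apart, assuming the /repos/<owner>/<repository> layout shown in the clone commands quoted above. The clone_url helper and the shed.example.org host are hypothetical, not part of the Tool Shed.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def clone_url(shed_base: str, owner: str, repo: str,
              login: Optional[str] = None) -> str:
    """Build an hg clone URL of the form <shed>/repos/<owner>/<repo>.

    The layout is inferred from the example clone commands in this thread;
    it is an assumption, not a documented Tool Shed API.
    """
    parts = urlsplit(shed_base.rstrip("/"))
    # Optionally embed the login name, as in "http://peterjc@.../repos/...".
    netloc = f"{login}@{parts.netloc}" if login else parts.netloc
    path = f"{parts.path}/repos/{owner}/{repo}"
    return urlunsplit((parts.scheme, netloc, path, "", ""))


# The owner is part of the path, so a fork by another user gets its own URL.
base = "http://shed.example.org"  # hypothetical host
print(clone_url(base, "peterjc", "venn_list", login="peterjc"))
print(clone_url(base, "someoneelse", "venn_list"))
```

Distinct URLs alone do not stop two forks from shipping wrappers with the same tool ID, so the install side still needs its own check.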

>
> I have a query regarding the way the tools are shown in tables and the
> "version" column, which shows a changeset and revision number. According
> to Greg's slides (slide #10, titled "Simpler tool versioning" which seems ironic
> to me), the old numerical version is still there in the XML - and I'd prefer to
> see that. How about having both shown (two columns, perhaps call them
> "Public version" and "hg version" or "hg revision").


We can certainly do this, but what would you like to see for tool suites and other tool "types"?  The old Galaxy tool shed strictly required a suite_config.xml file that included the overall version of the suite.  To make tool development easier, we're no longer requiring the inclusion of a suite_config.xml file ( we don't even differentiate types of tools since everything is a repository ).  The definition of a tool in the next gen tool shed is fairly loose.  A tool could be data, it could be an exported workflow, it could be a suite of tools, a single tool, or just a set of files.  So we'll need to define an easy way to provide a version of the tool if it will be different from the version of the repository tip.


>
> With regards to the planned installation functionality, what happens when
> a tool repository (aka Tool Suite in the old model) contains several XML
> wrappers - would you be able to choose which are wanted?

Yes - see below...

> The use case
> I have here is when several tools share some common dependency (which
> should be tracked in a single repository), and were therefore useful to
> bundle together as a suite, but where not all the tools will be of global
> interest (e.g. my TMHMM, SignalP, etc. suite).



Here's the future "big picture" highlights.  Many of the details are yet to be defined and fleshed out...

We're hoping that in the near future there will be many local tool sheds ( just like Galaxy instances ).  I'm thinking that there will be a central tool shed "broker" of sorts that is hosted by the Galaxy team.  This broker will provide 2 basic functions.  It will enable local tool sheds ( including the current tool shed hosted by the Galaxy team ) to advertise their tools, and it will allow local Galaxy instances to use those advertisements to find tools that the local Galaxy instance's users are interested in.  This specific point has not yet been discussed to any depth, so consider it fluid for now.
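A rough sketch of the broker's two jobs (collecting advertisements from many sheds, then letting a Galaxy instance search them), assuming a simple in-memory record per advertised tool. The Advertisement fields, build_index and search are hypothetical names; as noted above, the broker design itself is still fluid.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Advertisement:
    """One tool advertised by one tool shed (hypothetical record format)."""
    shed: str        # base URL of the advertising shed
    owner: str
    repository: str
    tool_id: str


def build_index(ads: Iterable[Advertisement]) -> Dict[str, List[Advertisement]]:
    """Group advertisements from many sheds by tool ID."""
    index: Dict[str, List[Advertisement]] = defaultdict(list)
    for ad in ads:
        index[ad.tool_id].append(ad)
    return dict(index)


def search(index: Dict[str, List[Advertisement]], text: str) -> List[Advertisement]:
    """Return every advertisement whose tool ID contains `text` (case-insensitive)."""
    needle = text.lower()
    return [ad
            for tool_id in sorted(index)
            if needle in tool_id.lower()
            for ad in index[tool_id]]
```

Grouping by tool ID also makes it obvious when two sheds advertise the same ID, which is one place a broker could flag conflicts.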

When a Galaxy instance's admin locates tools within a specific tool shed that they want to install, they will be able to install them via a Galaxy tool installation control panel.  Think of a UI that provides a check-boxed list of tools that have been found in some tool shed or sheds. The Galaxy admin will check those tools he wants to install, and the tools, along with all dependencies, will automatically be installed in the local Galaxy instance.  Dependencies could include 3rd party binaries, maybe some form of data, and other forms of dependencies.  This is another good reason to keep tools separated in their own repositories.
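One way the check-boxed selection could work for a multi-tool repository, sketched in Python under a stated assumption: every *.xml file is a tool wrapper and everything else (scripts, sample loc files, README) is shared and always installed. plan_install and the file names are hypothetical, and this ignores third-party binaries entirely.

```python
from typing import Iterable, List


def plan_install(repo_files: Iterable[str], selected_tools: Iterable[str]) -> List[str]:
    """Return the files to copy for the admin's checked tools.

    Assumption for this sketch: every *.xml file is a tool wrapper, and every
    other file is shared and always copied. That is over-inclusive, but it
    never leaves out a shared dependency.
    """
    files = list(repo_files)
    wrappers = {f for f in files if f.endswith(".xml")}
    chosen = set(selected_tools)
    unknown = chosen - wrappers
    if unknown:
        raise ValueError(f"not tool wrappers in this repository: {sorted(unknown)}")
    shared = [f for f in files if f not in wrappers]
    return sorted(chosen) + sorted(shared)
```

With this rule, installing only the TMHMM wrapper from a TMHMM/SignalP suite still pulls in the shared helper script, which matches the use case raised earlier in the thread.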

The installation will be virtually automatic, requiring little or no manual intervention, via a "package manager" of sorts.  This will be done using a combination of fabric scripts and other components.  All of the underlying mercurial stuff will be handled beneath the UI layer.


>
> Peter
>

Greg Von Kuster
Galaxy Development Team
[hidden email]




Re: The new hg based Galaxy Tool Shed

Greg Von Kuster
In reply to this post by Peter Cock

On Jun 1, 2011, at 11:19 AM, Peter Cock wrote:

> On Wed, Jun 1, 2011 at 4:00 PM, Peter Cock <[hidden email]> wrote:
>>> Peter Cock wrote:
>>>> would love to hear more about how this will all work. e.g. Where would
>>>> the tool shed repositories be hosted, and would tool authors use hg to
>>>> work with them, or something like the current web based tool upload?
>>>
>>> They're hosted here, and you can check them out and work with them
>>> locally as you do the Galaxy source itself, or use the new web-based
>>> upload to upload individual files or tarballs.
>>>
>>> Have a look at the test instance of the next-gen toolshed here if you'd
>>> like to see how it works:
>>>
>>>  http://testtoolshed.g2.bx.psu.edu/
>>>
>>> Please feel free to use this as a sandbox ...
>
> Does that mean it will be cleared at some point before taking over,
> so we can make deliberate test changes without the fear of them
> being applied by other Galaxy administrators? If so, please stick
> a big warning on the http://testtoolshed.g2.bx.psu.edu/ test server
> (e.g. replace the top left link "Galaxy Tool Shed" with "Galaxy
> TESTING Tool Shed"), and ideally some text telling people to
> continue to use http://community.g2.bx.psu.edu/ for production
> servers.

Yes - we'll do this.  This test tool shed should be used for testing ( we'll keep it available indefinitely ) much like the Galaxy test instance we host here at Penn State.  Feel free to mess with anything you want.  Please report bugs and I'll fix them as fast as possible.  Very soon there will be a main production tool shed available at http://toolshed.g2.bx.psu.edu.


>
>>>
>>> ... and report any issues you find.
>
> First bug report: https://bitbucket.org/galaxy/galaxy-central/issue/564/
>
> It seems you're making a lot of work for yourselves by reimplementing
> a web GUI for an hg repository. Isn't there an existing web server thing
> you would have running on http://testtoolshed.g2.bx.psu.edu/ to take
> care of this side of things? Ideally something you could theme and embed
> within the frames of the current Tool Shed UI.

On my list - thanks for reporting it!


>
> Peter
>

Greg Von Kuster
Galaxy Development Team
[hidden email]




Re: The new hg based Galaxy Tool Shed

Peter Cock
In reply to this post by Greg Von Kuster
On Wed, Jun 1, 2011 at 4:22 PM, Greg Von Kuster <[hidden email]> wrote:
> Hello Peter - I finally got a chance to jump in - see my inline comments...

Hi :)

>> What happens with branches? Would the Tool Shed just show the
>> default branch? That seems best for a simple UI.
>
> Some of the branching details are yet to be worked out, but forks are easy
> because repository urls include the unique username of the Galaxy user.

Well, yes and no - as long as there are competing versions of a Galaxy tool
(e.g. from an original author and a fork by a second author), and they use
the same ID in their XML, you have a clash. This will have to be considered
in the (automated) install interface. i.e. In general, when installing or
updating any tool, there may be existing versions of some components already
present. In fact two completely unrelated tools could even have the same XML
ID by accident.
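A minimal sketch of the clash check an installer might run, assuming each wrapper's root element is <tool id="...">. find_clashes and the source labels are hypothetical; it only compares IDs and does not decide which version should win.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def tool_id(xml_text: str) -> str:
    """Read the id attribute from the root <tool> element of a wrapper."""
    root = ET.fromstring(xml_text)
    if root.tag != "tool" or not root.get("id"):
        raise ValueError("not a Galaxy tool wrapper with an id attribute")
    return root.get("id")


def find_clashes(installed: Dict[str, str],
                 incoming: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Report (tool_id, first_source, clashing_source) for each ID collision.

    Both arguments map a source label (e.g. "owner/repository/file.xml") to
    the wrapper's XML text. Collisions among the incoming wrappers themselves
    are reported too.
    """
    seen = {tool_id(xml): src for src, xml in installed.items()}
    clashes = []
    for src, xml in incoming.items():
        tid = tool_id(xml)
        if tid in seen:
            clashes.append((tid, seen[tid], src))
        else:
            seen[tid] = src
    return clashes
```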

>> I have a query regarding the way the tools are shown in tables and the
>> "version" column, which shows a changeset and revision number. According
>> to Greg's slides (slide #10, titled "Simpler tool versioning" which seems ironic
>> to me), the old numerical version is still there in the XML - and I'd prefer to
>> see that. How about having both shown (two columns, perhaps call them
>> "Public version" and "hg version" or "hg revision").
>
> We can certainly do this, but what would you like to see for tool suites and
> other tool "types"?  The old Galaxy tool shed strictly required a suite_config.xml
> file that included the overall version of the suite.  To make tool development
> easier, we're no longer requiring the inclusion of a suite_config.xml file ( we
> don't even differentiate types of tools since everything is a repository ).  The
> definition of a tool in the next gen tool shed is fairly loose.  A tool could be
> data, it could be an exported workflow, it could be a suite of tools, a single
> tool, or just a set of files.  So we'll need to define an easy way to provide a
> version of the tool if it will be different than the version of the repository tip.

I see what you mean for the "suite" case. Maybe on the view details page
each constituent tool could be shown with its "classical" version number
from the XML file?
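Roughly what such a details page could tabulate, sketched in Python: one row per wrapper with its XML version attribute next to the repository's hg revision. version_rows and the "rev:changeset" string in the example are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Tuple


def version_rows(wrappers: Iterable[str], hg_revision: str) -> List[Tuple[str, str, str]]:
    """One (tool id, XML version, hg revision) row per wrapper in a repository.

    A wrapper without a version attribute is shown as "(not set)" rather than
    guessing a default.
    """
    rows = []
    for xml_text in wrappers:
        root = ET.fromstring(xml_text)
        rows.append((root.get("id", "(no id)"),
                     root.get("version", "(not set)"),
                     hg_revision))
    return sorted(rows)
```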

>
> Here's the future "big picture" highlights.  Many of the details are yet to
> be defined and fleshed out...
>
> We're hoping that in the near future there will be many local tool sheds
> ( just like Galaxy instances ).  I'm thinking that there will be a central tool
> shed "broker" of sorts that is hosted by the Galaxy team.  This broker will
> provide 2 basic functions.  It will enable local tool sheds ( including the
> current tool shed hosted by the Galaxy team ) to advertise their tools,
> and it will allow local Galaxy instances to use those advertisements to
> find tools that the local Galaxy instance's users are interested in.  This
> specific point has not yet been discussed to any depth, so consider it
> fluid for now.

I'm not immediately sold on this plan. To me one of the big plus points
of having a single "Official" Tool Shed looked after by the Galaxy team
is the convenience factor (a one stop shop), which requires critical mass,
plus whatever QA happens as part of the current approval process. I
would regard it as a step backwards if in order to hunt for a wrapper for
a given tool, I had to resort to Google in order to find all the individual
Galaxy Tool Sheds.

> When a Galaxy instance's admin locates tools within a specific tool shed
> that they want to install, they will be able to install them via a Galaxy tool
> installation control panel.  Think of a UI that provides a check-boxed list
> of tools that have been found in some tool shed or sheds. The Galaxy
> admin will check those tools he wants to install, and the tools, along with
> all dependencies will automatically be installed in the local Galaxy instance.
> Dependencies could include 3rd party binaries, maybe some form of data,
> and other forms of dependencies.  This is another good reason to keep
> tools separated in their own repositories.

If you mean by "dependencies" the small task of installing the tool XML
and associated scripts and data files currently bundled in the tar balls
on the current Tool Shed, that seems fine. Anything beyond that seems
difficult and likely to impose a significant extra load on tool wrapper
authors.

> The installation will be virtually automatic, requiring little or no manual
> intervention via a "package manager" of sorts.  This will be done using
> a combination of fabric scripts, and other components.  All of the
> underlying mercurial stuff will be handled beneath the UI layer.

This larger aim of installing the underlying dependencies is impossible
in general - but that seems to be what you want to aim for. Consider
the obvious use case of closed source (non-redistributable) 3rd party binaries.
I can think of several examples from the current Tool Shed wrappers,
including the Roche "Newbler" off instrument applications, TMHMM
and SignalP.

Even if you just hope to cover open source tool dependencies, this is
another big problem which seems like something Galaxy shouldn't be
taking on. Frankly the only way I expect this grand plan to have any
practical chance of success is if you limit yourselves to a single existing
Linux package management platform like RPM or Deb files (although
doing that would limit Galaxy's appeal). e.g. Work hand in hand with
Debian-Med to ensure any missing tool is covered.

Are you biting off more than you can chew? I hope I am misinterpreting
your plans.

(And for the umpteenth time, I am frustrated I couldn't make it to
the Galaxy conference last week in person - more for this kind of
discussion rather than the talks themselves. Will you be at BOSC
or ISMB 2011 in Vienna? Maybe that could be another thread...)

Regards,

Peter

Re: The new hg based Galaxy Tool Shed

Nate Coraor (nate@bx.psu.edu)
Peter Cock wrote:
>
> Well, yes and no - as long as there are competing versions of a Galaxy tool
> (e.g. from an original author and a fork by a second author), and they use
> the same ID in their XML, you have a clash. This will have to be considered
> in the (automated) install interface. i.e. In general, when installing or
> updating any tool, there may be existing versions of some components
> already present. In fact two completely unrelated tools could even have the
> same XML ID by accident.

I agree there could be a problem with tool ID uniqueness.  We've talked
about suggesting that people namespace their tool IDs to prevent this,
but nothing formal has materialized at this point.
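A sketch of one possible namespacing convention, assuming owner names are limited to lowercase letters, digits and "-" so the "__" separator cannot appear inside them. Both the rule and namespaced_id are hypothetical; nothing like this has been agreed.

```python
import re

# Assumed rule for this sketch: owner names use only lowercase letters,
# digits and "-", so they can never contain the "__" separator.
OWNER_RE = re.compile(r"[a-z0-9-]+")
SEP = "__"


def namespaced_id(owner: str, tool_id: str) -> str:
    """Prefix a tool ID with its owner, e.g. ("peterjc", "tmhmm2") -> "peterjc__tmhmm2"."""
    if not OWNER_RE.fullmatch(owner):
        raise ValueError(f"owner name outside the assumed rule: {owner!r}")
    prefix = owner + SEP
    # Idempotent: an ID that already carries this owner's prefix is left alone.
    return tool_id if tool_id.startswith(prefix) else prefix + tool_id
```

Because the owner cannot contain "_", the owner part is always everything before the first underscore, so two different owners can never produce the same prefixed ID.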

> I'm not immediately sold on this plan. To me one of the big plus points
> of having a single "Official" Tool Shed looked after by the Galaxy team
> is the convenience factor (a one stop shop), which requires critical mass,
> plus whatever QA happens as part of the current approval process. I
> would regard it as a step backwards if in order to hunt for a wrapper for
> a given tool, I had to resort to Google in order to find all the individual
> Galaxy Tool Sheds.

It'll be possible for people to run their own Tool Sheds if they'd like,
for whatever purpose - and this may be necessary for sharing extremely
large data which we can't possibly host at the main Shed, but there
should be an aggregator somewhere which lists all of the available
public Sheds and makes it easy to add them as new sources to your Galaxy
install.  Like a slightly more organized Debian APT system.

> If you mean by "dependencies" the small task of installing the tool XML
> and associated scripts and data files currently bundled in the tar balls
> on the current Tool Shed, that seems fine. Anything beyond that seems
> difficult and likely to impose a significant extra load on tool wrapper
> authors.

It'll be up to the authors to decide what level of complexity they care
to handle, but we want to move away from the situation where someone
installs a "tool" but finds that it's unusable because the actual
underlying dependency doesn't exist and is non-trivial to install.

> This larger aim of installing the underlying dependencies is impossible
> in general - but that seems to be what you want to aim for. Consider
> the obvious use case of closed source (non-redistributable) 3rd party binaries.
> I can think of several examples from the current Tool Shed wrappers,
> including the Roche "Newbler" off instrument applications, TMHMM
> and SignalP.

Agreed. Thankfully, the current dependency system (tool_dependency_dir
in the config file; it's not in the sample config, sorry, I'll remedy that
shortly!) only requires that you have an environment file that
configures whatever is necessary (generally just $PATH) to find a
dependency.  So the tools in the Tool Shed would provide the XML,
wrapper script (if necessary), and then instructions or perhaps an
interface to configure the env file.
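A minimal sketch of generating such an env file, assuming a per-package layout of <tool_dependency_dir>/<name>/<version>/env.sh. Treat that layout and the env_script / write_env_file helpers as assumptions for illustration rather than a description of the current code.

```python
import shlex
from pathlib import Path


def env_script(bin_dir: str) -> str:
    """Shell snippet that puts bin_dir at the front of $PATH."""
    # shlex.quote protects paths containing spaces or shell metacharacters.
    return f"PATH={shlex.quote(bin_dir)}:$PATH\nexport PATH\n"


def write_env_file(tool_dependency_dir: str, name: str, version: str,
                   bin_dir: str) -> Path:
    """Write <tool_dependency_dir>/<name>/<version>/env.sh (assumed layout)."""
    target = Path(tool_dependency_dir) / name / version
    target.mkdir(parents=True, exist_ok=True)
    env_file = target / "env.sh"
    env_file.write_text(env_script(bin_dir))
    return env_file
```

For a closed-source tool like TMHMM, the admin would still install the binary by hand; the env file only tells Galaxy where to find it.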

> Even if you just hope to cover open source tool dependencies, this is
> another big problem which seems like something Galaxy shouldn't be
> taking on. Frankly the only way I expect this grand plan to have any
> practical chance of success is if you limit yourselves to a single existing
> Linux package management platform like RPM or Deb files (although
> doing that would limit Galaxy's appeal). e.g. Work hand in hand with
> Debian-Med to ensure any missing tool is covered.

Distributing binaries for the core platforms (Linux i686/x86_64) and Mac
OS X is probably not terribly difficult for us, but would be more work
for 3rd party developers - but the choice to do this is up to them.
I also haven't given too much thought to how this would work.  dpkg
and rpm have the upside of being deterministic, but the downside of
being platform-specific, requiring root, and not having much ability to
install to varying paths.

A fallback to source if binaries are not available would also be nice,
if it's possible to write some easy instructions on how to compile, but
of course this won't always be the case.
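A sketch of the binary-first, source-fallback choice, assuming a repository could list prebuilt archives keyed by (platform.system(), platform.machine()) values such as ("Linux", "x86_64"). choose_artifact and the archive names are hypothetical.

```python
import platform
from typing import Dict, Optional, Tuple

# Normalise a few common spellings of the same architecture.
_MACHINE_ALIASES = {"amd64": "x86_64", "i386": "i686", "i586": "i686"}


def choose_artifact(builds: Dict[Tuple[str, str], str],
                    source: Optional[str] = None,
                    system: Optional[str] = None,
                    machine: Optional[str] = None) -> Tuple[str, str]:
    """Pick a prebuilt binary for this platform, else fall back to source.

    `builds` maps (system, machine) pairs such as ("Linux", "x86_64") or
    ("Darwin", "x86_64") to a download location. Machine names in the keys
    are expected in lowercase.
    """
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    machine = _MACHINE_ALIASES.get(machine, machine)
    key = (system, machine)
    if key in builds:
        return ("binary", builds[key])
    if source is not None:
        return ("source", source)
    raise LookupError(f"no binary for {system}/{machine} and no source fallback")
```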

> Are you biting off more than you can chew? I hope I am misinterpreting
> your plans.

Hopefully not!  We're trying to think this through pretty thoroughly
before we get started, thanks for joining in the discussion. =)

> (And for the umpteenth time, I am frustrated I couldn't make it to
> the Galaxy conference last week in person - more for this kind of
> discussion rather than the talks themselves. Will you be at BOSC
> or ISMB 2011 in Vienna? Maybe that could be another thread...)

Agreed!  I do believe there are some people going to BOSC, Dave will
hopefully chime in with the details (when he's awake, I think he was
only flying back today).

--nate

>
> Regards,
>
> Peter
>

Re: The new hg based Galaxy Tool Shed

Peter Cock
On Wed, Jun 1, 2011 at 5:25 PM, Nate Coraor <[hidden email]> wrote:

> Peter Cock wrote:
>>
>> Well, yes and no - as long as there are competing versions of a Galaxy tool
>> (e.g. from an original author and a fork by a second author), and they use
>> the same ID in their XML, you have a clash. This will have to be considered
>> in the (automated) install interface. i.e. In general, when installing or
>> updating any tool, there may be existing versions of some components
>> already present. In fact two completely unrelated tools could even have
>> the same XML ID by accident.
>
> I agree there could be a problem with tool ID uniqueness.  We've talked
> about suggesting that people namespace their tool IDs to prevent this,
> but nothing formal has materialized at this point.

That sounds sensible, and the sooner the better.

>> I'm not immediately sold on this plan. To me one of the big plus points
>> of having a single "Official" Tool Shed looked after by the Galaxy team
>> is the convenience factor (a one stop shop), which requires critical mass,
>> plus whatever QA happens as part of the current approval process. I
>> would regard it as a step backwards if in order to hunt for a wrapper for
>> a given tool, I had to resort to Google in order to find all the individual
>> Galaxy Tool Sheds.
>
> It'll be possible for people to run their own Tool Sheds if they'd like,
> for whatever purpose - and this may be necessary for sharing extremely
> large data which we can't possibly host at the main Shed, but there
> should be an aggregator somewhere which lists all of the available
> public Sheds and makes it easy to add them as new sources to your Galaxy
> install.  Like a slightly more organized Debian APT system.

If there is an official "meta tool shed" aggregator, that would address
my main concern about fragmenting things.
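To illustrate what a "meta tool shed" aggregator might do, here is a sketch that merges tool listings from several Sheds and flags IDs published by more than one of them (the ID clash discussed above). The listing format (a plain dict of Shed URL to tool IDs) and any example URLs are assumptions; no real Tool Shed API is being called.

```python
from collections import defaultdict


def build_index(shed_listings):
    """Merge per-Shed listings into one index: tool_id -> list of Shed URLs."""
    index = defaultdict(list)
    for shed_url, tool_ids in shed_listings.items():
        for tool_id in tool_ids:
            index[tool_id].append(shed_url)
    return dict(index)


def clashes(index):
    """Tool IDs offered by more than one Shed, i.e. potential install clashes."""
    return {tool_id: sheds for tool_id, sheds in index.items() if len(sheds) > 1}
```

An installer could consult clashes() before adding a new Shed as a source and ask the administrator which copy to use.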

>> If you mean by "dependencies" the small task of installing the tool XML
>> and associated scripts and data files currently bundled in the tar balls
>> on the current Tool Shed, that seems fine. Anything beyond that seems
>> difficult and likely to impose a significant extra load on tool wrapper
>> authors.
>
> It'll be up to the authors to decide what level of complexity they care
> to handle,

Good - that silences a lot of my worries.

> ... but we want to move away from the situation where someone
> installs a "tool" but finds that it's unusable because the actual
> underlying dependency doesn't exist and is non-trivial to install.

Improving the documentation shown on the tool shed could help here -
make it easier for the tool wrapper author to tell the Tool Shed user
what will be required.

Currently we get a short plain text box as part of the upload (no markup),
and can include a (plain text) readme file which is easily viewable from
the tool shed. I've just filed an enhancement request on a related idea:

https://bitbucket.org/galaxy/galaxy-central/issue/565/
Show mockup of tool GUI in Galaxy Tool Shed

>> This larger aim of installing the underlying dependencies is impossible
>> in general - but that seems to be what you want to aim for. Consider the
>> obvious use case of closed source (non-redistributable) 3rd party binaries.
>> I can think of several examples from the current Tool Shed wrappers,
>> including the Roche "Newbler" off-instrument applications, TMHMM
>> and SignalP.
>
> Agreed. Thankfully, the current dependency system (the tool_dependency_dir
> option in the config file, which is not in the sample config yet; sorry,
> I'll remedy that shortly!) only requires that you have an environment file
> that configures whatever is necessary (generally just $PATH) to find a
> dependency.  So the tools in the Tool Shed would provide the XML,
> wrapper script (if necessary), and then instructions or perhaps an
> interface to configure the env file.
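For readers unfamiliar with the env-file approach, a rough sketch of the idea follows. It assumes a layout of <tool_dependency_dir>/<name>/<version>/env.sh, with a "default" entry used when no version is requested; that layout and the helper names are illustrative assumptions, not a description of Galaxy's actual code. Sourcing the file ahead of the tool's command line is what lets it adjust $PATH.

```python
import os
import shlex


def find_env_file(tool_dependency_dir, name, version=None):
    """Locate the env.sh for a dependency, or None if it is not installed.

    Assumed (illustrative) layout: <dir>/<name>/<version>/env.sh, with a
    'default' directory or symlink used when no version is requested.
    """
    path = os.path.join(tool_dependency_dir, name, version or "default", "env.sh")
    return path if os.path.isfile(path) else None


def wrap_command(command, env_files):
    """Prefix a job's command line so each env file is sourced first."""
    prefix = "".join(". %s; " % shlex.quote(f) for f in env_files)
    return prefix + command
```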

I'd hope that the common case, where all that is required is for the tool
binary to be on the $PATH, would not require any extra configuration files.
See also: https://bitbucket.org/galaxy/galaxy-central/issue/82
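One way to express the "binary just needs to be on the $PATH" case is a check like the following. It illustrates the idea only and is not the patch attached to that issue; treating requirements of type "binary" as the ones to check, and the missing_binaries name, are assumptions.

```python
import shutil


def missing_binaries(requirements, path=None):
    """Return binary requirements that cannot be found on the search path.

    `requirements` is a list of (type, name) pairs such as might be read from
    a tool's <requirements> section; only type "binary" is checked here.
    `path` defaults to the current $PATH (shutil.which's own default).
    """
    return [name for req_type, name in requirements
            if req_type == "binary" and shutil.which(name, path=path) is None]
```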

> [cut]
>
>> Are you biting off more than you can chew? I hope I am misinterpreting
>> your plans.
>
> Hopefully not!  We're trying to think this through pretty thoroughly
> before we get started, thanks for joining in the discussion. =)

I've been reassured :)

Peter


Re: The new hg based Galaxy Tool Shed

Fields, Christopher J
(apologies in advance, limiting my response to the two questions below)

On Jun 1, 2011, at 11:54 AM, Peter Cock wrote:

> On Wed, Jun 1, 2011 at 5:25 PM, Nate Coraor <[hidden email]> wrote:
>> Peter Cock wrote:
>>>
>>> Well, yes and no - as long as there are competing versions of a Galaxy tool
>>> (e.g. from an original author and a fork by a second author), and they use
>>> the same ID in their XML, you have a clash. This will have to be considered
>>> in the (automated) install interface. i.e. In general, when installing or
>>> updating any tool, there may be existing versions of some components
>>> already present. In fact two completely unrelated tools could even have
>>> the same XML ID by accident.
>>
>> I agree there could be a problem with tool ID uniqueness.  We've talked
>> about suggesting that people namespace their tool IDs to prevent this,
>> but nothing formal has materialized at this point.
>
> That sounds sensible, and the sooner the better.

Agreed.  I think simple namespace prefixes (maybe hg account?) are the easiest option.

>>> I'm not immediately sold on this plan. To me one of the big plus points
>>> of having a single "Official" Tool Shed looked after by the Galaxy team
>>> is the convenience factor (a one stop shop), which requires critical mass,
>>> plus whatever QA happens as part of the current approval process. I
>>> would regard it as a step backwards if in order to hunt for a wrapper for
>>> a given tool, I had to resort to Google in order to find all the individual
>>> Galaxy Tool Sheds.
>>
>> It'll be possible for people to run their own Tool Sheds if they'd like,
>> for whatever purpose - and this may be necessary for sharing extremely
>> large data which we can't possibly host at the main Shed, but there
>> should be an aggregator somewhere which lists all of the available
>> public Sheds and makes it easy to add them as new sources to your Galaxy
>> install.  Like a slightly more organized Debian APT system.
>
> If there is an official "meta tool shed" aggregator, that would address
> my main concern about fragmenting things.

Not sure how feasible this is, but could you use hg subrepositories for this purpose?  For instance, have a 'blessed' set of Galaxy Tool Sheds (as subrepos) listed in a main Tool Shed repository.  One nice advantage of this is that it could allow one to use git or svn, though I think sticking with hg-only repos is the simplest option for now.
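For reference, Mercurial subrepositories are declared in a .hgsub file at the root of the parent repository, one "local/path = source" line per subrepo, with a [git] or [svn] prefix on non-hg sources. A sketch that renders such a file for a hypothetical list of blessed Sheds (all paths and URLs are made up, and whether a whole Shed maps cleanly onto one subrepo is an open question from the thread):

```python
def render_hgsub(subrepos):
    """Render the contents of a Mercurial .hgsub file.

    Each entry is (local_path, url, kind); a kind of "git" or "svn" adds
    Mercurial's [git]/[svn] source prefix, anything else is treated as hg.
    """
    lines = []
    for local_path, url, kind in subrepos:
        prefix = "[%s]" % kind if kind in ("git", "svn") else ""
        lines.append("%s = %s%s" % (local_path, prefix, url))
    return "\n".join(lines) + "\n"
```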

chris

PS - wonderful conference, sorry that Peter couldn't make it!



Re: The new hg based Galaxy Tool Shed

Nate Coraor (nate@bx.psu.edu)
In reply to this post by Peter Cock
Peter Cock wrote:
>
> If there is an official "meta tool shed" aggregator, that would address
> my main concern about fragmenting things.

If nothing else, there can be a wiki page, although something
programmatic would be better.

> > ... but we want to move away from the situation where someone
> > installs a "tool" but finds that it's unusable because the actual
> > underlying dependency doesn't exist and is non-trivial to install.
>
> Improving the documentation shown on the tool shed could help here -
> make it easier for the tool wrapper author to tell the Tool Shed user
> what will be required.
>
> Currently we get a short plain text box as part of the upload (no markup),
> and can include a (plain text) readme file which is easily viewable from
> the tool shed. I've just filed an enhancement request on a related idea:
>
> https://bitbucket.org/galaxy/galaxy-central/issue/565/
> Show mockup of tool GUI in Galaxy Tool Shed

Yeah, eventually we'll have to parse the tool configs in the repo, so
functionality like this should show up as the Shed matures.  Not sure
about the difficulty of doing the tool form mockup, but I like the idea.
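Parsing the tool configs is straightforward with the standard library. A sketch that extracts the fields a Shed page or a GUI mockup might start from, assuming the usual <tool id="..." name="..." version="..."> root with <requirements> and <param> elements; the summarize_tool name is hypothetical and rendering the actual form mockup is not attempted.

```python
import xml.etree.ElementTree as ET


def summarize_tool(xml_text):
    """Pull out the fields a Tool Shed page might display for one tool config."""
    root = ET.fromstring(xml_text)
    requirements = [(r.get("type"), (r.text or "").strip())
                    for r in root.findall("./requirements/requirement")]
    # iter() also finds params nested inside conditionals and repeats.
    params = [(p.get("name"), p.get("type"), p.get("label"))
              for p in root.iter("param")]
    return {
        "id": root.get("id"),
        "name": root.get("name"),
        "version": root.get("version"),
        "requirements": requirements,
        "params": params,
    }
```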

> >> This larger aim of installing the underlying dependencies is impossible
> >> in general - but that seems to be what you want to aim for. Consider the
> >> obvious use case of closed source (non-redistributable) 3rd party binaries.
> >> I can think of several examples from the current Tool Shed wrappers,
> >> including the Roche "Newbler" off-instrument applications, TMHMM
> >> and SignalP.
> >
> > Agreed. Thankfully, the current dependency system (the tool_dependency_dir
> > option in the config file, which is not in the sample config yet; sorry,
> > I'll remedy that shortly!) only requires that you have an environment file
> > that configures whatever is necessary (generally just $PATH) to find a
> > dependency.  So the tools in the Tool Shed would provide the XML,
> > wrapper script (if necessary), and then instructions or perhaps an
> > interface to configure the env file.
>
> I'd hope that the common case, where all that is required is for the tool
> binary to be on the $PATH, would not require any extra configuration files.
> See also: https://bitbucket.org/galaxy/galaxy-central/issue/82

Well, use of the dependency system isn't required, so just setting
things up on the $PATH is always a possibility.  I was going to suggest
that your patch could be applied if it was conditional on the local
runner and checked after any <requirement type="package"> dependencies
were set up, but there's still the problem of people running jobs through
the local runner which are actually sent to the cluster without Galaxy's
knowledge.  Perhaps this is something we shouldn't worry too much about,
but I know there are people doing it.
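A sketch of the conditional check described here: only for the local runner, and only after the directories contributed by package dependencies have been placed ahead of $PATH. The function name, the runner names and passing package bin directories as a list are assumptions for illustration. It does not solve the remaining problem, since a local job whose script itself submits to a cluster would still be checked against the Galaxy server.

```python
import os
import shutil


def check_job_binaries(runner, binaries, package_bin_dirs, base_path=None):
    """Return binaries missing for a job, or [] when the check is skipped.

    Only local-runner jobs are checked: for cluster runners the Galaxy
    server's $PATH says nothing about the compute nodes. Directories added
    by <requirement type="package"> dependencies are searched first.
    """
    if runner != "local":
        return []
    if base_path is None:
        base_path = os.environ.get("PATH", "")
    dirs = [d for d in list(package_bin_dirs) + base_path.split(os.pathsep) if d]
    search_path = os.pathsep.join(dirs)
    return [b for b in binaries if shutil.which(b, path=search_path) is None]
```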

--nate

>
> > [cut]
> >
> >> Are you biting off more than you can chew? I hope I am misinterpreting
> >> your plans.
> >
> > Hopefully not!  We're trying to think this through pretty thoroughly
> > before we get started, thanks for joining in the discussion. =)
>
> I've been reassured :)
>
> Peter
>

Re: The new hg based Galaxy Tool Shed

Peter Cock
On Wednesday, June 1, 2011, Nate Coraor <[hidden email]> wrote:

> Peter Cock wrote:
>
>> > ... but we want to move away from the situation where someone
>> > installs a "tool" but finds that it's unusable because the actual
>> > underlying dependency doesn't exist and is non-trivial to install.
>>
>> Improving the documentation shown on the tool shed could help here -
>> make it easier for the tool wrapper author to tell the Tool Shed user
>> what will be required.
>>
>> Currently we get a short plain text box as part of the upload (no markup),
>> and can include a (plain text) readme file which is easily viewable from
>> the tool shed. I've just filed an enhancement request on a related idea:
>>
>> https://bitbucket.org/galaxy/galaxy-central/issue/565/
>> Show mockup of tool GUI in Galaxy Tool Shed
>
> Yeah, eventually we'll have to parse the tool configs in the repo, so
> functionality like this should show up as the Shed matures.  Not sure
> about the difficulty of doing the tool form mockup, but I like the idea.

That's a start :)

>> >> This larger aim of installing the underlying dependencies is impossible
>> >> in general - but that seems to be what you want to aim for. Consider the
>> >> obvious use case of closed source (non-redistributable) 3rd party binaries.
>> >> I can think of several examples from the current Tool Shed wrappers,
>> >> including the Roche "Newbler" off-instrument applications, TMHMM
>> >> and SignalP.
>> >
>> > Agreed. Thankfully, the current dependency system (the tool_dependency_dir
>> > option in the config file, which is not in the sample config yet; sorry,
>> > I'll remedy that shortly!) only requires that you have an environment file
>> > that configures whatever is necessary (generally just $PATH) to find a
>> > dependency.  So the tools in the Tool Shed would provide the XML,
>> > wrapper script (if necessary), and then instructions or perhaps an
>> > interface to configure the env file.
>>
>> I'd hope that the common case, where all that is required is for the tool
>> binary to be on the $PATH, would not require any extra configuration files.
>> See also: https://bitbucket.org/galaxy/galaxy-central/issue/82
>
> Well, use of the dependency system isn't required, so just setting
> things up on the $PATH is always a possibility.  I was going to suggest
> that your patch could be applied if it was conditional on the local
> runner and checked after any <requirement type="package">
> dependencies were set up, ...

Is that a request for me to update the patch? I've not delved into the
job runner code before, so it might take me a bit longer than it would
take you. Hint hint ;) I'd help with testing though.

> ... but there's still the problem of people running jobs through
> the local runner which are actually sent to the cluster without Galaxy's
> knowledge.  Perhaps this is something we shouldn't worry too much about,
> but I know there are people doing it.

You mean if Galaxy blindly calls a tool or script, and that script
then submits the job to the cluster? I'd say checking the cluster
dependencies there is the tool author's responsibility.

Peter


Re: The new hg based Galaxy Tool Shed

Nate Coraor (nate@bx.psu.edu)
Peter Cock wrote:

> >
> > Well, use of the dependency system isn't required, so just setting
> > things up on the $PATH is always a possibility.  I was going to suggest
> > that your patch could be applied if it was conditional on the local
> > runner and checked after any <requirement type="package">
> > dependencies were set up, ...
>
> Is that a request for me to update the patch? I've not delved into the
> job runner code before, so it might take me a bit longer than it would
> take you. Hint hint ;) I'd help with testing though.

It's not a completely trivial thing, which is why I didn't do it at the
time.  It's probably something that should be added to the DRM wrapper
script so that a nice error message can be supplied.  I can't think of a
way to check at tool load that wouldn't be painfully slow.
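Doing the check in the DRM wrapper script, where it runs on the execution host, might look roughly like this. Galaxy's real wrapper is a generated shell script; this Python version and the run_with_preflight name are hypothetical and only show the shape of a check that fails early with a readable message instead of an obscure tool error.

```python
import shutil
import subprocess
import sys


def run_with_preflight(binary, argv):
    """Run a job command, failing early with a clear message if the binary is absent.

    Meant to run on the execution host (e.g. a cluster node), where the
    check is meaningful. Returns the command's exit status.
    """
    if shutil.which(binary) is None:
        sys.stderr.write(
            "Error: required program '%s' was not found on $PATH on this host; "
            "please ask your Galaxy administrator to install it.\n" % binary)
        return 127  # conventional shell status for "command not found"
    return subprocess.call([binary] + list(argv))
```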

> > ... but there's still the problem of people running jobs through
> > the local runner which are actually sent to the cluster without Galaxy's
> > knowledge.  Perhaps this is something we shouldn't worry too much about,
> > but I know there are people doing it.
>
> You mean if Galaxy blindly calls a tool or script, and that script
> then submits the job to the cluster? I'd say checking the cluster
> dependencies there is the tool author's responsibility.

Yeah, that's the idea.  Unfortunately, if the binary isn't installed on
the Galaxy server (where it isn't needed), the tool won't load, which is
certainly not what we want.

--nate

>
> Peter
>

The new hg based Galaxy Tool Shed

Peter Cock
In reply to this post by Fields, Christopher J
On Wednesday, June 1, 2011, Chris Fields <[hidden email]> wrote:

> (apologies in advance, limiting my response to the two questions below)
>
> On Jun 1, 2011, at 11:54 AM, Peter Cock wrote:
>
>> On Wed, Jun 1, 2011 at 5:25 PM, Nate Coraor <[hidden email]> wrote:
>>> Peter Cock wrote:
>>>>
>>>> Well, yes and no - as long as there are competing versions of a Galaxy tool
>>>> (e.g. from an original author and a fork by a second author), and they use
>>>> the same ID in their XML, you have a clash. This will have to be considered
>>>> in the (automated) install interface. i.e. In general, when installing or
>>>> updating any tool, there may be existing versions of some components
>>>> already present. In fact two completely unrelated tools could even have
>>>> the same XML ID by accident.
>>>
>>> I agree there could be a problem with tool ID uniqueness.  We've talked
>>> about suggesting that people namespace their tool IDs to prevent this,
>>> but nothing formal has materialized at this point.
>>
>> That sounds sensible, and the sooner the better.
>
> Agreed.  I think simple namespace prefixes (maybe hg account?) are the
> easiest option.

That sounds good, although I'd suggest the group's name might be a
valid alternative. The prefix would be followed by an underscore or
hyphen, and then the tool-specific ID, which would typically be based
on the name of the tool being wrapped.

If it were up to me I'd go further and recommend a restricted set of
characters (e.g. alphanumeric plus one of hyphen or underscore),
with the additional recommendation that the tool's XML filename
follows suit, e.g. signalp.xml with ID peterjc-signalp.

Obviously we'd have to have a "grandfather clause" exemption for
all tools to date because changing their ID would break saved
workflows.
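The proposed convention is easy to check mechanically. A sketch, assuming hyphen as the chosen separator, ASCII letters and digits elsewhere, and a set of grandfathered legacy IDs that are exempt; the regular expression and the check_tool_id name are illustrative, not an agreed standard.

```python
import re

# Illustrative convention: "<namespace>-<tool>", letters and digits only,
# with a single hyphen as the separator.
TOOL_ID_RE = re.compile(r"^(?P<namespace>[A-Za-z0-9]+)-(?P<tool>[A-Za-z0-9]+)$")


def check_tool_id(tool_id, xml_filename, grandfathered=frozenset()):
    """Return a list of problems with a tool ID under the proposed convention."""
    if tool_id in grandfathered:
        return []  # existing IDs are exempt so saved workflows keep working
    match = TOOL_ID_RE.match(tool_id)
    if match is None:
        return ["ID %r is not of the form <namespace>-<tool>" % tool_id]
    expected = match.group("tool") + ".xml"
    if xml_filename != expected:
        return ["XML file %r should be named %r" % (xml_filename, expected)]
    return []
```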

As an aside, I regret including the word "wrapper" in the NCBI
BLAST+ wrappers since most Galaxy tools are just wrappers
around existing tools, but it's done now.

Peter


Re: The new hg based Galaxy Tool Shed

Ravi Madduri
In reply to this post by Nate Coraor (nate@bx.psu.edu)
I apologize for jumping onto this thread a bit late. I read below that there is a plan to pull tools into a Galaxy installation automagically. I wonder if you plan on providing some kind of API to query the tool registry, discover tools and install them into an existing Galaxy installation.

PS: The link "How to upload, download and install tools" under Help seems to be broken.
On Jun 1, 2011, at 3:00 PM, Nate Coraor wrote:

> Peter Cock wrote:
>
> > > Well, use of the dependency system isn't required, so just setting
> > > things up on the $PATH is always a possibility.  I was going to suggest
> > > that your patch could be applied if it was conditional on the local
> > > runner and checked after any <requirement type="package">
> > > dependencies were set up, ...
> >
> > Is that a request for me to update the patch? I've not delved into the
> > job runner code before, so it might take me a bit longer than it would
> > take you. Hint hint ;) I'd help with testing though.
>
> It's not a completely trivial thing, which is why I didn't do it at the
> time.  It's probably something that should be added to the DRM wrapper
> script so that a nice error message can be supplied.  I can't think of a
> way to check at tool load that wouldn't be painfully slow.
>
> > > ... but there's still the problem of people running jobs through
> > > the local runner which are actually sent to the cluster without Galaxy's
> > > knowledge.  Perhaps this is something we shouldn't worry too much about,
> > > but I know there are people doing it.
> >
> > You mean if Galaxy blindly calls a tool or script, and that script
> > then submits the job to the cluster? I'd say checking the cluster
> > dependencies there is the tool author's responsibility.
>
> Yeah, that's the idea.  Unfortunately, if the binary isn't installed on
> the Galaxy server (where it isn't needed), the tool won't load, which is
> certainly not what we want.
>
> --nate
>
> > Peter


--
Ravi K Madduri
The Globus Alliance | Argonne National Laboratory | University of Chicago



Re: The new hg based Galaxy Tool Shed

Peter Cock
On Thu, Jun 16, 2011 at 3:00 AM, Ravi Madduri <[hidden email]> wrote:
> I apologize for jumping onto this thread a bit late. I read below that
> there is a plan to pull tools into a Galaxy installation automagically. I
> wonder if you plan on providing some kind of API to query the tool registry,
> discover tools and install them into an existing Galaxy
> installation.

Yes, have a look at Greg's slides from the Galaxy Community Conference
http://wiki.g2.bx.psu.edu/GCC2011

Peter