Galaxy NGS Tools

Jacob Biesinger-2
Hi!

In our lab, we've worked on several ChIP-seq projects and have developed dozens of scripts for analyzing the results, including running several peak finders, several motif discovery tools, various data munging techniques, some parameter optimization for the above programs, calculating the genomic distribution of peaks, generating several summary graphs, calculating motif distributions within peaks, performing gene ontology analysis, etc.

We've been thinking about making all of this into a standalone tool, possibly a web service, and have been considering Galaxy as a vehicle for automating the entire process and opening up the tools to a biologist community.  From what I've seen in Galaxy Main and the recent inclusion of, e.g., the MACS wrapper, it seems like the things I've listed would be of interest to the Galaxy community at large.

So I'm looking for feedback and possibly advice.  Ideally, we'd like to be able to run the entire pipeline, look at the results, possibly change a few parameters in some of the steps (e.g., minimum FDR cutoff) and rerun only what needs to be rerun.  Galaxy workflows are easy to create, but don't seem to have the flexibility that we're looking for.  Perhaps several workflows tied together would do the job (i.e., have separate workflows for the major parts of the analysis) which we could tie together (possible in Galaxy?) into one uber-pipeline.

Has work like this already been done?  Are there sample workflows that go beyond just calling peaks?  Would the community be interested in the code + wrappers?  

Thanks for the help!
--
Jake Biesinger
Graduate Student
Xie Lab, UC Irvine
(949) 231-7587

_______________________________________________
galaxy-dev mailing list
[hidden email]
http://lists.bx.psu.edu/listinfo/galaxy-dev


Re: Fwd: Galaxy NGS Tools

Anton Nekrutenko
Jake:

Thanks for your e-mail. There has been work in this domain. Some of it comes from the Galaxy team, but one of the most impressive examples is the Cistrome project at Harvard (http://cistrome.dfci.harvard.edu/ap/), which uses Galaxy as the underlying framework. Our group and the community are very much interested in your code + wrappers. If you have already tried porting tools to Galaxy, they can be submitted to our very new community site at http://usegalaxy.org/community

Speaking of flexibility in Galaxy workflows: we are actively working on improving workflow functionality. If you have looked at workflows recently, you might have noticed workflow actions, and more is coming.

The bottom line -> the Galaxy community needs your tools = wrap, test, and submit!

Thanks,

anton
galaxy team



On Jul 21, 2010, at 3:51 PM, Jacob Biesinger wrote:

> Hi!
> [...]

Anton Nekrutenko
http://nekrut.bx.psu.edu
http://usegalaxy.org





Re: Fwd: Galaxy NGS Tools

Jacob Biesinger
I see some great progress on the Cistrome project.  What a shame that they haven't open-sourced their efforts.

We've only just started porting and wrapping our code for Galaxy.  One possible limiting factor is that a good portion of our code depends on the pygr package for Python to extract sequences and perform genomic queries quickly.  Would this be too tall an order for the community to maintain?

Thanks again.
--
Jake Biesinger
Graduate Student
Xie Lab, UC Irvine
(949) 231-7587


On Wed, Jul 21, 2010 at 1:01 PM, Anton Nekrutenko <[hidden email]> wrote:
> [...]






Re: Fwd: Galaxy NGS Tools

Nate Coraor (nate@bx.psu.edu)
Jacob Biesinger wrote:
> I see some great progress on the cistrome project.  What a shame that
> they haven't open-sourced their efforts.
>
> We've only just started porting and wrapping our code for Galaxy.  One
> possible limiting factor is that a good portion of our code depends on
> the pygr package for python in order to extract sequence and perform
> genomic queries quickly.  For the community, would this be too tall of
> an order to maintain?

Hi Jake,

We have quite a few tools that depend on outside Python modules,
although none on pygr.  Regardless, unless it's exceptionally difficult
to install, this will be fine.

--nate



Re: Galaxy NGS Tools

Jeremy Goecks
In reply to this post by Jacob Biesinger-2
Hi Jake,

> So I'm looking for feedback and possibly advice.  Ideally, we'd like to be able to run the entire pipeline, look at the results, possibly change a few parameters in some of the steps (e.g., minimum FDR cutoff) and rerun only what needs to be rerun.  Galaxy workflows are easy to create, but don't seem to have the flexibility that we're looking for.  Perhaps several workflows tied together would do the job (i.e., have separate workflows for the major parts of the analysis) which we could tie together (possible in galaxy?) into one uber-pipeline.

I'm a Galaxy developer who also uses Galaxy for NGS analyses; here are my opinions about workflows.

First, separate workflows for major or time-consuming aspects of an analysis work well. Galaxy provides the ability to copy (clone) workflows, and I often copy a workflow and then add to it so that I have the simpler workflow and also the more complex workflow. This enables me to run either the simpler or the complex workflow. Often, I run the complex analysis initially and use the simpler workflows to rerun particular aspects of the analysis. I've talked with others that do something similar.

What this means is that Galaxy needs the ability to support embedded workflows. Making a change to the simple workflow currently requires manually propagating the change to the more complex workflow, which is difficult and error-prone. Embedded/nested workflows are on our development list, but they're fairly far down the list right now because other issues are more pressing.
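
To make the copy-versus-embed distinction concrete, here is a minimal sketch in plain Python. It is not Galaxy code and does not use any Galaxy API; the step functions (`drop_negatives`, `double`, `cap_at_ten`), the list-of-functions workflow representation, and the toy data are all hypothetical, chosen only to show why a copied workflow misses later changes while an embedded one picks them up.

```python
from typing import Callable, List

Step = Callable[[List[int]], List[int]]


def run_workflow(steps: List[Step], data: List[int]) -> List[int]:
    """Apply each step in order, feeding its output to the next step."""
    for step in steps:
        data = step(data)
    return data


# Hypothetical toy steps standing in for real analysis tools.
def drop_negatives(values: List[int]) -> List[int]:
    return [v for v in values if v >= 0]


def double(values: List[int]) -> List[int]:
    return [2 * v for v in values]


def cap_at_ten(values: List[int]) -> List[int]:
    return [min(v, 10) for v in values]


simple_workflow: List[Step] = [drop_negatives]

# Copy approach: the complex workflow takes a snapshot of the simple one.
copied_complex: List[Step] = list(simple_workflow) + [double]


# Embedded approach: the complex workflow refers to the simple workflow
# and resolves it only when it runs.
def embedded_simple(values: List[int]) -> List[int]:
    return run_workflow(simple_workflow, values)


embedded_complex: List[Step] = [embedded_simple, double]

# The simple workflow is edited after the complex ones were built.
simple_workflow.append(cap_at_ten)

copied_result = run_workflow(copied_complex, [-1, 4, 20])
embedded_result = run_workflow(embedded_complex, [-1, 4, 20])
print(copied_result)    # the copy does not see cap_at_ten
print(embedded_result)  # the embedded version does
```

With the copy, the new `cap_at_ten` step would have to be added to the complex workflow by hand; with embedding, the change propagates automatically.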

Finally, Galaxy enables you to specify parameters that must be set at runtime, so it's possible to easily rerun Galaxy workflows with different parameter values.
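
As a rough illustration of rerunning with a different runtime parameter while reusing unchanged upstream results, here is a small sketch in plain Python. It is not how Galaxy implements workflows and uses no Galaxy API; the step names (`call_peaks`, `filter_by_fdr`), the toy peak data, and the use of `functools.lru_cache` as the cache are assumptions made for the example.

```python
from functools import lru_cache

# Count how often each (hypothetical) step actually executes.
run_counts = {"call_peaks": 0, "filter_by_fdr": 0}


@lru_cache(maxsize=None)
def call_peaks(sample: str) -> tuple:
    """Expensive upstream step; toy (name, FDR) pairs stand in for real output."""
    run_counts["call_peaks"] += 1
    return (("peak1", 0.001), ("peak2", 0.03), ("peak3", 0.2))


@lru_cache(maxsize=None)
def filter_by_fdr(sample: str, max_fdr: float) -> tuple:
    """Downstream step whose result depends on the runtime parameter max_fdr."""
    run_counts["filter_by_fdr"] += 1
    return tuple(name for name, fdr in call_peaks(sample) if fdr <= max_fdr)


def run_pipeline(sample: str, max_fdr: float) -> tuple:
    """Run the whole pipeline; cached steps are skipped automatically."""
    return filter_by_fdr(sample, max_fdr)


first = run_pipeline("sampleA", 0.05)   # runs both steps
second = run_pipeline("sampleA", 0.01)  # reruns only the FDR filter
print(first, second, run_counts)
```

Because each step is cached on its inputs, changing only the FDR cutoff leaves the peak-calling result untouched and recomputes just the filter.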

Best,
J.


