parallelizing an NGS mapping workflow

Chris Berthiaume
Hello,

I'd like to use Galaxy on our local beowulf cluster for NGS workflows.  One typical use case we'd be replacing with Galaxy is a parallel BWA alignment of large fastq files.  To distribute this across the cluster we split the fastq file into many parts, run each separately against the same reference, and then use samtools to merge the SAM output.  It's not uncommon to end up with hundreds of parts after splitting.  How does Galaxy handle the parallelization of large NGS mappings?  I've found the tools for fastq QC, mapping, and SAM merging, but couldn't find any set of tools that would control the parallelization.  This trouble ticket (http://bitbucket.org/galaxy/galaxy-central/issue/197/starting-workflows-with-a-pool-of-input) would suggest this functionality hasn't been implemented yet, but it seems necessary for many (most?) Illumina or SOLiD runs to get a reasonable mapping turnaround time.  If this is already a feature it would be great if I could be pointed to the relevant docs and maybe it could be given a more prominent place in the wiki/interface.  If it's not yet a feature, is there a timeline for when it will be added?
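
For reference, the split / map / merge procedure described above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not Galaxy functionality and not our production scripts: `split_fastq` and `scatter_gather` are hypothetical helper names, the FASTQ input is assumed to use the plain four-lines-per-record layout, and a local thread pool stands in for the cluster scheduler. The per-part mapping step (a BWA run against the shared reference) and the final merge (samtools) are passed in as callables rather than spelled out, since the exact commands depend on the local setup.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_fastq(fastq_path, out_dir, reads_per_chunk):
    """Split a FASTQ file into parts of at most reads_per_chunk records.

    Assumption: every record is exactly four lines (no wrapped sequences).
    Returns the part paths in input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    with open(fastq_path) as handle:
        while True:
            # Read the next block of whole records (4 lines each).
            lines = list(itertools.islice(handle, 4 * reads_per_chunk))
            if not lines:
                break
            if len(lines) % 4 != 0:
                raise ValueError("truncated FASTQ record at end of input")
            chunk_path = out_dir / f"part_{len(chunk_paths):04d}.fastq"
            chunk_path.write_text("".join(lines))
            chunk_paths.append(chunk_path)
    return chunk_paths


def scatter_gather(chunk_paths, map_one, merge, workers=4):
    """Apply map_one to every part in parallel, then merge the ordered results.

    map_one and merge are hypothetical callables supplied by the caller,
    e.g. a wrapper that aligns one part and a wrapper that merges the outputs.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(map_one, chunk_paths))
    return merge(results)
```

In practice `map_one` would submit one alignment job per part to the cluster and return the path of its output, and `merge` would combine those outputs into a single file. The control logic that ties the three steps together is the piece I could not find in Galaxy.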

Thanks,
Chris


--
Chris Berthiaume
Center for Environmental Genomics
University of Washington


_______________________________________________
galaxy-user mailing list
[hidden email]
http://lists.bx.psu.edu/listinfo/galaxy-user
Re: parallelizing an NGS mapping workflow

Nate Coraor (nate@bx.psu.edu)
Chris Berthiaume wrote:
> Hello,
>
> I'd like to use Galaxy on our local beowulf cluster for NGS workflows.  One typical use case we'd be replacing with Galaxy is a parallel BWA alignment of large fastq files.  To distribute this across the cluster we split the fastq file into many parts, run each separately against the same reference, and then use samtools to merge the SAM output.  It's not uncommon to end up with hundreds of parts after splitting.  How does Galaxy handle the parallelization of large NGS mappings?  I've found the tools for fastq QC, mapping, and SAM merging, but couldn't find any set of tools that would control the parallelization.  This trouble ticket (http://bitbucket.org/galaxy/galaxy-central/issue/197/starting-workflows-with-a-pool-of-input) would suggest this functionality hasn't been implemented yet, but it seems necessary for many (most?) Illumina or SOLiD runs to get a reasonable mapping turnaround time.  If this is already a feature it would be great if I could be pointed to the relevant docs and maybe it could be given a more prominent place in the wiki/interface.  If it's not yet a feature, is there a timeline for when it will be added?

Hi Chris,

This is a long-standing feature request, which is tracked in this ticket:

http://bitbucket.org/galaxy/galaxy-central/issue/79

Unfortunately, there's still no timeline for when it'll be implemented, but
it's moving up the list of priorities.

--nate

>
> Thanks,
> Chris
>
>
