intended behaviour for multiple data sets

Matthias Bernt
Dear list,

In addition to single datasets, tool file inputs allow choosing between
- multiple datasets
- dataset collections
- repeat tags, which some tools use for multiple inputs

I would like to know what the intended behavior of Galaxy is for these
three options when multiple datasets are selected. Is this described somewhere?

Is the tool applied to each of the multiple inputs separately, or to all
of them at once? For some tools one case makes more sense than the other.

Best,
Matthias

--

-------------------------------------------
Matthias Bernt
Bioinformatics Service
Molekulare Systembiologie (MOLSYB)
Helmholtz-Zentrum für Umweltforschung GmbH - UFZ/
Helmholtz Centre for Environmental Research GmbH - UFZ
Permoserstraße 15, 04318 Leipzig, Germany
Phone +49 341 235 482296,
[hidden email], www.ufz.de

Sitz der Gesellschaft/Registered Office: Leipzig
Registergericht/Registration Office: Amtsgericht Leipzig
Handelsregister Nr./Trade Register Nr.: B 4703
Vorsitzender des Aufsichtsrats/Chairman of the Supervisory Board:
MinDirig Wilfried Kraus
Wissenschaftlicher Geschäftsführer/Scientific Managing Director:
Prof. Dr. Dr. h.c. Georg Teutsch
Administrative Geschäftsführerin/ Administrative Managing Director:
Prof. Dr. Heike Graßmann
-------------------------------------------
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/
Re: intended behaviour for multiple data sets

Jennifer Hillman-Jackson


Hello,

How the data is consumed is explained on the tool form in the data entry area. Please review how to interpret this below, and let us know if you are not sure about a particular tool. We might be able to explain how to access the help, or the tool might need an update to make the usage clearer.

Example 1: Samtools Sort > click on the Multiple Datasets or Dataset Collection icon and this new help text will be presented: "This is a batch mode input field. Separate jobs will be triggered for each dataset selection."

Example 2: Samtools Mpileup > multiple dataset selection is the default (one or more can be chosen), or click on Collections, where one or more can be selected. No new help text is presented to warn about batch job mode. This means that the inputs are processed together in the same job.

More complex entry can be found on tools like Compare two Datasets, where there are two input sections and each can take a single or multiple (batch) entry (individually selected or in a collection). The batch mode help text comes up when multiple datasets or collections are selected. This is expanded behavior similar to Sort above.

Even more complex entry can be found on tools like MultiQC, where one or more input sets can be selected, and additional input set (Reports) sections can optionally be added. No batch mode help text is shown, meaning that all data is run in the same job. This is expanded behavior similar to Mpileup above: each subsection is combined to produce a summary sub-report, then the results from each sub-report are combined into the final report.
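
The difference between the two modes in the examples above can be pictured with a small toy model. This is only an illustrative sketch in Python, not Galaxy's actual job scheduling code; the function name plan_jobs and the file names are made up for the example.

```python
from typing import List


def plan_jobs(datasets: List[str], batch: bool) -> List[List[str]]:
    """Return one list of inputs per job that would be launched.

    batch=True  -> one job per selected dataset (the Samtools Sort case)
    batch=False -> one job that receives every dataset (the Mpileup case)

    Hypothetical helper for illustration only; Galaxy does not expose
    a function with this name.
    """
    if batch:
        # Batch mode: each selected dataset becomes its own job.
        return [[d] for d in datasets]
    # Non-batch mode: all selected datasets go into a single job.
    return [list(datasets)] if datasets else []


selected = ["a.bam", "b.bam", "c.bam"]
print(plan_jobs(selected, batch=True))   # one entry per job, one input each
print(plan_jobs(selected, batch=False))  # a single job holding all inputs
```

With three datasets selected, the batch case plans three jobs and the non-batch case plans one, which matches the presence or absence of the batch mode help text on the tool form.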


Hope that helps!


Jen

--
Jennifer Hillman-Jackson
Galaxy Application Support

On Thu, Dec 14, 2017 at 3:34 AM, Matthias Bernt <[hidden email]> wrote:
> Dear list,
>
> In addition to single datasets tool file inputs allow to choose between
> - multiple dataset
> - dataset collection
> - some tools use repeat tags for multiple inputs
>
> I would like to know what the intended behavior of galaxy is for the three options considering multiple datasets? Is this described somewhere?
>
> Is it that the tool is applied to each of the multiple inputs separately or at once. For some tools one case makes more sense than the other.
>
> Best,
> Matthias

Re: intended behaviour for multiple data sets

Matthias Bernt
Dear Jen,

thanks for the detailed info. I was asking more from a tool developer's
perspective. But I found the answer: the "multiple" attribute of the data
input param tag -- I forgot about it.
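
For other tool developers, here is a rough sketch of how the setting mentioned above shows up in a tool definition. It assumes the usual Galaxy tool XML layout, in which a data input is declared as a param element with type="data" and an optional multiple="true" attribute. The tool id, parameter names, and the helper functions are hypothetical and exist only for this example. As I understand it, a data param with multiple="true" hands all selected datasets to one job, while a data param without it runs in batch mode (one job per dataset) when several datasets are selected.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal tool definition used only for illustration.
TOOL_XML = """
<tool id="example_tool" name="Example" version="0.1.0">
  <inputs>
    <param name="single_input" type="data" format="bam" />
    <param name="many_inputs" type="data" format="bam" multiple="true" />
  </inputs>
</tool>
"""


def accepts_multiple(param: ET.Element) -> bool:
    """True if the param element declares multiple="true"."""
    return param.get("multiple", "false").lower() == "true"


def summarize(tool_xml: str) -> dict:
    """Map each data param name to whether it takes several datasets in one job."""
    root = ET.fromstring(tool_xml.strip())
    return {
        p.get("name"): accepts_multiple(p)
        for p in root.iter("param")
        if p.get("type") == "data"
    }


print(summarize(TOOL_XML))
```

The script only reads the XML; it makes no claim about how Galaxy itself parses tool files.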

Best,
Matthias

On 14.12.2017 21:40, Jennifer Hillman-Jackson wrote:

>
>
> Hello,
>
> How the data is consumed is explained on the tool form in the data entry
> area. Please review how to interpret this below and let us know if you
> are not sure about a particular tool. We might be able to explain how to
> access the help, or the tool might need an update to make the usage clearer.
>
> Example1: *Samtools Sort* > click on the Multiple Dataset or Dataset
> Collection icons and this new help text will be presented: This is a
> batch mode input field. Separate jobs will be triggered for each dataset
> selection.
>
> Example2: *Samtools Mpileup* > multiple dataset selection is the default
> (one or more can be chosen), or click to Collections where one or more
> can be selected. No new help text is presented to warn about the batch
> job mode. This means that inputs are processed together in the same job.
>
> More complex entry can be found on tools like *Compare two Datasets*,
> where there are two input sections and each can have a single or
> multiple (batch) entry (individually selected or in a collection). The
> batch mode help text comes up when multiple/collections are selected.
> This is expanded behavior similar to *Sort* above.
>
> And even more complex entry can be found on tools like *MultiQC*, where
> one or more input sets can be selected, and additional input sets
> (Reports) sections can be optionally added in. There is no batch mode
> entry text reported, meaning that all data is run with the same job.
> This is expanded behavior similar to *Mpileup* above, where each
> subsection is combined to produce a summary sub-report, then the final
> results from each sub-report is combined into the final report.
>
> *Galaxy tutorials: *https://galaxyproject.org/learn/
>
> Hope that helps!
>
>
> Jen
>
> --
> Jennifer Hillman-Jackson
> Galaxy Application Support
> http://usegalaxy.org
> http://galaxyproject.org
> http://biostar.usegalaxy.org
