Using API to identify all datasets that were part of a workflow?

Ben Bimber
Hello,

I'm still relatively new to Galaxy.  I'm trying to use the API to identify the chain of jobs/datasets that were created as part of executing a workflow.  As far as I can tell, the API gives me the ID of a job, which corresponds to one step in the workflow, and each job has inputs and outputs.  I can walk outwards and connect any other jobs that use one of these files as an input or output; however, I don't see any key that directly indicates that a set of steps was executed as part of a given workflow.  Am I missing something?
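
For illustration, the "walk outwards" approach could be sketched roughly as below. This is only a sketch on in-memory data: the `jobs` dictionary shape (job id mapped to lists of input and output dataset ids) and the function name `connected_jobs` are hypothetical, not the Galaxy API's actual response format.

```python
from collections import deque


def connected_jobs(jobs, start_dataset):
    """Return ids of all jobs linked to start_dataset through shared datasets.

    `jobs` maps a job id to {"inputs": [dataset ids], "outputs": [dataset ids]}.
    This shape is a hypothetical stand-in for whatever the API returns.
    """
    # Index every dataset id to the set of jobs that read or write it.
    by_dataset = {}
    for job_id, job in jobs.items():
        for ds in job["inputs"] + job["outputs"]:
            by_dataset.setdefault(ds, set()).add(job_id)

    seen_jobs = set()
    seen_datasets = {start_dataset}
    queue = deque([start_dataset])
    # Breadth-first walk: dataset -> jobs touching it -> their other datasets.
    while queue:
        ds = queue.popleft()
        for job_id in by_dataset.get(ds, ()):
            if job_id in seen_jobs:
                continue
            seen_jobs.add(job_id)
            for other in jobs[job_id]["inputs"] + jobs[job_id]["outputs"]:
                if other not in seen_datasets:
                    seen_datasets.add(other)
                    queue.append(other)
    return seen_jobs


# Made-up example data: j1 -> j2 form a chain, j3 is unrelated.
jobs = {
    "j1": {"inputs": ["d1"], "outputs": ["d2"]},
    "j2": {"inputs": ["d2"], "outputs": ["d3"]},
    "j3": {"inputs": ["d9"], "outputs": ["d10"]},
}
print(sorted(connected_jobs(jobs, "d3")))
```

A walk like this only shows which jobs are connected through files. It cannot tell whether they were launched together by one workflow or run one at a time by hand, which is the gap described above.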

Thanks in advance,
Ben

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: Using API to identify all datasets that were part of a workflow?

John Chilton-4
Can you clarify one thing for me - are you attempting to break a
workflow invocation into steps, and then jobs, and then inputs and
outputs (so working from the workflow invocation) or are you trying to
scan existing histories and find a workflow for each dataset (so
working from the history id and workflow id maybe)?

I feel like this should be doable now - though blend4j and to a lesser
extent even bioblend are pretty far behind what I would consider best
practices for invoking workflows via the API so they may need to be
updated.

-John


On Thu, Jun 25, 2015 at 10:04 AM, Ben Bimber <[hidden email]> wrote:

> Hello,
>
> I'm still relatively new to galaxy.  I'm trying to use the API to identify
> the string of jobs/datasets that were created as part of executing a
> workflow.  So far as I can tell, the API gives me the ID of the job, which
> corresponds to one step in the workflow.  Each of these has inputs/outputs.
> I can walk outwards and try to connect any other jobs that happen to use one
> of these files as an input or output; however, I am not seeing any key that
> provides a more direct indication that a set of steps was executed as part
> of a given workflow.  Am I missing something?
>
> Thanks in advance,
> Ben

Re: Using API to identify all datasets that were part of a workflow?

Ben Bimber
The latter.  Starting with a dataset, pull its full history.  If it was created by running a simple single-step tool, that's one step.  If it was created as part of a workflow, grab that whole series of steps/inputs/outputs.

I agree that the Python/Java bindings are out of date, but even when I was scanning the JSON I wasn't able to see where I'd glean this information.  The missing piece for me was always determining whether a given dataset was connected to a larger workflow.
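
One possible way to answer "is this dataset connected to a workflow?" is a reverse lookup from job id to workflow invocation, sketched below on in-memory data. Everything about the record shapes is an assumption to be checked against a real Galaxy instance: that invocation records carry a `steps` list whose entries have a `job_id`, and that a dataset record names the job that produced it in a `creating_job` field. The function names are hypothetical.

```python
def index_jobs_by_invocation(invocations):
    """Map each job id to the id of the invocation that ran it.

    Assumes (hypothetically) that each invocation looks like
    {"id": ..., "steps": [{"job_id": ...}, ...]}.
    """
    index = {}
    for inv in invocations:
        for step in inv.get("steps", []):
            job_id = step.get("job_id")
            # Steps without a job (e.g. plain input steps) are skipped.
            if job_id is not None:
                index[job_id] = inv["id"]
    return index


def invocation_for_dataset(dataset, job_index):
    """Return the invocation id behind a dataset, or None if there is none.

    Assumes (hypothetically) that the dataset record has a "creating_job" key.
    """
    return job_index.get(dataset.get("creating_job"))


# Made-up example data.
invocations = [
    {"id": "inv1", "steps": [{"job_id": None}, {"job_id": "j1"}, {"job_id": "j2"}]},
]
job_index = index_jobs_by_invocation(invocations)

from_workflow = {"id": "d3", "creating_job": "j2"}
from_single_tool = {"id": "d10", "creating_job": "j3"}
print(invocation_for_dataset(from_workflow, job_index))
print(invocation_for_dataset(from_single_tool, job_index))
```

With a lookup like this, a dataset whose creating job is absent from the index would be treated as the output of a single tool run, and one whose job is present could be expanded into the full set of steps recorded for that invocation.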

-ben

On Thu, Jun 25, 2015 at 7:26 AM, John Chilton <[hidden email]> wrote:
> Can you clarify one thing for me - are you attempting to break a
> workflow invocation into steps, and then jobs, and then inputs and
> outputs (so working from the workflow invocation) or are you trying to
> scan existing histories and find a workflow for each dataset (so
> working from the history id and workflow id maybe)?
>
> I feel like this should be doable now - though blend4j and to a lesser
> extent even bioblend are pretty far behind what I would consider best
> practices for invoking workflows via the API so they may need to be
> updated.
>
> -John

