
Jeff Yun

10/15/2019, 1:46 PM
Hi! Given an unrun flow, what's the best way to get the inputs passed into the tasks (like the output of `task_runner.get_task_inputs(state, upstream_states)`, but without having to run the flow)?

Chris White

10/15/2019, 1:47 PM
Hi Jeff! These inputs are necessarily unavailable, as they are the outputs of the upstream tasks after they have run

Jeff Yun

10/15/2019, 1:48 PM
What do Task objects know about their interactions with other tasks, or is this all handled by the Flow context (since it manages the Edges and Task states)?

Chris White

10/15/2019, 1:50 PM
yeah, you hit the nail on the head -> Task objects are standalone pieces of configurable logic. They typically can be:
- configured through `__init__` (initialization)
- run through `run()`, which can optionally receive inputs, which always come from upstream tasks

and the Flow is responsible for tracking all dependency relationships
👍 1
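The split Chris describes (static configuration in `__init__`, runtime data only through `run()`) can be sketched in plain Python. This is a schematic analogue, not Prefect's actual Task class; `AddN` is an invented example:

```python
# Schematic analogue of a Prefect-style Task: configuration happens in
# __init__, while run() only sees inputs handed to it at runtime
# (in real Prefect, those inputs are the outputs of upstream tasks).
class AddN:
    def __init__(self, n):
        # static configuration, known before the flow ever runs
        self.n = n

    def run(self, x):
        # x arrives at runtime from an upstream task's result
        return x + self.n

task = AddN(n=10)
print(task.run(5))  # -> 15
```

The point is that `AddN` itself has no idea where `x` comes from; only the Flow knows which edge feeds it.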

Jeff Yun

10/15/2019, 2:05 PM
Could I easily get the input objects from the Flow (which are there, just unresolved until the upstream tasks are run)? i.e. if I wanted to run terminal tasks (and `await` their dependencies) with my own TaskRunner, instead of the current run-root-tasks-first approach.
When triggering Task A to run, does roughly the following happen?
1. The flow follows the edges to find the subset of tasks it needs for Task A.
2. It has TaskRunner run the tasks, starting from the root tasks, extracting results from their final/finished states and passing them along downstream edges. The root tasks are just Parameters, so their inputs are accessible ahead of time in `prefect.context["parameters"]`.
3. You can't easily access non-Parameter task inputs passed into `run`, because they come from the *final* states of upstream tasks (and you can't just "wait" for these inputs, because you don't know which object they'll come from; the final upstream state object hasn't even been created yet).
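Steps 1 and 2 above (follow the edges backwards, run roots first, feed each finished result downstream) can be sketched as a toy resolver. The `edges` dict and `resolve` helper are invented for illustration and are not Prefect internals:

```python
# Toy flow resolution: edges map each task name to its upstream
# dependencies. Tasks run in dependency order, and each finished result
# is passed downstream, mirroring the run-root-tasks-first approach.
def run_flow(tasks, edges, parameters):
    results = dict(parameters)  # root "Parameter" tasks are known up front

    def resolve(name):
        if name in results:
            return results[name]
        # step 1: follow edges to find the inputs this task needs
        inputs = [resolve(up) for up in edges.get(name, [])]
        # step 2: only now can the task run -- its inputs are the
        # *final* results of its upstream tasks
        results[name] = tasks[name](*inputs)
        return results[name]

    for name in tasks:
        resolve(name)
    return results

tasks = {"double": lambda x: 2 * x, "inc": lambda d: d + 1}
edges = {"double": ["x"], "inc": ["double"]}
print(run_flow(tasks, edges, {"x": 3}))  # {'x': 3, 'double': 6, 'inc': 7}
```

This also makes point 3 concrete: `inc`'s input simply does not exist anywhere until `double` has finished.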

Chris White

10/15/2019, 2:30 PM
Ah ok, so if you’re interested in testing how your downstream tasks respond to inputs, I’d recommend using the `task_states` keyword argument to either `flow.run` or `FlowRunner.run`, which is a dictionary of task objects -> prefect state objects. This would allow you to “mock” upstream success with arbitrary results
👍 1
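Chris's suggestion amounts to pre-seeding the runner with already-finished states. The real mechanism is the `task_states` keyword on `flow.run` / `FlowRunner.run`; the sketch below is a stdlib-only analogue of the idea, with `Success` and `run_downstream` as invented stand-ins:

```python
# Schematic sketch of "mocking" upstream results: any upstream task that
# appears in task_states is treated as finished and its stored result is
# used directly, so a downstream task can be tested without ever running
# its (possibly expensive or failing) dependencies.
class Success:
    def __init__(self, result):
        self.result = result

def run_downstream(task, upstream, task_states):
    inputs = []
    for name, fn in upstream.items():
        state = task_states.get(name)
        if state is not None:
            inputs.append(state.result)   # use the mocked result
        else:
            inputs.append(fn())           # otherwise actually run upstream
    return task(*inputs)

# Test how a downstream task responds to an arbitrary upstream value:
out = run_downstream(
    lambda x: x * 2,
    {"expensive_upstream": lambda: 1 / 0},  # would blow up if actually run
    task_states={"expensive_upstream": Success(result=21)},
)
print(out)  # -> 42
```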

Jeff Yun

10/15/2019, 2:31 PM
Could I do something like: store A's upstream tasks through `A.__init__()` in `self.dependencies`, and access their results in `run()` or another method?

Chris White

10/15/2019, 2:46 PM
hmmm we usually don’t recommend storing any state on the task objects, as that state can be corrupted and/or lost in certain execution environments

Jeff Yun

10/15/2019, 4:04 PM
How would I implement `Task.get_direct_dependencies()` here?
In this case, each `Task.run()` returns a list of futures (as a workaround for `map` creating 100k tasks that each do the same one thing, we can instead have one task do 100k things, i.e. submit 100k requests to an external slurm-like scheduler we use, which returns 100k futures).
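That one-task-submits-many pattern can be sketched with `concurrent.futures`, with a `ThreadPoolExecutor` standing in for the external slurm-like scheduler (the real scheduler client is not shown):

```python
from concurrent.futures import ThreadPoolExecutor

# One task submits many requests and returns the futures, instead of
# mapping over 100k one-request tasks. The pool here is a stand-in for
# the external slurm-like scheduler client.
def submit_all(jobs, pool):
    return [pool.submit(job) for job in jobs]

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = submit_all([lambda i=i: i * i for i in range(5)], pool)
    results = sorted(f.result() for f in futures)
print(results)  # [0, 1, 4, 9, 16]
```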

Chris White

10/15/2019, 4:15 PM
I don’t think you could implement that on the Task object itself; you’d probably need to implement it on the Flow object and add a “task” argument to it. Tasks themselves have no knowledge of / hooks into their dependencies, they just run with the inputs they’re provided when they’re told to run by a runner
alternatively, you could implement your own `TaskRunner` class and use it for execution -> this might actually be the best way to go
and this method would just be a new step in the task runner pipeline - you could override one of our internal steps and implement custom logic
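Overriding one step of a runner pipeline might look like the following. `BaseRunner` and its hook names are invented for illustration and are not Prefect's actual `TaskRunner` internals:

```python
# Schematic runner pipeline: run() calls a sequence of overridable steps,
# so a subclass can inject custom logic at one step without touching the
# rest of the pipeline.
class BaseRunner:
    def get_task_inputs(self, upstream_results):
        return upstream_results

    def run(self, task, upstream_results):
        inputs = self.get_task_inputs(upstream_results)
        return task(*inputs)

class LoggingRunner(BaseRunner):
    # custom pipeline step: record the dependencies this task received
    def get_task_inputs(self, upstream_results):
        self.last_inputs = list(upstream_results)
        return upstream_results

runner = LoggingRunner()
print(runner.run(lambda a, b: a + b, [2, 3]))  # -> 5
print(runner.last_inputs)                      # -> [2, 3]
```

The subclass never re-implements `run()`; it only replaces the one step it cares about, which is the shape Chris is describing.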