
Chris Hart

07/08/2019, 7:02 PM
but in task 2, I need to do post-processing on the selfsame data structure as outputted by task 1.. in airflow it would seem this is a no-no without persisting it somewhere first.. I guess just wondering: is there a general preference between mixing purposes and keeping tasks orthogonal?

Jeremiah

07/08/2019, 7:19 PM
Hey @Chris Hart! What you’re describing is called “dataflow” and it’s a first-class operation in Prefect. Your preprocessing task 1 can directly pass its output to your downstream task 2.
There are more advanced cases to consider, such as configuring a result_handler to automatically serialize the passed data in the event of a task failure or retry (so it can be retrieved at a later date without running the preprocessing task again, for example). If that’s interesting, feel free to shoot us a note and we’ll help you get set up.
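For anyone skimming the thread, here is a rough plain-Python sketch of the two ideas above: task 1 handing its output straight to task 2, and saving that output so a retry can reload it rather than rerun the preprocessing. It deliberately does not use Prefect's API. The names `preprocess`, `postprocess`, and `run_with_checkpoint` are hypothetical, and the pickle-to-file step is only an assumed stand-in for what a result handler might do.

```python
import pickle
import tempfile
from pathlib import Path


def preprocess(raw):
    """Task 1: build the data structure (hypothetical example logic)."""
    return {"values": [x * 2 for x in raw]}


def postprocess(data):
    """Task 2: works on the very object task 1 returned."""
    return sum(data["values"])


def run_with_checkpoint(task, arg, path):
    """Run `task` unless a saved result exists at `path`; save new results.

    A hand-rolled illustration of the result-handler idea, not Prefect's API.
    """
    if path.exists():
        return pickle.loads(path.read_bytes())
    result = task(arg)
    path.write_bytes(pickle.dumps(result))
    return result


# Plain dataflow: task 1's output goes straight into task 2, in memory.
total = postprocess(preprocess([1, 2, 3]))

# With a checkpoint: the second call reloads the saved output
# instead of running the preprocessing again.
with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "preprocess.pkl"
    first = run_with_checkpoint(preprocess, [1, 2, 3], checkpoint)
    second = run_with_checkpoint(preprocess, [1, 2, 3], checkpoint)
    total_from_checkpoint = postprocess(second)
```

In a real flow the orchestrator would decide when to save and reload; the sketch only shows why persisting the intermediate output is optional for the happy path and useful for retries.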

Chris Hart

07/08/2019, 7:20 PM
ok sweet thanks! I'm actually doing that already but had a moment of self doubt about them operating on the same thing because "idempotent all the things" or whatever.. thanks for clarifying that it's encouraged
👍 1

Jeremiah

07/08/2019, 7:23 PM
Awesome — yeah, “idempotent all the things” is definitely good theory in general, but it’s usually really hard in practice. Prefect doesn’t require idempotency.