Hi all, I'm trying to set up dependent flows where the parent flow needs to run only once.
Consider the following scenario:
1. The parent flow is an expensive process that preprocesses data and loads it into storage (let's say S3). Ideally, it shouldn't run more than once.
2. There can be multiple child flows that use the preprocessed data.
I've taken a look at this. However, with this setup, to ensure that the parent flow runs only once, I need to put all child flows in a single "flow of flows". I envision a use case where different teams bring their own flows and use the common preprocessed data. My understanding is that if each team's flow defines the parent flow as a dependency using upstream_tasks, it would cause the parent flow to run multiple times.
I think what I need is something like the factory pattern in the OO paradigm. In Airflow, I think I can use ExternalTaskSensor to achieve this.
- Does my question/use case make sense?
- Is there a preferred "Prefect" way to solve this?
- Or should I keep the state of the parent job externally?
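To make the last option concrete, here is a rough sketch of what I mean by keeping the parent's state externally. It is plain Python, not Prefect API, and it uses a local marker file as a stand-in for whatever store would actually hold the state. The names run_parent_once, marker and preprocess are made up for illustration.

```python
import tempfile
from pathlib import Path


def run_parent_once(marker: Path, preprocess) -> bool:
    """Run preprocess() only if no marker exists yet.

    Returns True if preprocess ran in this call, False if it was skipped.
    """
    try:
        # Exclusive create ("x") fails if the marker already exists,
        # so only the first caller gets past this point.
        with open(marker, "x") as f:
            f.write("started")
    except FileExistsError:
        return False
    preprocess()
    # Record completion so that consumers can tell "started" from "done".
    marker.write_text("done")
    return True


# Hypothetical usage: two child flows both ask for the parent to run.
calls = []
with tempfile.TemporaryDirectory() as tmp:
    marker = Path(tmp) / "parent_flow.marker"
    first = run_parent_once(marker, lambda: calls.append("preprocess"))
    second = run_parent_once(marker, lambda: calls.append("preprocess"))
    final_state = marker.read_text()
```

In this sketch the second caller simply skips the work. A real version would also need the children to wait until the marker says "done", some handling for a parent that fails after writing "started", and an equivalent atomic check on the real store.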