What is the best practice for splitting a large workflow with many upstream and downstream datasets into smaller flows?
We are setting up our first couple of flows to manage the pipelines for our data warehouse. As in most data infrastructures, we will have several upstream datasets that feed the pipelines for datasets further downstream, which in turn feed datasets even further downstream, and so on. My question is how best to organise a large workflow like this. We could create one massive flow, but that would not be very nice to manage. So what is the best way to split this up in Prefect?
Some ideas:
• Import a task from the upstream flow into the downstream flow.
I don’t think this has the intended effect: the import re-registers the upstream flow every time, and it just duplicates the imported task inside the downstream flow rather than making the downstream flow actually wait for the upstream flow to finish.
• Create a waiter task that succeeds once the upstream data has landed (rough sketch after this list).
• Send some kind of event from the upstream flow that triggers the downstream flow. (I’m not completely sure how I’d do that; the closest thing I can think of is sketched after this list.)
• Create a parent flow that triggers a set of sub-flows (sketch after this list).
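For the waiter idea, this is roughly what I have in mind (Prefect 1.x style; `check_data_landed` is just a placeholder for whatever check fits our warehouse):

```python
from datetime import timedelta

from prefect import Flow, task


def check_data_landed(table: str) -> bool:
    # Placeholder: replace with a real check, e.g. query a "load complete"
    # marker table or look for today's partition in the warehouse.
    return False


@task(max_retries=24, retry_delay=timedelta(minutes=5))
def wait_for_upstream_data(table: str) -> str:
    # Fail (and retry) until the upstream data is available.
    if not check_data_landed(table):
        raise RuntimeError(f"Data for {table} has not landed yet")
    return table


@task
def build_downstream_dataset(table: str):
    ...  # the actual transformation


with Flow("downstream-flow") as flow:
    landed = wait_for_upstream_data("upstream_table")
    build_downstream_dataset(landed)
```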
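For the event idea, the closest thing I can come up with is calling the Prefect Client from a final task in the upstream flow to create a run of the downstream flow, but I’m not sure this is the intended pattern (the flow id is obviously a placeholder):

```python
from prefect import Client, task


@task
def trigger_downstream_flow():
    # Create a run of the (already registered) downstream flow.
    # "downstream-flow-id" stands in for the real flow id.
    client = Client()
    client.create_flow_run(flow_id="downstream-flow-id")
```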
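And for the parent-flow idea, something like this flow-of-flows pattern, assuming the child flows are already registered (the flow and project names here are made up):

```python
from prefect import Flow
from prefect.tasks.prefect import StartFlowRun

# Each StartFlowRun task starts a registered flow by name;
# wait=True makes it block until that flow run finishes.
run_upstream = StartFlowRun(
    flow_name="upstream-flow", project_name="warehouse", wait=True
)
run_downstream = StartFlowRun(
    flow_name="downstream-flow", project_name="warehouse", wait=True
)

with Flow("parent-flow") as parent:
    upstream = run_upstream()
    run_downstream(upstream_tasks=[upstream])
```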
Is there a “correct” way to do this? What are people’s experiences? Sorry if I missed an obvious piece of documentation or discussion somewhere.