Hi, just joined this community. I started using Prefect in the past month. I have worked with different ETL tools before, and I really think this is the best so far! My question for now: I am building an open data set for the Netherlands (also pertaining to mapping the spread of COVID-19) and have about 5 smaller flows that pull data from public sources (bureau of statistics, database of addresses, reported cases, etc.). What is the most idiomatic way of running these independent flows in parallel (prior to the actual modeling that needs to be done)? Just execute the data collection flows all at once on a DaskExecutor? Or is there a way to combine e.g. four flows into a fifth that is dependent on the four independent ones?
Jeremiah
04/26/2020, 4:17 PM
Hey @Daniel, welcome! Today, we recommend running all four flows separately but simultaneously on an execution engine that supports parallelism - the DaskExecutor will do perfectly. You could either kick them off manually or schedule them all to start at the same time. We are working on introducing a more formal concept of "flow-to-flow dependencies" which would enable your second thought, but it doesn't exist in Prefect today.
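A minimal sketch of the "kick them all off simultaneously" idea, using only the standard library. The `pull_*` functions here are placeholders standing in for the real data-collection flows (in an actual Prefect setup each would be a `flow.run()` call); only the concurrency pattern is the point:

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder functions standing in for the independent data-collection
# flows (statistics bureau, address database, reported cases).
def pull_statistics():
    return "statistics"

def pull_addresses():
    return "addresses"

def pull_cases():
    return "cases"

collection_flows = [pull_statistics, pull_addresses, pull_cases]

# Kick off all collection flows at once; each runs independently,
# and we gather the results once every one has finished.
with ThreadPoolExecutor(max_workers=len(collection_flows)) as pool:
    futures = [pool.submit(flow) for flow in collection_flows]
    results = [f.result() for f in futures]
```

The modeling step can then start as soon as `results` is complete, which mimics the "fifth flow depends on the first four" shape without any formal flow-to-flow dependency.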
d
Daniel
04/26/2020, 5:28 PM
Thanks for clarifying, @Jeremiah! Will stick to the first option for now.
d
David Ojeda
04/27/2020, 9:42 AM
Would it be a sensible alternative to add a downstream task to a flow whose job is to schedule the next flow through the GraphQL API?
Using that task is a good way to get "fan-out" dependencies where one flow kicks off others independently, but we're working on better "fan-in" semantics (where a flow depends on multiple upstream flows).