Hi, just joined this community. I started using Prefect in the past month. I have worked with different ETL tools in the past, and really think this is the best so far! My question for now: I am building an open data set for the Netherlands (also pertaining to mapping the spread of COVID-19) and have about 5 smaller flows that pull data from public sources (bureau of statistics, database of addresses, reported cases, etc.). What is the most idiomatic way of running these independent flows in parallel (prior to the actual modeling that needs to be done)? Just execute the data collection flows all at once on a DaskExecutor? Or is there a way to combine e.g. four flows into a fifth that is dependent on the four independent ones?
04/26/2020, 4:17 PM
Hey @Daniel, welcome! Today, we recommend running all four flows separately but simultaneously on an execution engine that supports parallelism - the DaskExecutor will do perfectly. You could either kick them off manually or schedule them all to start at the same time. We are working on introducing a more formal concept of “flow-to-flow dependencies” which would enable your second thought, but it doesn’t exist in Prefect today.
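A minimal sketch of the first option. To keep the example self-contained it uses `concurrent.futures` as a stand-in for the parallel execution engine, and the fetch functions are placeholders for the real data-collection flows; with Prefect you would instead run each registered Flow on a `DaskExecutor`:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Placeholder "flows": in Prefect each of these would be a Flow whose
# run you kick off; here they are plain functions so the sketch runs
# without any external services.
def fetch_statistics():
    return "statistics: ok"

def fetch_addresses():
    return "addresses: ok"

def fetch_reported_cases():
    return "reported cases: ok"

def fetch_municipalities():
    return "municipalities: ok"

def run_flows_in_parallel(flows):
    """Kick off all independent data-collection flows at once and
    gather their results as each one finishes."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(flows)) as pool:
        futures = {pool.submit(flow): flow.__name__ for flow in flows}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results

results = run_flows_in_parallel(
    [fetch_statistics, fetch_addresses, fetch_reported_cases, fetch_municipalities]
)
```

The flows here are fully independent, so nothing coordinates them beyond starting at the same time, which matches the "kick them off simultaneously" suggestion above.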
04/26/2020, 5:28 PM
Thanks for clarifying, @Jeremiah! Will stick to the first option for now.
04/27/2020, 9:42 AM
Would it be a sensible alternative to add a downstream task to a flow whose job is to schedule the next flow through the GraphQL API?
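A rough sketch of what such a downstream task might send. This only builds the request body; the endpoint URL and `flow_id` are placeholders, and I'm assuming the `create_flow_run` mutation from Prefect Server's 0.x GraphQL schema:

```python
import json

# Hypothetical: a downstream task triggers the next flow by POSTing a
# create_flow_run mutation to the Prefect Server GraphQL endpoint.
CREATE_FLOW_RUN_MUTATION = """
mutation($flowId: UUID!) {
  create_flow_run(input: { flow_id: $flowId }) {
    id
  }
}
"""

def build_flow_run_request(flow_id):
    """Build the JSON body for the GraphQL call that schedules a flow run."""
    return {
        "query": CREATE_FLOW_RUN_MUTATION,
        "variables": {"flowId": flow_id},
    }

# Placeholder flow ID; in practice you would look up the registered flow's ID.
payload = build_flow_run_request("00000000-0000-0000-0000-000000000000")
body = json.dumps(payload)

# In the actual task you would then POST it, e.g. (not executed here):
# requests.post("http://localhost:4200/graphql", json=payload)
```

One caveat with this pattern: the upstream flow has no visibility into whether the triggered run succeeds, so it is a fire-and-forget handoff rather than a true dependency.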