Hi, just joined this community. I started using Prefect in the past month. I have worked with different ETL tools before, and I really think this is the best so far! My question for now: I am building an open data set for the Netherlands (also pertaining to mapping the spread of COVID-19) and have about 5 smaller flows that pull data from public sources (bureau of statistics, database of addresses, reported cases, etc.). What is the most idiomatic way of running these independent flows in parallel (prior to the actual modeling that needs to be done)? Just execute the data collection flows all at once on a DaskExecutor? Or is there a way to combine e.g. four flows into a fifth that is dependent on the four independent ones?
Jeremiah
04/26/2020, 4:17 PM
Hey @Daniel, welcome! Today, we recommend running all four flows separately but simultaneously on an execution engine that supports parallelism - the DaskExecutor will do perfectly. You could either kick them off manually or schedule them all to start at the same time. We are working on introducing a more formal concept of "flow-to-flow dependencies" which would enable your second thought, but it doesn't exist in Prefect today.
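A minimal sketch of the "kick them all off simultaneously" idea, using only the standard library. The `pull_*` functions here are placeholders standing in for the real data-collection flows (in an actual Prefect setup each would be a `flow.run()` call); only the concurrency pattern is the point:

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder functions standing in for the independent data-collection
# flows (statistics bureau, address database, reported cases).
def pull_statistics():
    return "statistics"

def pull_addresses():
    return "addresses"

def pull_cases():
    return "cases"

collection_flows = [pull_statistics, pull_addresses, pull_cases]

# Kick off all collection flows at once; each runs independently,
# and we gather the results once every one has finished.
with ThreadPoolExecutor(max_workers=len(collection_flows)) as pool:
    futures = [pool.submit(flow) for flow in collection_flows]
    results = [f.result() for f in futures]
```

The modeling step can then start as soon as `results` is complete, which mimics the "fifth flow depends on the first four" shape without any formal flow-to-flow dependency.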
d
Daniel
04/26/2020, 5:28 PM
Thanks for clarifying, @Jeremiah! Will stick to the first option for now.
d
David Ojeda
04/27/2020, 9:42 AM
Would it be a sensible alternative to add a downstream task to a flow whose job is to schedule the next flow through the GraphQL API?
Using that task is a good way to get "fan-out" dependencies where one flow kicks off others independently, but we're working on better "fan-in" semantics (where a flow depends on multiple upstream flows).