
Arnaud Legendre

11/14/2019, 1:29 PM
Hello everyone, I am new to Prefect too, and my question is about running multiple flows in parallel. Let's say I have a sequence of tasks to run every day, upon the arrival of a new batch of data. I suppose I have to declare this sequence of tasks in a Flow object (with today's date as a variable), right? Now let's say that I want to replay this flow for several days in the past: what is the best option?
- I was thinking of creating a new flow that maps a task over the set of dates I want to replay, with each mapped task running my 'daily' flow. Is that a valid strategy? Will the scheduler (supposedly based on dask.distributed) manage all the work and resources properly?
- Or is it better to rely on the command-line interface of a main script and run this script multiple times, as suggested here (https://prefect-community.slack.com/archives/CL09KU1K7/p1567716836049700)?
Many thanks for the advice!

Chris White

11/14/2019, 3:11 PM
Hi Arnaud! This pattern is discussed in the docs (https://docs.prefect.io/cloud/faq.html#does-prefect-support-backfills). In short, you’ll want to rely on either a Parameter or Prefect context for obtaining date information within your tasks, as this information is easily overridden on a per-run basis.
"Will the scheduler (supposedly based on dask.distributed) manage all the work and resources properly?"
Just to be clear, Prefect does not use Dask unless you explicitly configure it to. Prefect Core has a local, in-process, synchronous scheduler that is independent of Dask, and Prefect Cloud has a persistent, fully asynchronous scheduler which is also independent of Dask.

Arnaud Legendre

11/14/2019, 3:45 PM
OK, thanks Chris! That sounds pretty straightforward, I will try it: "performing a backfill is as simple as looping over the desired values and creating individual flow runs for each value." 🙂
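For anyone reading along, the loop described in that quoted sentence could look roughly like the sketch below. It is plain Python with no Prefect dependency: `date_range`, `backfill`, and `fake_daily_run` are hypothetical names made up for illustration, and `fake_daily_run` only stands in for whatever actually triggers one run of the daily flow.

```python
from datetime import date, timedelta
from typing import Callable, Dict, Iterator


def date_range(start: date, end: date) -> Iterator[date]:
    """Yield every day from start to end, both inclusive."""
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)


def backfill(
    run_for_date: Callable[[str], object], start: date, end: date
) -> Dict[str, object]:
    """Trigger one independent run per day and collect the results by date."""
    results: Dict[str, object] = {}
    for day in date_range(start, end):
        key = day.isoformat()
        # Each date gets its own run; the date is passed in explicitly
        # instead of being read from "today" inside the tasks.
        results[key] = run_for_date(key)
    return results


def fake_daily_run(run_date: str) -> str:
    # Hypothetical stand-in for the real daily flow (assumption, not Prefect code).
    return f"processed batch for {run_date}"


results = backfill(fake_daily_run, date(2019, 11, 10), date(2019, 11, 13))
for run_date, outcome in results.items():
    print(run_date, "->", outcome)
```

With Prefect Core as discussed in this thread, `run_for_date` could presumably wrap something like `flow.run(parameters={"date": run_date})`, assuming the daily flow declares a `Parameter("date")` and its tasks take the date from that parameter. The linked FAQ entry is the reference for the exact approach.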

Chris White

11/14/2019, 3:46 PM
awesome - let us know how it goes!