Maryam Veisi

02/27/2023, 4:51 PM
I have 3 flows for a data workflow (flow 1 extracts the data, flow 2 prepares the data, flow 3 creates the training set). For the next step of work, we want to run this workflow (all 3 flows) in parallel for different parameters, 1000+ times. Our concerns are: 1) memory, and 2) how to set up the pipeline to run 1000+ workflows in parallel while running the 3 subflows within each one sequentially. Any thoughts?

Rob Freedy

02/27/2023, 10:15 PM
Hey @Maryam Veisi!! I would recommend checking out this discourse post (the syntax is a bit outdated but the pattern is still valid). The orchestrator-worker pattern will allow you to create subflows for your 3 processes and keep track of them within the parent flow. As for memory, Prefect has a KubernetesJob block that is helpful in defining specs for pods running Prefect flows. Depending on your infrastructure, different blocks have different configurations for memory that could be useful for your use case. You can also use the ConcurrentTaskRunner to run tasks concurrently or the DaskTaskRunner to run them in parallel (see docs here).
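
To make the shape of that pattern concrete, here is a rough sketch in plain Python. It is not Prefect code: it uses the standard library's `ThreadPoolExecutor` as a stand-in for a task runner, and `extract`, `prepare`, `build_training_set`, `run_workflow`, and `run_all` are hypothetical names with toy bodies standing in for the three flows. The idea is that each parameter gets one worker that runs the three steps in order, while `max_workers` caps how many workflows are in flight at once, which is one way to keep memory bounded.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the three flows described in the question.
def extract(param):
    # Step 1: pretend to pull raw data for this parameter.
    return {"param": param, "rows": [param, param + 1]}


def prepare(raw):
    # Step 2: pretend to clean/transform the raw data.
    return {**raw, "rows": [r * 2 for r in raw["rows"]]}


def build_training_set(prepared):
    # Step 3: pretend to assemble a training set summary.
    return {
        "param": prepared["param"],
        "n_rows": len(prepared["rows"]),
        "total": sum(prepared["rows"]),
    }


def run_workflow(param):
    # "Worker": the three steps run strictly in order for one parameter.
    return build_training_set(prepare(extract(param)))


def run_all(params, max_workers=8):
    # "Orchestrator": fans out one workflow per parameter.
    # max_workers limits how many workflows run at the same time,
    # which bounds peak memory use (8 is an arbitrary example value).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(run_workflow, params))


results = run_all(range(1000))
```

In a Prefect setup the parent flow would play the role of `run_all` and the three functions would be subflows or tasks, with the concurrency limit coming from the task runner or infrastructure settings rather than `max_workers`.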

Maryam Veisi

02/27/2023, 11:16 PM
@Rob Freedy Thanks.
👍 1