# prefect-cloud
m
I have 3 flows for a data workflow (flow 1 extracts the data, flow 2 makes the data ready, flow 3 creates the training set). For the next step of work, we want to run this workflow (all 3 flows) in parallel for different parameters, 1000+ times. Our concerns are: 1. memory 2. how to set up the pipeline to run 1000+ flows in parallel while running the subflows sequentially. Any thoughts?
r
Hey @Maryam Veisi!! I would recommend checking out this discourse post (the syntax is a bit outdated but the pattern is still valid). The orchestrator/worker pattern will allow you to create subflows for your 3 processes and keep track of them within the parent flow. As for memory, Prefect has a KubernetesJob block that is helpful for defining specs for the pods running Prefect flows. Depending on your infrastructure, different blocks have different memory configurations that could be useful for your use case. You can also use the ConcurrentTaskRunner or the DaskTaskRunner (see docs here) to run tasks concurrently or in parallel, respectively.
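To make the shape of that pattern concrete, here is a rough sketch in plain Python (standard library only, not Prefect syntax). The function names `extract`, `prepare`, `build_training_set`, `run_pipeline` and `run_all`, the toy data, and the `max_workers` value are all made up for illustration. In Prefect, the three steps would be subflows called from a parent flow, and the fan-out would go through whatever task runner or infrastructure you pick.

```python
from concurrent.futures import ThreadPoolExecutor


def extract(param):
    # Hypothetical stand-in for flow 1: pull the raw data for one parameter.
    return {"param": param, "rows": list(range(param))}


def prepare(raw):
    # Hypothetical stand-in for flow 2: make the data ready.
    return {"param": raw["param"], "rows": [r * 2 for r in raw["rows"]]}


def build_training_set(ready):
    # Hypothetical stand-in for flow 3: create the training set summary.
    rows = ready["rows"]
    return {"param": ready["param"], "n_rows": len(rows), "total": sum(rows)}


def run_pipeline(param):
    # Worker: the three steps run strictly one after another for one parameter.
    return build_training_set(prepare(extract(param)))


def run_all(params, max_workers=8):
    # Orchestrator: fan out over parameters, but cap how many pipelines are
    # in flight at once so memory use stays bounded (8 is an arbitrary choice).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input parameters.
        return list(pool.map(run_pipeline, params))


if __name__ == "__main__":
    results = run_all(range(1000))
    print(len(results))
```

The cap on concurrent workers is what speaks to the memory concern: 1000 runs get queued, but only a handful hold data in memory at any moment.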
m
@Rob Freedy Thanks.