Maciej

02/03/2023, 9:02 PM
Hi! I am just coming over from Prefect 1 to 2, and also thinking about how to set things up to run more efficiently. I'm having a hard time wrapping my head around some of the new concepts, hoping someone can point me in the right direction. I have a flow where some tasks require very little resources but can take a long time (e.g. trigger execution of a long-running procedure in a database), and others are very resource-heavy (data processing done in task's Python code). They would clearly benefit from running on different infrastructure. But as I understand it, each flow must run on one infrastructure? What is the right way to approach this kind of problem from a Prefect perspective?

Timo Vink

02/03/2023, 9:21 PM
I'm currently working through the same problem. Here's what I have so far, in case it helps: I'm building a separate Docker image for each `@flow` that includes its code and dependencies, and creating a separate `Deployment` for each, using a `KubernetesJob` for the infrastructure. This way each flow has isolated code dependencies and can set its own resource requirements. You can then use `run_deployment` (docs) from one flow to start another flow as a subflow using that infrastructure configuration.
It works... but I'm still trying to make the experience nicer (e.g., having it run the subflow in-process during local development, but in a separate `Pod` when deployed to k8s), making sure logs show up in a centralized place, etc.

Maciej

02/03/2023, 11:05 PM
Thanks @Timo Vink! That sounds like a good solution. The only concern I have, then, is that the "parent" flow also has to be running throughout the whole process, yes? I.e., when it spawns the subflow with run_deployment, it waits for a successful result and then calls the next task? Or are you actually chaining it in some way, where the first flow completes after a successful handoff but before the second flow is complete?

Timo Vink

02/03/2023, 11:35 PM
I think you can have either behaviour, using the `timeout` parameter. By default the parent flow will wait for the subflow, but if you set `timeout=0` it returns immediately after kicking off the subflow run.

Maciej

02/04/2023, 12:48 AM
Ah, awesome. Sorry, I glossed over that, thinking a timeout would automatically be treated as a failure. Thanks! Yeah, I think this could work!