Deepanshu Aggarwal

10/28/2022, 11:48 AM
I had a conceptual question about the infrastructure that runs deployments. When we run the same deployment multiple times (with the same or different parameters), does each run create a new pod to run a new job, or are the pods specific to flows? And if I want my parallel processing (tasks or flows running in parallel) to be distributed across multiple jobs, so that no single process uses a huge amount of resources, how can I achieve that? Is creating separate deployments a way to do it?

Oscar Björhn

10/28/2022, 11:50 AM
Unless I'm getting the terminology mixed up, the pods should be specific to the flow runs, not flows or deployments.

Deepanshu Aggarwal

10/28/2022, 11:51 AM
I'm using the CLI command prefect deployment run, so that generates a flow run (taking the flow from S3 and the infrastructure config from a Kubernetes block), which creates a pod to run the Kubernetes job.

Oscar Björhn

10/28/2022, 11:52 AM
Yeah, exactly. And if you execute deployment run multiple times, it should create multiple pods.

Deepanshu Aggarwal

10/28/2022, 11:53 AM
So within that deployment run (which triggers a single flow run), if I have subflows or tasks running in parallel, is there a way to distribute them horizontally? Or will it just increase the CPU and memory usage of the job pod?

Oscar Björhn

10/28/2022, 11:57 AM
Ah, I believe you would have to use something like Ray or Dask for that, or you can use the run_deployment function to run decoupled subflows on their own infrastructure.
It gets a bit tricky with Python, but it seems to be a common use case.
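A minimal sketch of the run_deployment approach, assuming Prefect 2 and an existing deployment of a child flow that accepts a list parameter. The deployment name "process-records/k8s", the "records" parameter and the helper names are hypothetical. Since each flow run started from a deployment gets its own infrastructure (with a Kubernetes job block, its own job and pod), splitting the input into chunks and starting one child run per chunk should spread the work across pods instead of growing a single one.

```python
from typing import Any, Dict, Iterator, List, Sequence


def chunk(items: Sequence[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def build_child_params(items: Sequence[Any], size: int) -> List[Dict[str, Any]]:
    """One parameter dict per child flow run ("records" is a hypothetical parameter name)."""
    return [{"records": part} for part in chunk(items, size)]


def make_parent_flow(deployment_name: str = "process-records/k8s"):
    """Build a parent flow that starts one child flow run per chunk.

    Prefect is imported here so the pure helpers above can be used and
    tested without Prefect installed. The deployment name is hypothetical.
    """
    from prefect import flow
    from prefect.deployments import run_deployment

    @flow
    def fan_out(records: List[Any], chunk_size: int = 100):
        child_runs = []
        for params in build_child_params(records, chunk_size):
            # timeout=0 returns as soon as the child flow run is created,
            # so the parent does not wait for each child in turn; every
            # child run then gets its own job/pod from the deployment's
            # infrastructure block.
            child_runs.append(
                run_deployment(name=deployment_name, parameters=params, timeout=0)
            )
        return [run.id for run in child_runs]

    return fan_out
```

Under these assumptions, make_parent_flow()(list(range(1000)), chunk_size=250) would create four child flow runs. The parent returns before the children finish, so anything that needs their results has to check on those runs separately.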

Deepanshu Aggarwal

10/28/2022, 11:58 AM
I'm not very familiar with Dask or Ray. Do they work with Kubernetes jobs if I install prefect-dask in my image?

Oscar Björhn

10/28/2022, 12:06 PM
They're kind of complicated to set up; we've spent the last week doing it. It's an additional layer of complexity on top of Kubernetes or whatever else you use to execute jobs.

Deepanshu Aggarwal

10/28/2022, 12:08 PM
Hmm, okay. Are there any specific online resources you're following for it? Right now I'm just using the default ConcurrentTaskRunner, which I think scales the job vertically.

Anna Geller

10/28/2022, 12:12 PM
For I/O operations, the ConcurrentTaskRunner is the best option, but if you need to do, e.g., data transformations directly in Python, moving to Dask or Ray can be helpful.
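A stdlib-only analogy for why a concurrent runner suits I/O, assuming (to my understanding) that ConcurrentTaskRunner runs synchronous tasks in worker threads of a single process. Threads overlap waiting time well, but in standard CPython the GIL keeps them from running pure-Python transformations in parallel, which is where process-based engines such as Dask or Ray help. fake_db_read and read_all are hypothetical stand-ins; the sleep replaces a real database call.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def fake_db_read(record_id: int) -> int:
    """Hypothetical I/O call: sleeps instead of querying a database."""
    time.sleep(0.2)  # a sleeping thread releases the GIL, so others can run
    return record_id * 10


def read_all(ids: List[int], max_workers: int = 8) -> List[int]:
    """Run the I/O-bound reads concurrently in a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_db_read, ids))


if __name__ == "__main__":
    start = time.perf_counter()
    results = read_all(list(range(8)))
    elapsed = time.perf_counter() - start
    # Eight 0.2 s waits overlap, so this takes roughly 0.2 s instead of 1.6 s.
    print(results, f"{elapsed:.2f}s")
```

Replacing the sleep with a CPU-heavy pure-Python loop would show little or no speed-up from the same thread pool, which matches the I/O versus transformation split described above.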

Deepanshu Aggarwal

10/28/2022, 12:12 PM
Yes, I'm performing transformations as well as reads and writes to our DB.

Anna Geller

10/28/2022, 12:12 PM
Everything that Oscar said 💯
Yeah, Dask and mapping can help then.
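A minimal sketch of "Dask and mapping" in Prefect 2, assuming prefect and prefect-dask are installed in the image. transform and its "n" workload are hypothetical. Without a cluster_class, DaskTaskRunner starts a temporary local Dask cluster inside the flow run's own pod, so this spreads CPU work over worker processes but still within one pod. Spreading it across pods would need a Kubernetes-backed cluster_class, as in the setup from the article linked later in this thread, which is not shown here.

```python
from typing import Any, Dict, List


def transform(row: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical CPU-bound transformation of a single record."""
    total = sum(i * i for i in range(row["n"]))
    return {"id": row["id"], "total": total}


def make_transform_flow(n_workers: int = 4):
    """Build a flow that maps `transform` over rows on a temporary Dask cluster.

    Prefect and prefect-dask are imported here so `transform` can be
    tested on its own without either package installed.
    """
    from prefect import flow, task
    from prefect_dask import DaskTaskRunner

    transform_task = task(transform)

    # cluster_kwargs are passed to the temporary local cluster that
    # DaskTaskRunner creates when no cluster_class is given.
    @flow(task_runner=DaskTaskRunner(cluster_kwargs={"n_workers": n_workers}))
    def transform_rows(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        futures = transform_task.map(rows)  # one task run per row
        return [future.result() for future in futures]

    return transform_rows
```

Under those assumptions, make_transform_flow()([{"id": i, "n": 1_000_000} for i in range(20)]) would run 20 mapped task runs across four Dask worker processes. Because the local cluster starts separate processes, call it under an if __name__ == "__main__": guard when running it as a script.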

Deepanshu Aggarwal

10/28/2022, 12:13 PM
Any particular resources for it, or an example implementation?

Anna Geller

10/28/2022, 12:13 PM
https://medium.com/slateco-blog/prefect-x-kubernetes-x-ephemeral-dask-power-without-responsibility-6e10b4f2fe40 This is for Prefect 1, but the Dask setup should still be valid (it may need some small modifications).

Oscar Björhn

10/28/2022, 12:14 PM
We've been using a bunch of different resources. I think your exact needs will vary depending on whether you're using Kubernetes, which cloud you're hosting it on, etc. There's probably no one-size-fits-all solution.
+ what Anna linked. 🙂

Deepanshu Aggarwal

10/28/2022, 12:15 PM
Thank you so much. I'll try it out.