# prefect-community
d
I had a conceptual question about the infrastructure a deployment runs on. When we run the same deployment multiple times (with the same or different params), does each run create a new pod for its job, or are pods created per flow? And if I want my parallel processing (parallel-running tasks or flows) distributed across multiple jobs, so that no single process consumes a huge amount of resources, how can I achieve that? Is creating separate deployments a way to do it?
o
Unless I'm getting the terminology mixed up, the pods should be specific to the flow runs, not flows or deployments.
d
I'm using the CLI command `prefect deployment run`, so that generates a flow run (taking the flow code from S3 and the infra config from a Kubernetes block), which creates a pod to run the Kubernetes job.
o
Yeah, exactly. And if you execute `prefect deployment run` multiple times, it should create multiple pods.
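For example, here's roughly the Python equivalent using Prefect's `run_deployment` helper (the deployment name and params here are made up) — each call creates a separate flow run, and the Kubernetes block spins up one job pod per flow run:

```python
# Rough sketch: trigger the same deployment twice from Python,
# equivalent to running `prefect deployment run` twice from the CLI.
# "my-flow/my-deployment" and the params are hypothetical.
from prefect.deployments import run_deployment

run_1 = run_deployment(
    name="my-flow/my-deployment",
    parameters={"batch": 1},
    timeout=0,  # return immediately instead of waiting for the run to finish
)
run_2 = run_deployment(
    name="my-flow/my-deployment",
    parameters={"batch": 2},
    timeout=0,
)
print(run_1.id, run_2.id)  # two distinct flow runs -> two distinct job pods
```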
d
So within that deployment run (which triggers a single flow run), if I have subflows or parallel-running tasks, is there a way to distribute them horizontally? Or will it just drive up the CPU and memory usage of the single job pod?
o
Ah, I believe you would have to use something like Ray or Dask for that, or you can use the `run_deployment` function to run decoupled subflows on their own infrastructure.
It gets a bit tricky with Python, but it seems to be a common use case.
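The `run_deployment` pattern looks roughly like this — a sketch only, and the child deployment name `process-chunk/k8s` is hypothetical; the point is that each child run lands on whatever infrastructure its own deployment is configured with, so the parent pod stays small:

```python
# Sketch: a parent flow fans work out to a separate deployment so each
# chunk is processed in its own Kubernetes job pod.
from prefect import flow
from prefect.deployments import run_deployment

@flow
def parent(chunks: list):
    runs = []
    for chunk in chunks:
        # timeout=0 fires the child run without blocking on its completion;
        # "process-chunk/k8s" is a hypothetical deployment name
        runs.append(
            run_deployment(
                name="process-chunk/k8s",
                parameters={"chunk": chunk},
                timeout=0,
            )
        )
    return [r.id for r in runs]
```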
d
I'm not very familiar with Dask or Ray, but do they work with Kubernetes jobs? E.g. if I install `prefect-dask` in my image?
o
They're kind of complicated to set up; we've spent the last week doing it. It's an additional layer of complexity on top of Kubernetes or whatever else you use to execute jobs.
d
Hmm, okay. Any specific online resources you're following for it? Right now I'm just using the default concurrent task runner, which I think only scales the job up vertically.
a
For IO operations, the concurrent task runner is the best option, but if you need to do e.g. some data transformations directly in Python, then moving to Dask or Ray can be helpful.
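For instance, a minimal sketch of the swap (the `transform` task is just a placeholder for your real CPU-bound work):

```python
# Sketch: ConcurrentTaskRunner runs tasks in threads within one pod
# (fine for IO-bound work); DaskTaskRunner from the prefect-dask
# package can distribute the same tasks across Dask workers.
from prefect import flow, task
from prefect.task_runners import ConcurrentTaskRunner
from prefect_dask import DaskTaskRunner

@task
def transform(x):
    return x * 2  # placeholder for a CPU-heavy transformation

@flow(task_runner=DaskTaskRunner())  # swap in ConcurrentTaskRunner() for IO-bound work
def etl(items):
    return [transform.submit(i) for i in items]
```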
d
Yes, I'm performing transformations as well as read/write ops against our DB.
a
Everything that Oscar said 💯
Yeah, Dask and mapping can help then.
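As a quick sketch of the mapping side (names made up): `Task.map` submits one task run per element, and under a `DaskTaskRunner` those runs are spread across the Dask workers rather than one pod's threads:

```python
# Sketch: map a task over records so each element becomes its own
# task run, distributed by the Dask task runner.
from prefect import flow, task
from prefect_dask import DaskTaskRunner

@task
def load(record):
    ...  # hypothetical transform + write to the DB

@flow(task_runner=DaskTaskRunner())
def pipeline(records: list):
    load.map(records)  # one task run per record, fanned out to Dask workers
```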
d
any particular resources for it ? or any example of implementation ?
a
https://medium.com/slateco-blog/prefect-x-kubernetes-x-ephemeral-dask-power-without-responsibility-6e10b4f2fe40 — this is for v1, but the Dask setup should still be valid (it may need some small modifications).
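Very roughly, the Prefect 2 equivalent points `DaskTaskRunner` at an ephemeral `dask_kubernetes` cluster — treat this as an untested sketch; the `cluster_kwargs` are assumptions, so check the dask-kubernetes docs for the pod spec your cluster actually needs:

```python
# Untested sketch: DaskTaskRunner can create a temporary Dask cluster
# on Kubernetes for the duration of the flow run, then tear it down.
from prefect import flow
from prefect_dask import DaskTaskRunner

@flow(
    task_runner=DaskTaskRunner(
        cluster_class="dask_kubernetes.KubeCluster",
        cluster_kwargs={"n_workers": 4},  # hypothetical sizing; see dask-kubernetes docs
    )
)
def distributed_flow():
    ...
```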
o
We've been using a bunch of different resources. I think your exact needs will vary depending on whether you're using Kubernetes, which cloud you're hosting on, etc. There's probably no one-size-fits-all solution.
+ what Anna linked. 🙂
d
Thank you so much, I'll try it out.