# prefect-community
a
Hi, we're looking into ways to reduce latency between Flow Run submission and getting results back. At the moment we're using the KubernetesAgent, which spawns a k8s Job for each flow run. Job initialization is quite slow in our case due to image size. Question: if we set up a permanent Dask cluster with the appropriate image and set the executor to DaskExecutor, would we be able to skip the Job initialization step? i.e. is it the Agent that sends commands to the Dask cluster, or can the k8s Job not be avoided?
k
I think there is a difference, right? You are combining flow job setup and executor setup into one. I thought the default of KubernetesRun was to pull the image
IfNotPresent
which has caused some community members to not pull updated images because they already had one with the same name. Yes, you can skip executor initialization: the Agent spins up the FlowRunner (flow setup) and then the Flow connects to the cluster and sends work. For either option, I think you should be able to bring latency down by caching the images on the cluster nodes? I don't know a lot about it but I know it can be done
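Roughly, something like this (a sketch assuming Prefect 1.x; the image name and scheduler address are placeholders):

```python
from prefect import Flow, task
from prefect.run_configs import KubernetesRun
from prefect.executors import DaskExecutor

@task
def say_hi():
    print("hi")

with Flow("static-dask-flow") as flow:
    say_hi()

# Force a fresh pull instead of the IfNotPresent default,
# so updated images with the same tag are actually picked up.
flow.run_config = KubernetesRun(
    image="my-registry/flow-image:latest",  # placeholder image
    image_pull_policy="Always",
)

# Connect to an already-running Dask scheduler instead of
# spinning up a new cluster per flow run.
flow.executor = DaskExecutor(address="tcp://dask-scheduler:8786")  # placeholder address
```

The k8s Job that runs the FlowRunner still happens in this setup; only the Dask cluster startup is skipped.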
a
Image caching - yes, but it's not exactly what we're looking for. We have quite a volume of Prefect flow runs, so shaving a bit from each Job initialization is a significant thing for us.
If a new Job is inevitable with the KubernetesAgent, we will run a "LocalAgent" in a k8s pod and point it at a predefined static Dask cluster 🙂
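Roughly what we have in mind (sketch, Prefect 1.x assumed; the label and scheduler address are made up):

```python
from prefect import Flow
from prefect.run_configs import LocalRun
from prefect.executors import DaskExecutor

with Flow("low-latency-flow") as flow:
    ...  # tasks go here

# Runs as a subprocess inside the agent's pod -- no per-run k8s Job.
flow.run_config = LocalRun(labels=["dask-agent"])  # made-up label

# Task execution is sent to the permanent Dask cluster.
flow.executor = DaskExecutor(address="tcp://dask-scheduler:8786")  # made-up address
```

and the agent in the pod would be started with something like `prefect agent local start --label dask-agent`.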
k
Wow ok yeah local agent would be faster than a new pod, as long as it can pull the flow from storage
a
Good point, there should be something tricky about the labels, right?
k
Default labels are associated with storage. You can just turn off the default labels of local storage though like this
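(sketch, assuming Prefect 1.x Local storage, which takes an `add_default_labels` flag):

```python
from prefect import Flow
from prefect.storage import Local

with Flow("low-latency-flow") as flow:
    ...  # tasks go here

# Skip the hostname label that Local storage attaches by default,
# so any local agent with matching labels can pick the flow up.
flow.storage = Local(add_default_labels=False)
```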
a
exactly what we need, thanks 😉
๐Ÿ‘ 1
m
Do you know which specific part of flow initialization is slow? Is it pod startup or something else?
a
@Matthias mostly resource allocation, pod initialization. All the k8s stuff
m
That might be mostly related to container images being too large. There are several ways to slim them down.
What could also help (but that depends) is to set resource requests/limits for each job so that the scheduler can make better-informed decisions about where to place pods. And lastly, creating dedicated worker groups (or node pools) and adding nodeSelectors to your job manifests can also help to spread the load better.
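For example (sketch, Prefect 1.x; the resource values are illustrative):

```python
from prefect import Flow
from prefect.run_configs import KubernetesRun

with Flow("gpu-flow") as flow:
    ...  # tasks go here

flow.run_config = KubernetesRun(
    image="my-registry/flow-image:latest",  # placeholder image
    cpu_request="500m",
    cpu_limit="1",
    memory_request="512Mi",
    memory_limit="1Gi",
    # A nodeSelector for a dedicated node pool would go into a custom
    # job_template dict passed to KubernetesRun (not shown here).
)
```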
a
nothing that contains PyTorch GPU can be slimmed down :)))
m
What you could do is work with a flow of flows, where you only use the PyTorch container (in a subflow) when it is strictly necessary.
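A sketch of that pattern (Prefect 1.x assumed; flow and project names are hypothetical):

```python
from prefect import Flow, case, task
from prefect.tasks.prefect import create_flow_run, wait_for_flow_run

@task
def needs_gpu() -> bool:
    return True  # placeholder decision logic

with Flow("parent-flow") as flow:
    gpu_needed = needs_gpu()
    with case(gpu_needed, True):
        # Only this child flow run uses the large PyTorch GPU image;
        # the parent flow can run on a small image.
        child_id = create_flow_run(
            flow_name="pytorch-gpu-flow",  # hypothetical
            project_name="my-project",     # hypothetical
        )
        wait_for_flow_run(child_id, raise_final_state=True)
```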