Arthur
09/18/2025, 7:29 AM
Marvin
09/18/2025, 7:44 AM
- Pre-pull images on your nodes so pods start from the local cache
- Set imagePullPolicy: IfNotPresent.
- Example DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: image-prepuller
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: image-prepuller
  template:
    metadata:
      labels:
        app: image-prepuller
    spec:
      containers:
        - name: keep-image
          image: your-registry/your-image:tag
          imagePullPolicy: IfNotPresent
          command: ["sh", "-c", "sleep 86400"]
      tolerations:
        - operator: "Exists"
- Also consider pre-pulling in node startup scripts (e.g., containerd's ctr images pull), or using an image cache.
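As a sketch, a node bootstrap script (e.g., cloud-init user data) could pre-pull the image with containerd's ctr; the image name is a placeholder:

```shell
#!/bin/sh
# Pull into containerd's k8s.io namespace so kubelet sees the cached image
# (image reference is a placeholder; private registries also need auth flags)
ctr -n k8s.io images pull your-registry/your-image:tag
```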
- Keep a small pool of nodes warm
- Set a minimum size on your node group (or Karpenter/cluster autoscaler settings) so some capacity is always available. This avoids node scale-up lag.
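For example, on EKS with eksctl (cluster name, region, and sizes are placeholders), a non-zero minSize on the node group keeps capacity warm:

```yaml
# Hypothetical eksctl config: keep at least 2 nodes always available
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-east-1
managedNodeGroups:
  - name: warm-workers
    instanceType: m5.large
    minSize: 2          # always-on capacity; avoids scale-from-zero lag
    desiredCapacity: 2
    maxSize: 10
```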
- Use a long‑lived “workers pull from a queue” pattern
- Instead of Jobs, run a Deployment of N worker pods that fetch work from a queue and execute tasks. This gives truly warm containers at the cost of per-task isolation.
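A minimal sketch of this pattern, assuming a hypothetical worker image that loops pulling tasks from your queue (the image and queue URL are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: task-workers
spec:
  replicas: 4                                    # N warm workers
  selector:
    matchLabels:
      app: task-worker
  template:
    metadata:
      labels:
        app: task-worker
    spec:
      containers:
        - name: worker
          image: your-registry/your-worker:tag   # placeholder: loops on your queue
          imagePullPolicy: IfNotPresent
          env:
            - name: QUEUE_URL                    # hypothetical queue endpoint
              value: "redis://queue.default.svc:6379"
```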
- If you’re using Prefect, you can run workers as a Deployment inside the cluster (the “process” model) for minimal startup latency, or use the Kubernetes worker (Job-per-run) and combine it with pre-pulled images to speed up starts.
- Prefect 3.x Kubernetes worker docs: Kubernetes work pools and workers
- Keep a compute backend warm (Dask/Ray)
- For distributed workloads, keep a Dask or Ray cluster running in K8s and point your Prefect task runner at it. Your flow still starts a pod, but individual tasks schedule quickly onto already-warm workers.
- Serverless-style warm pods (for request/response apps)
- Knative Serving can maintain a minimum number of pods with minScale to avoid cold starts. This is great for services, less so for batch Jobs.
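A sketch of a Knative Service that keeps one pod warm via the min-scale annotation (service name and image are placeholders):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-service
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "1"   # never scale to zero
    spec:
      containers:
        - image: your-registry/your-app:tag
```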
- Smaller, cached images and no build-on-start
- Bake all dependencies into your image and avoid doing pip install at container start.
- Reuse common layers so the node cache helps across versions.
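As a sketch, a Dockerfile that bakes dependencies at build time (base image, file names, and entrypoint are placeholders); installing dependencies before copying the code lets the pip layer be cached across versions:

```dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install dependencies first so this layer is reused while code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code last; only this layer changes between versions
COPY . .
CMD ["python", "-m", "your_app"]   # placeholder entrypoint
```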
How this maps to Prefect 3.x specifically:
- Prefect’s Kubernetes worker creates a Kubernetes Job per flow run; there isn’t a built-in “warm pod pool”. To reduce latency:
- Pre-pull your flow image (DaemonSet or node bootstrap).
- Set imagePullPolicy: IfNotPresent and imagePullSecrets in your work pool's base job template.
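As a sketch, the relevant pod-spec fields inside the Job manifest portion of a Kubernetes work pool's base job template might look like this (exact template shape varies by Prefect version; the secret name is a placeholder):

```yaml
# Fragment of the Job's pod spec in the base job template
spec:
  template:
    spec:
      imagePullSecrets:
        - name: my-registry-secret    # placeholder: your registry credentials
      containers:
        - name: prefect-job
          imagePullPolicy: IfNotPresent
```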
- Schedule to a node group that stays warm (nodeSelector/affinity).
- If you need sub-second start times and don’t require K8s isolation per run, consider running a Prefect process worker as a Deployment inside the cluster so flows run as subprocesses in already-warm pods.
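A sketch of running a Prefect process worker as an in-cluster Deployment (pool name, replica count, and the credentials Secret are placeholders; assumes PREFECT_API_URL and any API key are supplied via that Secret):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prefect-process-worker
spec:
  replicas: 2
  selector:
    matchLabels:
      app: prefect-process-worker
  template:
    metadata:
      labels:
        app: prefect-process-worker
    spec:
      containers:
        - name: worker
          image: prefecthq/prefect:3-latest
          imagePullPolicy: IfNotPresent
          # Flows run as subprocesses inside this already-warm pod
          command: ["prefect", "worker", "start", "--pool", "my-process-pool", "--type", "process"]
          envFrom:
            - secretRef:
                name: prefect-api-credentials   # placeholder: PREFECT_API_URL / PREFECT_API_KEY
```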
If you can share whether you’re asking in general or specifically about Prefect flow runs, I can tailor a concrete config (e.g., exact base job template fields for a Prefect Kubernetes work pool).