Tom Han
09/03/2025, 11:35 PM
prefect worker start --pool runai
to register them to my self-hosted Prefect server. But I’m not sure that’s the best way, because I also want to be able to scale the worker pool as demand grows/drops.

Marvin
09/03/2025, 11:43 PM
1) Create a Kubernetes work pool with a RunAI-ready base job template
prefect work-pool get-default-base-job-template --type kubernetes > base-job-template.json
Edit base-job-template.json to include (a sketch follows the create-pool command below):
- GPU resources (e.g., requests/limits like nvidia.com/gpu)
- RunAI-required annotations/labels (e.g., project/team, scheduler hints)
- Node selectors/tolerations/affinity
- ImagePullSecrets if needed
- Namespace and serviceAccount
Then create the pool:
prefect work-pool create runai --type kubernetes --base-job-template ./base-job-template.json
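For illustration, a minimal sketch of those edits merged into the job_manifest section of the file you just dumped; keep the rest of the default template and its {{ ... }} placeholders intact. nvidia.com/gpu is the standard NVIDIA device-plugin resource name, and prefect-job is the container name in the default template; the runai-scheduler scheduler name and runai/project label are assumptions, so substitute whatever your RunAI installation actually expects:
{
  "job_configuration": {
    "job_manifest": {
      "spec": {
        "template": {
          "metadata": {
            "labels": { "runai/project": "your-runai-project" }
          },
          "spec": {
            "schedulerName": "runai-scheduler",
            "containers": [
              {
                "name": "prefect-job",
                "resources": {
                  "limits": { "nvidia.com/gpu": "1" }
                }
              }
            ]
          }
        }
      }
    }
  }
}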
Useful CLI references:
- Manage work pools (Prefect 3 docs)
2) Run a worker inside the cluster
- Build a worker image with Prefect and Kubernetes extras:
Dockerfile
FROM python:3.11-slim
ENV PREFECT_API_URL=https://your-prefect-server/api
RUN pip install --no-cache-dir 'prefect[kubernetes]'
# If you use a service account in-cluster, you don’t need kubeconfig
CMD ["prefect", "worker", "start", "--pool", "runai", "--work-queue", "default"]
- Deploy it as a Deployment with appropriate RBAC to create Jobs/Pods:
RBAC (example)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prefect-worker
  namespace: your-namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prefect-worker
  namespace: your-namespace
rules:
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create", "get", "list", "watch", "delete"]
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prefect-worker
  namespace: your-namespace
subjects:
  - kind: ServiceAccount
    name: prefect-worker
    namespace: your-namespace   # required for ServiceAccount subjects
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prefect-worker
Worker Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prefect-worker
  namespace: your-namespace
spec:
  replicas: 1
  selector:
    matchLabels: { app: prefect-worker }
  template:
    metadata:
      labels: { app: prefect-worker }
    spec:
      serviceAccountName: prefect-worker
      containers:
        - name: worker
          image: your-repo/prefect-worker:latest
          env:
            - name: PREFECT_API_URL
              value: https://your-prefect-server/api
            # If using a Prefect API key:
            # - name: PREFECT_API_KEY
            #   valueFrom: { secretKeyRef: { name: prefect, key: api-key } }
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
Then scale this Deployment up/down to increase/decrease how many runs can be submitted concurrently.
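For example, to raise submission capacity during a busy stretch (names match the manifest above):
kubectl -n your-namespace scale deployment/prefect-worker --replicas=3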
3) Target the pool in your deployments
Point your deployments at the runai pool and an appropriate queue (e.g., default, gpu, cpu). In Prefect 3.x, use prefect deploy from your project to associate your flows with this pool and queue. If you prefer code, you can also deploy with flow.from_source(...).deploy(...) and set work_pool_name and work_queue_name.
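A minimal sketch of the code route; the repo URL, entrypoint, deployment name, and image below are placeholders:
from prefect import flow

flow.from_source(
    source="https://github.com/your-org/your-repo",  # placeholder: your flow code repo
    entrypoint="flows/train.py:train",               # placeholder: path.py:function
).deploy(
    name="train-on-runai",
    work_pool_name="runai",
    work_queue_name="gpu",
    image="your-repo/flow-image:latest",  # image used by the flow-run Job pods
    build=False,                          # assume the image is already built and pushed
    push=False,
)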
4) Control concurrency and autoscaling
- Use Prefect’s concurrency limits to cap total runs submitted:
- Set pool-wide limits:
prefect work-pool set-concurrency-limit "runai" 50
- Or set per-queue limits:
prefect work-queue set-concurrency-limit "gpu" 10 --pool "runai"
- Scale workers:
- Manually via Deployment replicas.
- With HPA (CPU/memory) if worker load is meaningful; see the sketch after this list.
- With KEDA or a custom controller based on queue backlog (advanced).
- Flow-run pods scale naturally based on the pool’s job template and cluster capacity. Your RunAI/K8s cluster autoscaler can add/remove nodes as needed; the worker remains lightweight.
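As a sketch of the HPA option above: a standard CPU-based autoscaler pointed at the worker Deployment (the replica bounds and 70% target are placeholders; backlog-driven scaling with KEDA would need a custom metrics source):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: prefect-worker
  namespace: your-namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: prefect-worker
  minReplicas: 1        # placeholder bounds
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # placeholder threshold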
5) Optional: Prefect provisioning helper
- Prefect includes a helper to scaffold infrastructure for some pool types (mainly serverless push pools):
prefect work-pool provision-infrastructure "runai"
You’ll still need to ensure RBAC and the base job template reflect your RunAI requirements.
Notes and best practices
- One worker per pod is the sweet spot; don’t run a process pool within a single container.
- Install prefect[kubernetes] in the worker image so it can submit Kubernetes Jobs.
- Keep the worker’s namespace permissions scoped to where it will create flow-run Jobs.
- Use separate queues for different resource profiles (cpu vs gpu) and attach appropriate job template overrides in deployments if needed; a queue-creation example follows this list.
- Make sure flow-run images have everything your code needs (Python deps, CUDA/drivers as applicable). The worker image can be minimal.
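For example, to create the per-profile queues mentioned above in the runai pool:
prefect work-queue create "gpu" --pool "runai"
prefect work-queue create "cpu" --pool "runai"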
Open questions to tailor this for you
- Do your flows require GPUs via RunAI? If yes, which annotations/labels/resources does RunAI require so we can sketch the exact base-job-template changes?
- Which namespace(s) will the worker and flow runs operate in?
- Do you want to autoscale workers based on queue backlog, or is static sizing + cluster autoscaling sufficient for now?
Helpful docs
- Prefect 3 docs: Manage work pools and workers
- Prefect 3 docs: Deployments