Tom Han
09/03/2025, 11:35 PM
prefect worker start --pool runai to register them to my self-hosted Prefect server. But I'm not sure that is the best way, because I also want to be able to scale the worker pool as demand grows/drops.
Marvin
09/03/2025, 11:43 PM
1) Create a Kubernetes work pool with a custom base job template
- Export the default template:
prefect work-pool get-default-base-job-template --type kubernetes > base-job-template.json
Edit base-job-template.json to include:
- GPU resources (e.g., requests/limits like nvidia.com/gpu)
- RunAI-required annotations/labels (e.g., project/team, scheduler hints)
- Node selectors/tolerations/affinity
- ImagePullSecrets if needed
- Namespace and serviceAccount
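For instance, merged into the pod template inside the job manifest, those additions might look like the sketch below (the template file itself is JSON; YAML is shown for readability, and the RunAI scheduler/label names are placeholders for whatever your cluster actually expects):
Pod template additions (illustrative)
metadata:
  labels:
    project: your-runai-project          # placeholder: RunAI project label
spec:
  schedulerName: runai-scheduler         # assumption: the RunAI scheduler name in your cluster
  nodeSelector:
    nvidia.com/gpu.present: "true"       # placeholder selector for GPU nodes
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
  containers:
  - name: prefect-job                    # container name used by the default template
    resources:
      limits:
        nvidia.com/gpu: "1"              # extended resources are requested via limits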
Then create the pool:
prefect work-pool create runai --type kubernetes --base-job-template ./base-job-template.json
Useful CLI references:
- Manage work pools (Prefect 3 docs)
2) Run a worker inside the cluster
- Build a worker image with Prefect and Kubernetes extras:
Dockerfile
FROM python:3.11-slim
ENV PREFECT_API_URL=https://your-prefect-server/api
RUN pip install --no-cache-dir 'prefect[kubernetes]'
# If you use a service account in-cluster, you don’t need kubeconfig
CMD ["prefect", "worker", "start", "--pool", "runai", "--work-queue", "default"]
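Build and push the image to a registry your cluster can pull from (the repo path is a placeholder):
docker build -t your-repo/prefect-worker:latest .
docker push your-repo/prefect-worker:latest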
- Deploy it as a Deployment with appropriate RBAC to create Jobs/Pods:
RBAC (example)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prefect-worker
  namespace: your-namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prefect-worker
  namespace: your-namespace
rules:
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["create", "get", "list", "watch", "delete"]
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prefect-worker
  namespace: your-namespace
subjects:
- kind: ServiceAccount
  name: prefect-worker
  namespace: your-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prefect-worker
Worker Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prefect-worker
  namespace: your-namespace
spec:
  replicas: 1
  selector:
    matchLabels: { app: prefect-worker }
  template:
    metadata:
      labels: { app: prefect-worker }
    spec:
      serviceAccountName: prefect-worker
      containers:
      - name: worker
        image: your-repo/prefect-worker:latest
        env:
        - name: PREFECT_API_URL
          value: https://your-prefect-server/api
        # If using a Prefect API key:
        # - name: PREFECT_API_KEY
        #   valueFrom: { secretKeyRef: { name: prefect, key: api-key } }
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
Scale this Deployment up or down to change how many workers are polling the pool and submitting runs concurrently.
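For example:
kubectl scale deployment/prefect-worker -n your-namespace --replicas=3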
3) Target the pool in your deployments
Marvin
09/03/2025, 11:43 PM
Point your deployments at the runai pool and an appropriate queue (e.g., default, gpu, cpu). In Prefect 3.x, run prefect deploy from your project to associate your flows with this pool and queue. If you prefer code, you can also deploy with flow.from_source(...).deploy(...) and set work_pool_name and work_queue_name.
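For example, a deployment entry in prefect.yaml might look like this (the deployment name, entrypoint, and image are placeholders; job_variables override fields exposed by the pool's base job template):
prefect.yaml (example)
deployments:
- name: train-on-gpu
  entrypoint: flows/train.py:train     # placeholder entrypoint
  work_pool:
    name: runai
    work_queue_name: gpu
    job_variables:
      image: your-repo/flow-image:latest   # image your flow code runs in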
4) Control concurrency and autoscaling
- Use Prefect’s concurrency limits to cap total runs submitted:
- Set pool-wide limits:
prefect work-pool set-concurrency-limit "runai" 50
- Or set per-queue limits (scope to the pool with --pool):
prefect work-queue set-concurrency-limit "gpu" 10 --pool runai
- Scale workers:
- Manually via Deployment replicas.
- With HPA (CPU/memory) if worker load is meaningful.
- With KEDA or a custom controller based on queue backlog (advanced; see the sketch after this list).
- Flow-run pods scale naturally based on the pool’s job template and cluster capacity. Your RunAI/K8s cluster autoscaler can add/remove nodes as needed; the worker remains lightweight.
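As a sketch of the KEDA route, assuming you run a small exporter service that queries the Prefect API and serves the current queue backlog as JSON (the exporter endpoint and field name below are hypothetical):
KEDA ScaledObject (illustrative)
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: prefect-worker
  namespace: your-namespace
spec:
  scaleTargetRef:
    name: prefect-worker               # the worker Deployment above
  minReplicaCount: 1
  maxReplicaCount: 5
  triggers:
  - type: metrics-api                  # KEDA's generic JSON-metrics scaler
    metadata:
      url: "http://prefect-backlog-exporter.your-namespace.svc/backlog"  # hypothetical exporter
      valueLocation: "queued_runs"     # JSON field holding the backlog count
      targetValue: "5"                 # target ~5 queued runs per worker replica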
5) Optional: Prefect provisioning helper
- Prefect includes a helper to scaffold infrastructure for some pool types:
prefect work-pool provision-infrastructure "runai"
You’ll still need to ensure RBAC and the base job template reflect your RunAI requirements.
Notes and best practices
- One worker per pod is the sweet spot; don’t run a process pool within a single container.
- Install prefect[kubernetes] in the worker image so it can submit Kubernetes Jobs.
- Keep the worker’s namespace permissions scoped to where it will create flow-run Jobs.
- Use separate queues for different resource profiles (cpu vs gpu) and attach appropriate job template overrides in deployments if needed.
- Make sure flow-run images have everything your code needs (Python deps, CUDA/drivers as applicable). The worker image can be minimal.
Open questions to tailor this for you
- Do your flows require GPUs via RunAI? If yes, which annotations/labels/resources does RunAI require so we can sketch the exact base-job-template changes?
- Which namespace(s) will the worker and flow runs operate in?
- Do you want to autoscale workers based on queue backlog, or is static sizing + cluster autoscaling sufficient for now?
Helpful docs
- Manage work pools in Prefect 3: Manage work pools and workers
- Deploying with Prefect 3: Deployments in Prefect 3