# ask-marvin
@Marvin I am thinking about creating a worker pool hosted on our RunAI platform, which is essentially a wrapper for k8s. What would you recommend as the scheme? I am thinking about just starting a process pool, and in my Dockerfile using

```bash
prefect worker start --pool runai
```

to register the workers with my self-hosted Prefect server. But I'm not sure that is the best way, because I also want to be able to scale the worker pool as demand grows and drops.
Marvin
thought for 472.8 seconds
Great question. Since RunAI is a thin wrapper over Kubernetes, the cleanest and most scalable pattern in Prefect 3.x is to use a Kubernetes work pool and run one Prefect worker per pod (not a process pool), then scale the worker Deployment horizontally as needed. Let Kubernetes/RunAI handle the actual flow-run pods via your base job template.

Recommendation summary
- Use a Kubernetes work pool with a base job template customized for RunAI (GPU requests, node selectors/tolerations, required annotations/labels).
- Run a lightweight Prefect worker as a long-lived Deployment in your RunAI/K8s cluster. One worker per pod; scale replicas to increase scheduling throughput.
- Concurrency is managed at the work pool/work queue level; flow runs execute as Kubernetes Jobs/Pods according to your template and can scale independently of workers.

Why not a process pool in one container? Workers are stateless and light: they just poll the server and submit K8s Jobs. Running multiple processes in one pod complicates lifecycle management and doesn't improve cluster scheduling. Horizontal scaling with multiple worker pods is simpler and more resilient.

Step-by-step

1) Create a Kubernetes work pool and base template
- Start from Prefect's default Kubernetes job template and customize it for RunAI:
```bash
prefect work-pool get-default-base-job-template --type kubernetes > base-job-template.json
```
Edit base-job-template.json to include:
- GPU resources (e.g., requests/limits like nvidia.com/gpu)
- RunAI-required annotations/labels (e.g., project/team, scheduler hints)
- Node selectors/tolerations/affinity
- imagePullSecrets if needed
- Namespace and serviceAccount

Then create the pool:
```bash
prefect work-pool create runai --type kubernetes --base-job-template ./base-job-template.json
```
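If you'd rather script those edits than hand-edit the JSON, here is a minimal sketch in Python. It assumes the layout of the default template exported above; the RunAI project label, node selector, and toleration values are placeholders, not confirmed RunAI requirements:

```python
import json

# Load the default template exported by `prefect work-pool get-default-base-job-template`.
with open("base-job-template.json") as f:
    template = json.load(f)

# The default template nests the Kubernetes Job manifest under job_configuration.
pod_template = template["job_configuration"]["job_manifest"]["spec"]["template"]
pod_spec = pod_template["spec"]

# GPU request on the flow-run container.
container = pod_spec["containers"][0]
container.setdefault("resources", {}).setdefault("limits", {})["nvidia.com/gpu"] = "1"

# RunAI scheduling hints (assumed keys; check your RunAI docs for the real ones).
labels = pod_template.setdefault("metadata", {}).setdefault("labels", {})
labels["runai/project"] = "your-project"  # hypothetical label key

# Node placement (hypothetical selector and toleration).
pod_spec["nodeSelector"] = {"accelerator": "nvidia"}
pod_spec.setdefault("tolerations", []).append(
    {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
)

with open("base-job-template.json", "w") as f:
    json.dump(template, f, indent=2)
```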
Useful CLI references: Manage work pools (Prefect 3 docs)

2) Run a worker inside the cluster
- Build a worker image with Prefect and the Kubernetes extras:

Dockerfile
```dockerfile
FROM python:3.11-slim
ENV PREFECT_API_URL=https://your-prefect-server/api
RUN pip install --no-cache-dir 'prefect[kubernetes]'
# If you use a service account in-cluster, you don't need a kubeconfig
CMD ["prefect", "worker", "start", "--pool", "runai", "--work-queue", "default"]
```
- Deploy it as a Deployment with appropriate RBAC to create Jobs/Pods:

RBAC (example)
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prefect-worker
  namespace: your-namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prefect-worker
  namespace: your-namespace
rules:
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["create", "get", "list", "watch", "delete"]
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prefect-worker
  namespace: your-namespace
subjects:
- kind: ServiceAccount
  name: prefect-worker
  namespace: your-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prefect-worker
```
Worker Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prefect-worker
  namespace: your-namespace
spec:
  replicas: 1
  selector:
    matchLabels: { app: prefect-worker }
  template:
    metadata:
      labels: { app: prefect-worker }
    spec:
      serviceAccountName: prefect-worker
      containers:
      - name: worker
        image: your-repo/prefect-worker:latest
        env:
        - name: PREFECT_API_URL
          value: https://your-prefect-server/api
        # If using a Prefect API key:
        # - name: PREFECT_API_KEY
        #   valueFrom: { secretKeyRef: { name: prefect, key: api-key } }
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
```
Then scale this Deployment up or down to increase or decrease how many runs can be submitted concurrently, as in the sketch below.
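For example, a cron job or operator can resize the worker Deployment through the official kubernetes Python client. A minimal sketch, assuming the Deployment above; the target of 3 replicas is arbitrary:

```python
from kubernetes import client, config

# Use load_incluster_config() when running inside the cluster;
# load_kube_config() reads your local kubeconfig instead.
config.load_kube_config()

apps = client.AppsV1Api()

# Patch only the replica count of the worker Deployment.
apps.patch_namespaced_deployment_scale(
    name="prefect-worker",
    namespace="your-namespace",
    body={"spec": {"replicas": 3}},
)
```

The same effect is available interactively via kubectl's scale subcommand.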
3) Target the pool in your deployments
- When creating deployments, target the `runai` pool and an appropriate queue (e.g., `default`, `gpu`, `cpu`). In Prefect 3.x, use `prefect deploy` from your project to associate your flows with this pool and queue. If you prefer code, you can also deploy with `flow.from_source(...).deploy(...)` and set `work_pool_name` and `work_queue_name`, as in the sketch below.
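A minimal sketch of the code path; the repo URL, entrypoint, and image name are placeholders, and it assumes your flow-run image is prebuilt and already pushed to a registry the cluster can pull from:

```python
from prefect import flow

if __name__ == "__main__":
    flow.from_source(
        source="https://github.com/your-org/your-flows",  # hypothetical repo
        entrypoint="flows/train.py:train",  # hypothetical path:function
    ).deploy(
        name="train-on-runai",
        work_pool_name="runai",
        work_queue_name="gpu",
        image="your-repo/flow-image:latest",  # prebuilt image with your code's deps
        build=False,  # image already exists, so skip building
        push=False,   # and skip pushing
    )
```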
4) Control concurrency and autoscaling
- Use Prefect's concurrency limits to cap total runs submitted.
- Set pool-wide limits:

```bash
prefect work-pool set-concurrency-limit "runai" 50
```
- Or set per-queue limits:
```bash
prefect work-queue set-concurrency-limit "gpu" 10 --pool runai
```
- Scale workers:
  - Manually via Deployment replicas.
  - With an HPA (CPU/memory) if worker load is meaningful.
  - With KEDA or a custom controller based on queue backlog (advanced); a backlog-measurement sketch follows below.
- Flow-run pods scale naturally based on the pool's job template and cluster capacity. Your RunAI/K8s cluster autoscaler can add/remove nodes as needed; the worker remains lightweight.
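For the custom-controller route, the first ingredient is the backlog itself. A rough sketch of counting queued runs with Prefect 3's client; the one-worker-per-ten-runs policy and the 1 to 5 replica bounds are arbitrary assumptions, and the result would feed a scaler like the snippet in step 2:

```python
import asyncio

from prefect import get_client
from prefect.client.schemas.filters import (
    FlowRunFilter,
    FlowRunFilterState,
    FlowRunFilterStateName,
    WorkPoolFilter,
    WorkPoolFilterName,
)


async def backlog(pool_name: str) -> int:
    """Count flow runs waiting to be picked up from a work pool."""
    async with get_client() as client:
        runs = await client.read_flow_runs(
            flow_run_filter=FlowRunFilter(
                state=FlowRunFilterState(
                    name=FlowRunFilterStateName(any_=["Scheduled", "Pending"])
                )
            ),
            work_pool_filter=WorkPoolFilter(
                name=WorkPoolFilterName(any_=[pool_name])
            ),
        )
        return len(runs)


if __name__ == "__main__":
    queued = asyncio.run(backlog("runai"))
    # Arbitrary policy: one worker per 10 queued runs, clamped to 1..5 replicas.
    desired = max(1, min(5, -(-queued // 10)))
    print(f"backlog={queued} -> desired worker replicas={desired}")
```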
5) Optional: Prefect provisioning helper
- Prefect includes a helper to scaffold infrastructure for some pool types:

```bash
prefect work-pool provision-infrastructure "runai"
```
You'll still need to ensure RBAC and the base job template reflect your RunAI requirements.

Notes and best practices
- One worker per pod is the sweet spot; don't run a process pool within a single container.
- Install `prefect[kubernetes]` in the worker image so it can submit Kubernetes Jobs.
- Keep the worker's namespace permissions scoped to where it will create flow-run Jobs.
- Use separate queues for different resource profiles (cpu vs gpu) and attach appropriate job template overrides in deployments if needed.
- Make sure flow-run images have everything your code needs (Python deps, CUDA/drivers as applicable). The worker image can be minimal.

Open questions to tailor this for you
- Do your flows require GPUs via RunAI? If yes, which annotations/labels/resources does RunAI require, so we can sketch the exact base-job-template changes?
- Which namespace(s) will the worker and flow runs operate in?
- Do you want to autoscale workers based on queue backlog, or is static sizing plus cluster autoscaling sufficient for now?

Helpful docs
- Manage work pools in Prefect 3: Manage work pools and workers
- Deploying with Prefect 3: Deployments in Prefect 3