# ask-marvin
@Marvin I'm trying to figure out how to manage the lifecycle of a service running a worker using the process or docker type (ie, flow execution occurs on the same machine as the worker). When I need to restart the server (system updates, new docker image, if I wanted to do any sort of scaling, etc), how can I ensure there are no long-running flows I would be interrupting? Or to put it another way, I can spin up a new copy of the service to do a blue/green deployment, but how can I prevent the old copy from taking on new jobs and only complete existing jobs, after the new one is ready?
Error: No module named 'prefect.workers.docker'
thought for 735.8 seconds
Short answer: drain the thing that feeds the worker, not the worker itself. Assuming Prefect 3.x, here’s a practical runbook for blue/green or maintenance when using process or docker workers on the same machine.

Two viable patterns

A) Single pool/queue (simple drain/restart)
- Stop the old worker from taking new work:
  - Pause the work pool: `prefect work-pool pause <pool>`
    - Docs: https://docs-3.prefect.io/v3/concepts/work-pools
    - CLI help: `prefect work-pool pause --help`
  - Alternatively, pause just the work queue(s) the worker polls: `prefect work-queue pause <queue> --pool <pool>` (`prefect work-queue pause --help`)
  - Or set pool (or queue) concurrency to 0 to allow current runs to finish but start no new ones:
    - Pool: `prefect work-pool set-concurrency-limit <pool> 0` (`prefect work-pool set-concurrency-limit --help`)
    - Queue: `prefect work-queue set-concurrency-limit <queue> --pool <pool> 0`
- Wait for existing runs to finish:
  - Inspect the pool/queue: `prefect work-pool inspect <pool>` or `prefect work-queue inspect <queue> --pool <pool>` (`prefect work-pool inspect --help`, `prefect work-queue inspect --help`)
  - Or list running flow runs: `prefect flow-run ls --state RUNNING` (`prefect flow-run ls --help`)
  - In the UI, filter by Work Pool and state=Running; wait until zero.
- Stop the old worker service (systemd, Docker, whatever you’re using).
- Start the new worker (same pool/queue): `prefect worker start --pool <pool> [--work-queue <queue> ...]` (`prefect worker start --help`)
- Resume the pool/queue if you paused it:
  - Pool: `prefect work-pool resume <pool>`
  - Queue: `prefect work-queue resume <queue> --pool <pool>`

B) True blue/green (no contention; old workers only finish existing runs)
- Create a “green” destination (new pool or new queue):
  - New pool: `prefect work-pool create <green-pool> --type <process|docker>`
  - Or a new queue in the same pool: `prefect work-queue create <green-queue> --pool <pool>`
- Start new workers pointing at the green pool/queue.
- Update deployments to target the green pool/queue so all new flow runs route to green. Existing scheduled runs already enqueued to “blue” will stay there.
- Pause the “blue” pool/queue (or set its concurrency to 0) so it stops taking more work; the blue workers will only finish what they already claimed.
- Once blue shows no running flow runs, stop the blue workers and delete the old pool/queue when you’re ready.

Notes by worker type
- Process workers (local subprocesses):
  - If you terminate the worker process while a run is still executing, you risk interrupting the child process and crashing the flow. If you care about the run, drain first; don’t kill the worker until the pool/queue shows zero RUNNING.
  - There is no `prefect worker drain` or worker-level pause; you drain via the pool/queue.
- Docker workers (one container per run):
  - Flows run in separate containers; those containers typically keep running even if the worker process exits. Still, the safest path is the same: pause/drain, then stop the old worker.
  - Docker how-to overview: https://docs-3.prefect.io/v3/how-to-guides/deployment_infra/docker

Operational tips
- Pause vs concurrency=0: both prevent new runs from starting. Setting concurrency to 0 is a soft drain; pause is a hard stop. Either is fine; pause is explicit.
- Race window: a worker may have just claimed a run right before you pause. Give it a few seconds, then re-check “Running” before pulling the plug.
- Monitoring the drain:
  - CLI: `prefect work-pool inspect <pool>`, `prefect work-queue inspect <queue> --pool <pool>`, and `prefect flow-run ls --state RUNNING`
  - UI: filter by work pool/queue and state=Running.
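Pattern A can be sketched as a small runbook script. This is illustrative only: the pool name `prod` and the `prefect-worker` systemd unit are assumptions, and the `run` wrapper defaults to printing each command (`DRY_RUN=1`) so you can review it against your setup before flipping `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
# Sketch of pattern A: pause -> wait -> restart worker -> resume.
# Assumed names: pool "prod", systemd unit "prefect-worker".
set -euo pipefail

POOL="${POOL:-prod}"
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "DRY RUN: $*"   # print the command instead of executing it
  else
    "$@"
  fi
}

# 1. Stop the pool from handing out new runs.
run prefect work-pool pause "$POOL"

# 2. Wait for in-flight runs to finish. The CLI prints a table, so in
#    practice you would parse it (or just watch the UI); here we only
#    surface the command to run.
run prefect flow-run ls --state RUNNING

# 3. Restart the worker service, then reopen the pool.
run systemctl restart prefect-worker
run prefect work-pool resume "$POOL"
```

Running it as-is prints the four commands; the dry-run wrapper is a common pattern for ops runbooks so the script doubles as documentation.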
If you describe your current layout (one pool with one queue, multiple queues, or separate pools per environment), I can give you the exact commands to minimize downtime and surprise explosions. Such fun.
@Marvin This makes sense. I think the trick is how to coordinate the deployments to point to the right queue. And maybe this is simple enough for a blue/green scenario, but let's say I wanted to be able to scale from 1 to N workers. A queue-per-worker strategy could work from the worker side of things, but I'm not sure what to do for the deployments
thought for 259.1 seconds
You don’t need a queue-per-worker to scale. In fact, please don’t do that to yourself.

How to think about it
- Routing: deployments route to a work pool, and optionally to a specific work queue in that pool.
- Scaling: you scale by adding more workers that poll the same pool/queue. No deployment changes required.
- Isolation (blue/green, draining specific instances): use separate queues or separate pools. That’s when deployments need to “point somewhere else”.

Practical patterns

1) Simple scaling (1 → N workers)
- Keep one pool and one queue (e.g., pool=prod, queue=default).
- Start more workers pointing at that same pool/queue. They’ll fairly share the runs.
- Process/docker workers can be limited per instance with `--limit` to cap concurrency (`prefect worker start --help`).
- Your deployments do not change at all for scaling.
- Docs:
  - Workers: https://docs-3.prefect.io/v3/concepts/workers
  - Work pools/queues: https://docs-3.prefect.io/v3/concepts/work-pools

2) Blue/green, or draining one set of workers while others continue
- Create two queues in the same pool (blue, green). Old workers poll blue; new workers poll green.
- Flip deployments to target the green queue. From that point, only green workers get new work.
- Pause or set concurrency=0 on blue to drain, wait for “Running” to hit zero, then stop the blue workers.
  - Pause pool: `prefect work-pool pause <pool>`
  - Pause queue: `prefect work-queue pause <queue> --pool <pool>`
  - Set pool concurrency to 0: `prefect work-pool set-concurrency-limit <pool> 0`
  - Inspect: `prefect work-pool inspect <pool>` / `prefect work-queue inspect <queue> --pool <pool>`
  - List running: `prefect flow-run ls --state RUNNING`
- CLI help: `prefect work-pool --help`, `prefect work-queue --help`, `prefect flow-run --help`

Coordinating deployments when you flip queues
- Option A: use the UI to edit the deployment’s Work Pool/Queue. Quick for a handful.
- Option B: redeploy with the new pool/queue in CI. Keep pool/queue names as variables so you don’t touch code:
  - Define the pool/queue in your deploy config (`prefect deploy`) using environment variables/placeholders.
  - Your CI sets WORK_POOL=prod and WORK_QUEUE=green for the flip, then runs `prefect deploy`.
  - Check available flags with `prefect deploy --help`.
- Option C: point deployments only at a pool (no queue). Then:
  - For simple scaling: add/remove workers; nothing to flip.
  - For blue/green isolation you’ll still want separate queues (or separate pools); otherwise both sets of workers will race for the same runs.

Controlling capacity without moving deployments
- Per worker: start with a limit to cap concurrency for that instance, e.g. `prefect worker start --pool prod --work-queue default --limit 2`
- Per queue or per pool: set concurrency limits so the whole lane is throttled:
  - Pool: `prefect work-pool set-concurrency-limit <pool> <n>`
  - Queue: `prefect work-queue set-concurrency-limit <queue> --pool <pool> <n>`

Summary
- Scale: add workers to the same queue. Don’t touch deployments.
- Isolate for blue/green: use separate queues (or pools), flip deployments once to the new queue, then drain the old queue.
- Automate the flip by parameterizing pool/queue in your deploy config and letting CI set the values.

If you share how many deployments you have and whether you prefer queues or separate pools for isolation, I’ll sketch the exact commands and a minimal CI snippet to toggle blue → green and back. I’ll try to contain my excitement.
@Marvin That Simple scaling (1 → N workers) example you gave makes perfect sense for scaling out, but how do you scale in? Say I now have 10 workers polling one pool/one queue, and I want to go down to 5 workers. What to do?
thought for 68.8 seconds
Great question. Scaling out is easy; scaling in without collateral damage takes a little choreography. In Prefect 3.x there’s no `prefect worker drain` command, so you do it via pool/queue controls and, if needed, a temporary “drain” lane.

Two safe patterns

A) Fastest: cap capacity, then retire workers
- Set your global capacity to the new target (e.g., 5 total):
  - Per pool: `prefect work-pool set-concurrency-limit <pool> 5` (`prefect work-pool set-concurrency-limit --help`)
  - Or per queue: `prefect work-queue set-concurrency-limit <queue> --pool <pool> 5` (`prefect work-queue set-concurrency-limit --help`)
- Wait until total Running <= 5:
  - CLI: `prefect flow-run ls --state RUNNING` (`prefect flow-run ls --help`)
  - Or use the UI.
- Stop any 5 worker processes/containers.
  - Docker workers: safe to stop immediately; flow containers keep running.
  - Process workers: only stop workers that are idle to avoid interrupting a subprocess run. If you’re not sure which are idle, see pattern B.

B) Zero interruption for process workers: use a temporary “drain” queue
- Create a drain queue in the same pool: `prefect work-queue create drain --pool <pool>`
- Restart the 5 workers you want to remove so they point at the drain queue (systemd/docker restart with): `prefect worker start --pool <pool> --work-queue drain` (`prefect worker start --help`)
- Deployments still point at your main queue, so these workers will not pick up new runs.
- Let any in-flight runs on those workers finish. When they’re idle, stop them.
- Delete the drain queue when done if you like.

Why this works
- Routing is pool/queue-based. Concurrency limits and queue assignment control who gets new runs without touching the deployments.
- Docker workers launch each flow in its own container; stopping the worker won’t kill the flow runs. Process workers execute flows as subprocesses; stopping a busy worker can interrupt that run, hence the drain-queue trick or waiting for idleness.

Optional refinements
- If you know you’ll scale in/out a lot with process workers, consider running many workers with `--limit 1` each; it’s easier to spot idle instances and retire them safely.
- Prefer a queue-level limit if you have multiple queues in the pool; use a pool-level limit if you want a hard cap across all queues.

If you tell me whether you’re on process or docker workers and how you run them (systemd vs containers), I’ll give you the exact commands for your setup. I know, thrilling.
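Pattern B (the drain queue) can be sketched end to end. The specifics here are assumptions: a pool named `prod`, five retiring workers managed as a systemd template unit `prefect-worker@N`, and a `run` wrapper that defaults to printing commands (`DRY_RUN=1`) so nothing executes until you opt in.

```shell
#!/usr/bin/env bash
# Sketch of scaling in 10 -> 5 process workers via a temporary drain queue.
# Assumed names: pool "prod", systemd units prefect-worker@6..prefect-worker@10.
set -euo pipefail

POOL="${POOL:-prod}"
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "DRY RUN: $*"   # print the command instead of executing it
  else
    "$@"
  fi
}

# 1. Create the temporary drain queue; no deployments point at it, so it
#    never receives new runs.
run prefect work-queue create drain --pool "$POOL"

# 2. Repoint the retiring workers at the drain queue. With systemd, that
#    means each unit's ExecStart becomes:
#      prefect worker start --pool <pool> --work-queue drain
for i in 6 7 8 9 10; do
  run systemctl restart "prefect-worker@$i"
done

# 3. Once those workers are idle (their in-flight subprocess runs have
#    finished), stop them and clean up the drain queue.
for i in 6 7 8 9 10; do
  run systemctl stop "prefect-worker@$i"
done
run prefect work-queue delete drain --pool "$POOL"
```

The drain queue costs nothing while it exists, so it can also be left in place as a permanent parking lane for future scale-in events.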