Henry Bruce (09/24/2025, 10:55 AM)

Marvin (09/24/2025, 11:04 AM)
Common reasons a worker shows Offline in Cloud:
- PREFECT_API_URL must be the workspace-scoped URL (includes your account_id/workspace_id), not just https://api.prefect.cloud
- PREFECT_API_KEY must be present and valid
- Ref: Connect to Cloud
- Worker is connected to a different workspace or polling a different work pool than the one you’re viewing in the UI
- Network egress blocked (VNet, firewall, DNS) or Cloud IP allowlist is enabled and does not include the worker’s egress IP
- You can check/manage allowlists with prefect cloud ip-allowlist ...
- The ACI container is running but the Prefect process isn’t (e.g., the entrypoint isn’t actually running prefect worker start ...); a quick out-of-band check is shown after this list
- Rare: running Prefect 2.x code/image by accident
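For the ACI entrypoint point above, a quick check from outside the container (assuming you have Azure CLI access; the query path follows the standard container group schema) shows what command the container was configured to run:
az container show -g <rg> -n <name> --query "containers[0].command"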
Quick checks to run inside the worker container:
- Verify Prefect can see your Cloud workspace and credentials
prefect version
prefect config view
prefect cloud workspace ls
Expect PREFECT_API_URL to look like:
https://api.prefect.cloud/api/accounts/<account_id>/workspaces/<workspace_id>
- Confirm the worker is started against the correct pool
ps aux | grep "prefect worker start"
It should be running something like:
prefect worker start --pool <your-azure-container-pool>
- If you use IP allowlists, see what’s configured:
prefect cloud ip-allowlist ls
- From inside the container, quickly test reachability
curl -sS -H "Authorization: Bearer $PREFECT_API_KEY" "$PREFECT_API_URL/health"
Should return a simple OK/healthy response (a variant that also surfaces the HTTP status code is shown just below).
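If the plain health call is inconclusive, a variant that also prints the HTTP status code (standard curl flags, same /health path as above) helps separate credential problems from network problems:
curl -sS -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $PREFECT_API_KEY" "$PREFECT_API_URL/health"
A 401/403 generally points at the API key, while a timeout or connection error points at egress/allowlist issues.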
Notes:
- The healthcheck server (--with-healthcheck) is optional and for container orchestration liveness/readiness; the Cloud “Online/Offline” badge is based on recent polls/heartbeats, not that endpoint. Details: Worker healthchecks
- Azure container worker docs: Azure Container worker
If you can share:
- The exact command/entrypoint you use to start the worker
- The PREFECT_API_URL value inside the container (you can redact IDs if you prefer)
- Whether your Cloud account uses an IP allowlist
- The work pool name/type you expect the worker to poll
…I can pinpoint the root cause quickly.

Henry Bruce (09/24/2025, 11:49 AM)

Marvin (09/24/2025, 11:49 AM)
Without --name, Prefect generates a new worker name on each start. Consider setting --name so Cloud shows the same worker identity after restarts.
If your goal is to fix the Offline status: a restart will only help if the problem was transient. If the root cause is credentials or networking, it’ll come back offline. After restart, verify inside the container:
prefect config view
prefect cloud workspace ls
Ensure:
- PREFECT_API_URL is your workspace-scoped URL (includes your account_id/workspace_id)
- PREFECT_API_KEY is set and valid
- The worker is started against the expected pool:
prefect worker start --pool <your-azure-container-pool> --name <your-fixed-name>
Relevant docs:
- Azure Container worker: Azure Container worker
- Worker health/online status: Worker healthchecks
If you want, share your worker start command/entrypoint and the PREFECT_API_URL format you see in the container, and I can double-check it’ll come back online after the restart.

Henry Bruce (09/24/2025, 11:58 AM)

Marvin (09/24/2025, 12:01 PM)
A few things can cause the version to change on restart:
- Floating image tags: if the worker image tag is prefecthq/prefect:3, 3-latest, or latest, ACI will pull the newest image for that tag on container start/restart. That’s the most common cause of “it upgraded when I restarted.” (A check to confirm what the running container actually pulled is shown after this list.)
- Startup scripts that upgrade Prefect: If your container entrypoint or bootstrap runs something like pip install -U "prefect>=3,<4" (or just pip install -U prefect), it will pull a newer patch on each restart.
- Managed workers only: The only case where Prefect controls the worker version is the “prefect:managed” work pool type. Your Azure Container worker is not managed by Prefect Cloud, so this does not apply here.
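To confirm whether drift actually happened, you can check what the running container group reports (assuming Azure CLI access; az container exec runs a command inside the container):
# image reference the container group was created with (a floating tag here is the red flag)
az container show -g <rg> -n <name> --query "containers[0].image"
# Prefect version actually installed in the running container
az container exec -g <rg> -n <name> --exec-command "prefect version"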
How to prevent surprise upgrades (recommended)
- Pin immutable image tags for the worker container:
- Prefer something like prefecthq/prefect:3.4.19-python3.11 (or your exact build) instead of :3 or :latest.
- Even better, use a digest (@sha256:...) for fully immutable pulls.
- Remove/avoid any pip install -U prefect in the worker’s entrypoint.
- Rebuild and redeploy intentionally when you want to move versions.
About worker vs runner version “conflicts”
- The worker and the flow-run (runner) containers can run different Prefect versions, but keeping them aligned avoids oddities.
- Best practice: pin both worker and flow-run images to the same Prefect x.y.z version. At minimum, keep them on the same minor version (x.y).
- Mismatches can show up as:
- Job template/variable keys that one side doesn’t recognize
- Differences in how the run is started or monitored
- Verify with (a quick way to check the flow-run image locally is shown after this list):
# inside worker container
prefect version
# inside the flow-run image
prefect version
- If you’re using floating tags for the worker but a pinned Prefect version for the runner, you can end up with exactly the “unexplained conflict” you saw.
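For the flow-run side, you don’t need to launch a flow to check; running the image locally is enough (sketch assuming Docker is available and <your-flow-run-image> is your runner image):
docker run --rm <your-flow-run-image> prefect version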
If you share:
- The worker container image tag (and whether ACI is pulling from a tag like :3 or :latest)
- Any entrypoint snippet that installs/updates Prefect at startup
- The flow-run image’s Prefect version pin
…I can suggest concrete changes to make upgrades fully predictable.
Helpful docs:
- Azure Container worker: Azure Container worker
- Worker concepts: Workers

Henry Bruce (09/24/2025, 12:28 PM)
09/24/2025, 12:28 PMaz container create --name una-acp-qa-prefect-3-3-4-workpool-hybrid --resource-group una-acp-qa-prefect-workpool-rg --assign-identity <identity> --image "prefecthq/prefect:3.3.4-python3.12" --secure-environment-variables PREFECT_API_URL=<URL> PREFECT_API_KEY=<KEY> --command-line "/bin/bash -c 'pip install prefect-azure && prefect worker start --pool una-acp-qa-prefect-3-3-4-workpool-hybrid --type azure-container-instance'" --registry-username <user> --registry-password <pass> --registry-login-server <http://index.docker.io|index.docker.io>

Marvin (09/24/2025, 12:29 PM)
The version drift comes from your startup command: /bin/bash -c 'pip install prefect-azure && prefect worker start --pool ... --type azure-container-instance'
- prefect-azure does not pin prefect to your image’s version. Its dependency spec allows newer Prefect, so pip resolves the latest compatible release and upgrades prefect inside the container on every restart.
- That explains why you saw 3.4.15 before, then 3.4.19 after a restart, even though the image tag is prefecthq/prefect:3.3.4-python3.12.
How to stop surprise upgrades
- Best: bake dependencies into the image and don’t run pip at startup.
- Build your own image FROM prefecthq/prefect:<x.y.z>-python<py> and add (a minimal Dockerfile sketch is shown after this list):
pip install "prefect==<x.y.z>" "prefect-azure==<a.b.c>"
- Then run only:
prefect worker start --pool ... --type azure-container-instance
- If you must install at startup, pin explicitly and/or add constraints:
pip install "prefect==3.4.19" "prefect-azure==<a.b.c>"
# or keep minor fixed:
pip install "prefect>=3.4.0,<3.5" "prefect-azure>=0.4,<0.5"
Note: the range form can still drift patch versions; use exact pins if you want zero drift.
- Alternatively, use digest-pinned images and remove pip entirely.
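For example, a minimal sketch of the baked-image approach (the versions, registry, and prefect-azure pin are illustrative placeholders; confirm the prefect-azure release you pick is compatible with your pinned Prefect version):
# write a pinned Dockerfile; the pool name matches your existing one
cat > Dockerfile <<'EOF'
FROM prefecthq/prefect:3.4.19-python3.11
RUN pip install --no-cache-dir "prefect==3.4.19" "prefect-azure==0.4.0"
CMD ["prefect", "worker", "start", "--pool", "una-acp-qa-prefect-3-3-4-workpool-hybrid", "--type", "azure-container-instance"]
EOF
# build and push to your registry, then point ACI at this tag (or its digest)
docker build -t <your-registry>/prefect-aci-worker:3.4.19 .
docker push <your-registry>/prefect-aci-worker:3.4.19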
Why you saw worker/runner mismatches
- Your runner image is pegged, but the worker upgraded itself via pip install prefect-azure. This can cause incompatibilities. Pin both worker and runner to the same Prefect version (ideally exact x.y.z) and the matching prefect-azure version.
If you’d like, share the prefect-azure version you intend to use and the Prefect version in your runner image; I’ll suggest concrete pins and a minimal Dockerfile/ACI command to keep them perfectly aligned.
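In the meantime, a sketch of the corresponding ACI command once the image is prebuilt (flags mirror your original command; image name, worker name, and registry are placeholders):
az container create --name <worker-container-name> --resource-group una-acp-qa-prefect-workpool-rg --assign-identity <identity> --image "<your-registry>/prefect-aci-worker:3.4.19" --secure-environment-variables PREFECT_API_URL=<URL> PREFECT_API_KEY=<KEY> --command-line "prefect worker start --pool una-acp-qa-prefect-3-3-4-workpool-hybrid --type azure-container-instance --name <fixed-worker-name>" --registry-username <user> --registry-password <pass> --registry-login-server <your-registry-login-server>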
Docs for reference:
- Azure Container worker: Azure Container worker
- Worker concepts: Workers

Henry Bruce (09/24/2025, 12:31 PM)

Marvin (09/24/2025, 12:31 PM)
The likely causes, given your startup command bash -c 'pip install prefect-azure && prefect worker start ...':
- If the pip step fails (network hiccup, rate limit, dependency resolution), the && prevents prefect worker start and the shell exits non-zero → container restarts.
- Unhandled exceptions in the worker process: e.g., invalid PREFECT_API_KEY/URL, permission issues with the work pool, etc. The process exits → restart.
- OOM kill or resource pressure: If memory/CPU is too tight, the process can be killed and restarted.
- Platform events: ACI host maintenance or rescheduling can restart the container.
- Liveness probes (if configured): A failing probe will cause restarts. Your CLI snippet didn’t include probes, so likely not this.
What to check in Azure to confirm cause
- Restart policy and events
az container show -g <rg> -n <name> --query "{restartPolicy:restartPolicy, events:containers[0].instanceView.events}"
- Exit codes and previous state
az container show -g <rg> -n <name> --query "{current:containers[0].instanceView.currentState, previous:containers[0].instanceView.previousState}"
- Logs around the restart time for pip or OOM/Killed messages
az container logs -g <rg> -n <name> --follow
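One more quick signal: the per-container restart count in the instance view (field name per the standard ACI schema) tells you at a glance how often this has been happening:
az container show -g <rg> -n <name> --query "containers[0].instanceView.restartCount"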
How to reduce surprise restarts
- Remove pip from startup and bake dependencies into the image (best practice)
- Build FROM prefecthq/prefect:<x.y.z>-python<py> and pip install "prefect==<x.y.z>" "prefect-azure==<a.b.c>", then run only:
prefect worker start --pool ... --type azure-container-instance
- If you must install at startup, pin and add retries:
pip install --no-cache-dir --retries 5 "prefect==<x.y.z>" "prefect-azure==<a.b.c>" || exit 1
prefect worker start --pool ... --type azure-container-instance
- Give the container enough memory/CPU to avoid OOM.
- Set PREFECT_LOGGING_LEVEL=DEBUG to capture more detail if the worker exits (a sketch of passing it via az container create is shown after this list).
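If you go the DEBUG route, a sketch of setting it at creation time (non-secret values go in --environment-variables; the rest of the command is elided):
az container create ... --environment-variables PREFECT_LOGGING_LEVEL=DEBUG --secure-environment-variables PREFECT_API_URL=<URL> PREFECT_API_KEY=<KEY> ...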
TL;DR: The worker won’t choose to restart itself; container restarts are driven by ACI because the process exited. Given your command includes a runtime pip install, transient failures there are a prime suspect. Baking dependencies into the image will make restarts far less frequent and will stop version drift entirely.
References:
- Workers: Workers
- Azure Container worker: Azure Container worker

Henry Bruce (09/24/2025, 12:33 PM)

Marvin (09/24/2025, 12:33 PM)
Prefect itself doesn’t restart your worker. You created and run this container yourself with az container create, so any restarts/rescheduling are driven by ACI:
- Process exited (your bash/pip step failed or the worker crashed) and restartPolicy=Always brought it back
- OOM/kill or host maintenance
- Transient platform/network issues during image pulls or startup
To confirm cause in Azure:
az container show -g <rg> -n <name> --query "{restartPolicy:restartPolicy, events:containers[0].instanceView.events, current:containers[0].instanceView.currentState, previous:containers[0].instanceView.previousState}"
Look for non-zero exit codes, “Killed”/OOM, or maintenance events around the restart time.
Only exception: if you were using a Prefect-managed work pool (prefect:managed), Prefect provisions and operates the infra; in that scenario Prefect could restart/replace worker pods. But that’s not the case for an Azure Container Instance worker you run yourself.
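You can confirm the pool type yourself; prefect work-pool inspect prints the pool’s configuration, including its type:
prefect work-pool inspect "una-acp-qa-prefect-3-3-4-workpool-hybrid"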
References:
- Workers overview: Workers
- Azure Container worker: Azure Container worker