Amith M
08/09/2025, 5:22 AM
My .serve container on GCP is up and running, and it even receives jobs from Prefect. However, the flow run goes into a Pending state:
05:18:52.286 | INFO | prefect.flow_runs.runner - Runner 'runner-f74498f5-302d-48bf-9b02-7d5776d3026e' submitting flow run '06896d9e-b555-7ae4-8000-b3cb46bf7951'
2025-08-09 10:48:52.436 IST
05:18:52.435 | INFO | prefect.flow_runs.runner - Opening process...
2025-08-09 10:48:52.468 IST
05:18:52.468 | INFO | prefect.flow_runs.runner - Completed submission of flow run '06896d9e-b555-7ae4-8000-b3cb46bf7951'
the above are the logs

Marvin
08/09/2025, 5:23 AM
Marvin
08/09/2025, 5:25 AM
Likely causes:
- Using from_source(...) but the container can't pull the repo (git not installed, no network egress, or private repo without credentials) (see the sketch after this list)
- Missing/wrong Prefect API env in the subprocess
- PREFECT_API_URL and PREFECT_API_KEY must be set and reachable; the subprocess inherits env from the container. If they're wrong, it can't report "Running".
- Cloud Run CPU throttling background work
- If “CPU is only allocated during request processing,” the container will go to sleep between HTTP requests. A .serve runner needs CPU for background polling and for the spawned subprocess. Set CPU to “always allocated” and consider a min instance to keep it warm.
- Import/entrypoint errors
- The subprocess can’t import your flow module or dependencies (ModuleNotFoundError), so it dies before sending any state.
- No outbound network to fetch code or data
- VPC egress/firewall blocks GitHub/GCS/S3/etc. If using from_source, the first thing the subprocess does is fetch code.
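For reference, here is a minimal sketch of the from_source pattern mentioned above (Prefect 3); the repo URL, entrypoint path, and deployment name are placeholders, not your actual values:

# Minimal sketch, assuming Prefect 3.x; all names below are placeholders.
from prefect import flow

if __name__ == "__main__":
    flow.from_source(
        source="https://github.com/<owner>/<repo>.git",  # must be pullable from inside the container
        entrypoint="flows/my_flow.py:my_flow",           # hypothetical path:function within that repo
    ).serve(name="my-serve-deployment")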
Quick checks to pinpoint it:
1) In the Prefect UI, open the stuck flow run and check the “flow run” logs (not just the runner logs). If they’re empty, the subprocess never started or immediately died before logging.
2) Turn up logging on the container:
PREFECT_LOGGING_LEVEL=DEBUG
3) Verify env inside the container:
echo "$PREFECT_API_URL"
test -n "$PREFECT_API_KEY" && echo "API key present" || echo "API key missing"
prefect version
4) If you’re using from_source, confirm the container can pull the repo:
git --version
git ls-remote https://github.com/<owner>/<repo>.git  # or SSH, with creds set
5) Sanity check Python can import your flow code in that image:
python -c "import your_package_or_module; print('import ok')"
6) Cloud Run settings:
- CPU: set to “always allocated”
- Min instances: > 0 (optional but helpful)
- Egress: allow outbound to Prefect Cloud and any code storage (GitHub/Git, artifact storage)
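If it helps, the rough gcloud equivalent of the CPU and min-instances settings above looks like this (service name and region are placeholders):

# --no-cpu-throttling corresponds to "CPU always allocated"
gcloud run services update my-serve-service \
  --region=us-central1 \
  --no-cpu-throttling \
  --min-instances=1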
If you can share:
- How you built and started the container (Dockerfile/entrypoint)
- Whether you used flow.serve() with .from_source(...) or bundled code in the image
- Cloud Run CPU/min-instances settings
- Whether your repo is private
…I can give you a more targeted fix.
References:
- Serve flows in Docker (Prefect 3)
- Schedule your code quickly with flow.serve
- Workers and runners overview (for background on how runs transition states)
My first bets for Cloud Run: CPU not always allocated or the subprocess can't import your flow code (missing code or git creds). Let me know which path you're on and we'll zero in.

Amith M
08/09/2025, 5:31 AM
Marvin
08/09/2025, 5:31 AM
Marvin
08/09/2025, 5:36 AM
Here's what's happening:
- The runner calls get_current_settings().to_environment_variables(...)
- That triggers Pydantic’s model serialization
- Pydantic raises PydanticSerializationError because something called SystemExit(0) during serialization
SystemExit(0) is what CLIs use for a clean exit (e.g., printing help). In other words: while turning your settings model into env vars for the child process, something in the “CLI settings” path likely decided it should exit successfully. Then the process never starts, and the run never moves past Pending.
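To reproduce that exact call path by hand inside the container, something like this should surface the same error (a minimal sketch; the runner passes extra keyword arguments that are omitted here):

import sys
from prefect.settings import get_current_settings

print("argv:", sys.argv)
try:
    # Same conversion the runner performs before spawning the subprocess
    env = get_current_settings().to_environment_variables()
    print("converted", len(env), "settings to env vars")
except BaseException as exc:  # SystemExit is a BaseException, so catch it too
    print("serialization failed:", repr(exc))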
Most likely causes
- A CLI parsing hook got triggered in the settings model
- For example, something in sys.argv looks like --help, or a CLI-related setting is being parsed in a way that exits cleanly.
- An environment variable that influences CLI behavior is set to an unexpected value (e.g., a PREFECT_CLI_* var).
- A bug in a specific Prefect version’s settings serialization with Python 3.12.
Quick fixes/workarounds to try
- Disable CLI-arg parsing in settings for the runner process:
- Add these env vars to your Cloud Run service and redeploy:
PREFECT_CLI_PARSE_ARGS=false
PREFECT_CLI_EXIT_ON_ERROR=false
PREFECT_CLI_IGNORE_UNKNOWN_ARGS=true
- These map to the CLI settings fields visible in Prefect's settings models and should prevent any CLI parse path from calling SystemExit.
- Upgrade Prefect to the latest 3.x in your image and redeploy. If this is a version-specific bug, a newer release may already have fixed it:
pip install -U "prefect>=3.0.0"
- Turn on debug logs to surface what the runner is doing just before the crash:
PREFECT_LOGGING_LEVEL=DEBUG
- Sanity check the command/entrypoint and argv inside your container
- Make sure your container's ENTRYPOINT/CMD doesn't include --help or any stray flags
- If you can exec into a running instance, print argv and attempt a settings dump:
python - <<'PY'
import sys
from prefect.settings import get_current_settings
print("argv:", sys.argv)
try:
    s = get_current_settings()
    d = s.model_dump(exclude_unset=True)
    print("settings_dump_ok:", len(d))
except Exception:
    import traceback; traceback.print_exc()
PY
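To apply the env-var workaround above without rebuilding the image, one option is updating the service in place (service name and region are placeholders):

gcloud run services update my-serve-service \
  --region=us-central1 \
  --update-env-vars=PREFECT_CLI_PARSE_ARGS=false,PREFECT_CLI_EXIT_ON_ERROR=false,PREFECT_CLI_IGNORE_UNKNOWN_ARGS=true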
Other checks that commonly bite .serve on Cloud Run
- Ensure CPU is “always allocated” for the Cloud Run service (a .serve runner needs background CPU).
- Verify PREFECT_API_URL and PREFECT_API_KEY are set and the service has outbound egress to Prefect Cloud.
If you can share:
- Prefect version (prefect version output)
- Your Dockerfile + the exact ENTRYPOINT/CMD used
- Any PREFECT_CLI_* env vars you've set
…I can give you a targeted fix instead of waving a stick at Pydantic.
References, in case you like reading as much as I like brooding:
- Serve flows in Docker (Prefect 3)
- Workers/runners overview

Amith M
08/09/2025, 6:31 AM
Marvin
08/09/2025, 6:31 AM
Marvin
08/09/2025, 6:34 AM
- Env: PREFECT_API_URL and PREFECT_API_KEY (and any other settings you need).
- The worker command will specify the work pool: prefect worker start -p <your-cloud-run-pool>.
- Networking:
- Outbound egress to Prefect Cloud and to GCP APIs.
- VPC connector if you need private egress.
- Identity/permissions (service account attached to the worker service):
- roles/run.admin (create/launch Cloud Run Jobs)
- roles/iam.serviceAccountUser (to use the job’s service account)
- roles/artifactregistry.reader (pull private images, if applicable)
- roles/logging.logWriter (write logs)
- roles/vpcaccess.user (if using a VPC connector)
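For illustration, granting those roles to the worker's service account might look like this (project ID and service-account email are placeholders):

for role in roles/run.admin roles/iam.serviceAccountUser roles/artifactregistry.reader roles/logging.logWriter; do
  gcloud projects add-iam-policy-binding MY_PROJECT \
    --member="serviceAccount:prefect-worker@MY_PROJECT.iam.gserviceaccount.com" \
    --role="$role"
done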
How the CPUs actually matter
- Worker container: 1 vCPU is fine unless you’re submitting hundreds of runs per second or doing heavy post-processing in the worker.
- Flow runs: sized separately in your Cloud Run work pool’s base job template. That’s where you set CPU/memory per run. If your flow needs >1 vCPU, set it there; it won’t affect the worker service.
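As a sketch of that second point, per-run resources can also be overridden on a specific deployment via job variables; the keys below ("cpu", "memory") are assumptions, so confirm them against the work pool's base job template:

# Sketch only; image, pool name, and job_variables keys are placeholders/assumptions.
from prefect import flow

@flow
def my_flow():
    ...

if __name__ == "__main__":
    my_flow.deploy(
        name="my-deployment",
        work_pool_name="my-cloud-run",
        image="us-docker.pkg.dev/<project>/<repo>/my-flow:latest",  # placeholder image
        job_variables={"cpu": "2", "memory": "2Gi"},  # verify key names in the base job template
    )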
Recommended setup flow
1) Create a Cloud Run work pool:
prefect work-pool create my-cloud-run --type cloud-run
2) Inspect the default job template (to see CPU/memory fields you can tune):
prefect work-pool get-default-base-job-template --type cloud-run
3) Deploy a worker container to Cloud Run:
- Command: prefect worker start -p my-cloud-run
- CPU always allocated, min instances 1, concurrency 1
- Attach a service account with the roles above
- Set PREFECT_API_URL and PREFECT_API_KEY
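Put together, the worker deployment could look roughly like this (image, names, region, and the inline API key are placeholders; prefer Secret Manager for the key in practice):

gcloud run deploy prefect-worker \
  --image=us-docker.pkg.dev/<project>/<repo>/prefect-worker:latest \
  --region=us-central1 \
  --no-cpu-throttling \
  --min-instances=1 \
  --concurrency=1 \
  --service-account=prefect-worker@MY_PROJECT.iam.gserviceaccount.com \
  --set-env-vars=PREFECT_API_URL=<api-url>,PREFECT_API_KEY=<api-key> \
  --command=prefect \
  --args=worker,start,-p,my-cloud-run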
Docs worth having open:
- Cloud Run worker template variables: run prefect work-pool get-default-base-job-template --type cloud-run
- Guide: GCP worker guide
- Background: Workers and work pools
If you tell me:
- Whether the worker will run on Cloud Run or elsewhere (GCE/GKE)
- Your expected run throughput and flow resource needs
…I’ll suggest concrete CPU/mem numbers for both the worker service and the Cloud Run Job template. For many teams: worker = 1 vCPU/512MB; jobs = 1–2 vCPU, 1–4 GiB depending on workload.