<@ULVA73B9P> i have deployed a `.serve` container ...
# ask-marvin
a
@Marvin i have deployed a `.serve` container on gcp and its up and running, it even receives jobs from prefect. however the job goes into pending state:

```
05:18:52.286 | INFO | prefect.flow_runs.runner - Runner 'runner-f74498f5-302d-48bf-9b02-7d5776d3026e' submitting flow run '06896d9e-b555-7ae4-8000-b3cb46bf7951'
2025-08-09 10:48:52.436 IST 05:18:52.435 | INFO | prefect.flow_runs.runner - Opening process...
2025-08-09 10:48:52.468 IST 05:18:52.468 | INFO | prefect.flow_runs.runner - Completed submission of flow run '06896d9e-b555-7ae4-8000-b3cb46bf7951'
```

the above are the logs
m
thought for 160.4 seconds
Thanks for the logs — they’re the classic “everything looks fine, which is usually when it isn’t.”

What those lines mean:
- The `.serve` runner received the flow run, spawned a subprocess (“Opening process…”), and told the API the submission is complete.
- The run only moves from Pending to Running when that subprocess starts up and reports its first state. If it never does, the run sits in Pending.

Common causes in a `.serve` container (especially on Cloud Run):
- Flow code isn’t available to the subprocess
  - Image doesn’t contain your code, or you’re using `from_source(...)` but the container can’t pull the repo (git not installed, no network egress, or private repo without credentials).
- Missing/wrong Prefect API env in the subprocess
  - `PREFECT_API_URL` and `PREFECT_API_KEY` must be set and reachable; the subprocess inherits env from the container. If they’re wrong, it can’t report “Running”.
- Cloud Run CPU throttling background work
  - If “CPU is only allocated during request processing,” the container will go to sleep between HTTP requests. A `.serve` runner needs CPU for background polling and for the spawned subprocess. Set CPU to “always allocated” and consider a min instance to keep it warm.
- Import/entrypoint errors
  - The subprocess can’t import your flow module or dependencies (ModuleNotFoundError), so it dies before sending any state.
- No outbound network to fetch code or data
  - VPC egress/firewall blocks GitHub/GCS/S3/etc. If using `from_source`, the first thing the subprocess does is fetch code.

Quick checks to pinpoint it:
1) In the Prefect UI, open the stuck flow run and check the “flow run” logs (not just the runner logs). If they’re empty, the subprocess never started or immediately died before logging.
2) Turn up logging on the container:
```
PREFECT_LOGGING_LEVEL=DEBUG
```
3) Verify env inside the container:
```
echo "$PREFECT_API_URL"
test -n "$PREFECT_API_KEY" && echo "API key present" || echo "API key missing"
prefect version
```
4) If you’re using `from_source`, confirm the container can pull the repo:
```
git --version
git ls-remote https://github.com/<owner>/<repo>.git  # or SSH, with creds set
```
5) Sanity check Python can import your flow code in that image:
```
python -c "import your_package_or_module; print('import ok')"
```
6) Cloud Run settings:
- CPU: set to “always allocated”
- Min instances: > 0 (optional but helpful)
- Egress: allow outbound to Prefect Cloud and any code storage (GitHub/Git, artifact storage)

If you can share:
- How you built and started the container (Dockerfile/entrypoint)
- Whether you used `flow.serve()` with `.from_source(...)` or bundled code in the image
- Cloud Run CPU/min-instances settings
- Whether your repo is private
…I can give you a more targeted fix.

References:
- Serve flows in Docker (Prefect 3)
- Schedule your code quickly with flow.serve
- Workers and runners overview (for background on how runs transition states)

My first bets for Cloud Run: CPU not always allocated, or the subprocess can’t import your flow code (missing code or git creds). Let me know which path you’re on and we’ll zero in.
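If CPU throttling turns out to be the culprit, the Cloud Run settings above can be applied from the CLI. A sketch, assuming a service named `prefect-serve` in `us-central1` (both names are placeholders; adjust to your deployment):

```shell
# Keep CPU allocated between requests and keep one instance warm,
# so the .serve runner can poll and spawn flow-run subprocesses.
gcloud run services update prefect-serve \
  --region=us-central1 \
  --no-cpu-throttling \
  --min-instances=1
```

`--no-cpu-throttling` corresponds to “CPU is always allocated” in the Cloud Run console.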
a
@Marvin Failed to start process for flow run '06896dcd-fe9a-7afd-8000-67c33b0ffd5d'.

```
Traceback (most recent call last):
  File "/opt/venv/lib/python3.12/site-packages/prefect/runner/runner.py", line 1294, in _submit_run_and_capture_errors
    process = await self._run_process(
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/prefect/runner/runner.py", line 756, in _run_process
    env.update(get_current_settings().to_environment_variables(exclude_unset=True))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/prefect/settings/base.py", line 98, in to_environment_variables
    env: dict[str, Any] = self.model_dump(
                          ^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/pydantic/main.py", line 463, in model_dump
    return self.__pydantic_serializer__.to_python(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.PydanticSerializationError: Error calling function `ser_model`: SystemExit: 0
```
m
thought for 277.9 seconds
Ah, the elusive “Pending forever” turns out to be a subprocess that never actually starts. Your traceback nails it:
- The runner tries to spawn a child process and inject the current Prefect settings via environment variables: it calls `get_current_settings().to_environment_variables(...)`.
- That triggers Pydantic’s model serialization.
- Pydantic raises PydanticSerializationError because something called `SystemExit(0)` during serialization.

`SystemExit(0)` is what CLIs use for a clean exit (e.g., printing help). In other words: while turning your settings model into env vars for the child process, something in the “CLI settings” path likely decided it should exit successfully. Then the process never starts, and the run never moves past Pending.

Most likely causes:
- A CLI parsing hook got triggered in the settings model: for example, something in `sys.argv` looks like `--help`, or a CLI-related setting is being parsed in a way that exits cleanly.
- An environment variable that influences CLI behavior is set to an unexpected value (e.g., a `PREFECT_CLI_*` var).
- A bug in a specific Prefect version’s settings serialization with Python 3.12.

Quick fixes/workarounds to try:
- Disable CLI-arg parsing in settings for the runner process. Add these env vars to your Cloud Run service and redeploy:
```
PREFECT_CLI_PARSE_ARGS=false
PREFECT_CLI_EXIT_ON_ERROR=false
PREFECT_CLI_IGNORE_UNKNOWN_ARGS=true
```
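Why a clean exit surfaces as a serialization error rather than being swallowed: `SystemExit` subclasses `BaseException`, not `Exception`, so ordinary `except Exception` handlers let it propagate. A stdlib-only sketch (the hook name is hypothetical, standing in for whatever CLI-parsing path runs during the settings dump):

```python
import sys

def hypothetical_cli_hook():
    # argparse/click end "--help" handling with sys.exit(0),
    # which raises SystemExit(0) rather than returning.
    sys.exit(0)

def run_serializer():
    try:
        hypothetical_cli_hook()
        return "ok"
    except Exception:
        # Normal error handling never sees SystemExit...
        return "caught Exception"

try:
    result = run_serializer()
except SystemExit as exc:
    # ...so it escapes the serializer, and the caller (here: pydantic)
    # reports it as a serialization failure.
    result = f"SystemExit({exc.code})"

print(result)  # SystemExit(0)
```

That is why disabling the CLI parse path (or upgrading past the buggy version) is the fix, not broader exception handling.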
  - These map to the CLI settings fields visible in Prefect’s settings models and should prevent any CLI parse path from calling `SystemExit`.
- Upgrade Prefect to the latest 3.x in your image and redeploy. If this is a version-specific bug, a newer release may already have fixed it:
```
pip install -U "prefect>=3.0.0"
```
- Turn on debug logs to surface what the runner is doing just before the crash:
```
PREFECT_LOGGING_LEVEL=DEBUG
```
- Sanity check the command/entrypoint and argv inside your container:
  - Make sure your container’s ENTRYPOINT/CMD doesn’t include `--help` or any stray flags.
  - If you can exec into a running instance, print argv and attempt a settings dump:
Copy code
python - <<'PY'
    import sys
    from prefect.settings import get_current_settings
    print("argv:", sys.argv)
    try:
        s = get_current_settings()
        d = s.model_dump(exclude_unset=True)
        print("settings_dump_ok:", len(d))
    except Exception:
        import traceback; traceback.print_exc()
    PY
Other checks that commonly bite `.serve` on Cloud Run:
- Ensure CPU is “always allocated” for the Cloud Run service (a `.serve` runner needs background CPU).
- Verify `PREFECT_API_URL` and `PREFECT_API_KEY` are set and the service has outbound egress to Prefect Cloud.

If you can share:
- Prefect version (`prefect version` output)
- Your Dockerfile + the exact ENTRYPOINT/CMD used
- Any `PREFECT_CLI_*` env vars you’ve set
…I can give you a targeted fix instead of waving a stick at Pydantic.

References, in case you like reading as much as I like brooding:
- Serve flows in Docker (Prefect 3)
- Workers/runners overview
a
@Marvin what are the infrastructure config required for a worker on gcp cloud run, do i need more than 1cpu
m
thought for 175.3 seconds
Short answer: a Prefect worker itself is lightweight. On Cloud Run, 1 vCPU is usually plenty. Spend your CPUs on the Cloud Run Jobs that actually execute your flows.

What to provision for a worker on Cloud Run:
- Instance shape:
  - CPU: 1 vCPU is sufficient for most workers (they just poll and submit jobs).
  - Memory: 512 MB to 1 GiB is typically fine.
- Cloud Run service settings:
  - CPU allocation: set to “CPU always allocated” so the worker can poll even without HTTP traffic.
  - Min instances: at least 1 to keep the worker up.
  - Concurrency: 1 (the worker isn’t serving HTTP; low concurrency avoids contention).
- Env vars:
  - `PREFECT_API_URL` and `PREFECT_API_KEY` (and any other settings you need).
  - The worker command will specify the work pool: `prefect worker start -p <your-cloud-run-pool>`.
- Networking:
  - Outbound egress to Prefect Cloud and to GCP APIs.
  - VPC connector if you need private egress.
- Identity/permissions (service account attached to the worker service):
  - roles/run.admin (create/launch Cloud Run Jobs)
  - roles/iam.serviceAccountUser (to use the job’s service account)
  - roles/artifactregistry.reader (pull private images, if applicable)
  - roles/logging.logWriter (write logs)
  - roles/vpcaccess.user (if using a VPC connector)

How the CPUs actually matter:
- Worker container: 1 vCPU is fine unless you’re submitting hundreds of runs per second or doing heavy post-processing in the worker.
- Flow runs: sized separately in your Cloud Run work pool’s base job template. That’s where you set CPU/memory per run. If your flow needs >1 vCPU, set it there; it won’t affect the worker service.

Recommended setup flow:
1) Create a Cloud Run work pool:
```
prefect work-pool create my-cloud-run --type cloud-run
```
2) Inspect the default job template (to see CPU/memory fields you can tune):
```
prefect work-pool get-default-base-job-template --type cloud-run
```
3) Deploy a worker container to Cloud Run:
- Command: `prefect worker start -p my-cloud-run`
- CPU always allocated, min instances 1, concurrency 1
- Attach a service account with the roles above
- Set `PREFECT_API_URL` and `PREFECT_API_KEY`

Docs worth having open:
- Cloud Run worker template variables: run `prefect work-pool get-default-base-job-template --type cloud-run`
- Guide: GCP worker guide
- Background: Workers and work pools

If you tell me:
- Whether the worker will run on Cloud Run or elsewhere (GCE/GKE)
- Your expected run throughput and flow resource needs
…I’ll suggest concrete CPU/mem numbers for both the worker service and the Cloud Run Job template. For many teams: worker = 1 vCPU/512MB; jobs = 1–2 vCPU, 1–4 GiB depending on workload.
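Putting the worker-service settings above together, a deploy sketch (service name, region, image path, pool name, and service account are all placeholders; in practice you’d pull `PREFECT_API_KEY` from Secret Manager rather than a plain env var):

```shell
gcloud run deploy prefect-worker \
  --region=us-central1 \
  --image=REGION-docker.pkg.dev/PROJECT/REPO/prefect-worker:latest \
  --command=prefect \
  --args=worker,start,-p,my-cloud-run \
  --no-cpu-throttling \
  --min-instances=1 --max-instances=1 \
  --concurrency=1 \
  --cpu=1 --memory=512Mi \
  --service-account=prefect-worker@PROJECT.iam.gserviceaccount.com \
  --set-env-vars=PREFECT_API_URL=...,PREFECT_API_KEY=...
```

One caveat: Cloud Run services expect the container to listen on `$PORT`, and `prefect worker start` doesn’t serve HTTP, so depending on your setup you may need to adjust the startup probe or add a small health-check shim.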