Riya Sinha
09/05/2025, 2:45 AM
Flow run infrastructure exited with non-zero status code:
Task ludicrous-zebu-m6wlm-task0 failed with message: The container exited with an error. (Error Code: 1)
No Cloud Run job is being created, so I can't see any more info about what's actually wrong. How do I debug this?
Marvin
09/05/2025, 2:45 AM
Marvin
09/05/2025, 2:54 AM
export PREFECT_LOGGING_LEVEL=DEBUG
# ensure the GCP extras are available to the worker
uv run --with 'prefect[gcp]' prefect worker start -p <your-cloud-run-pool>
- Re-trigger the flow. You should see steps like “building image…”, “pushing image…”, or an error from the Cloud Run Jobs API.
2) Verify Docker and registry auth on the worker host
- Confirm Docker is available:
docker version
docker info
- Authenticate Docker to Artifact Registry (pick one):
- With gcloud:
gcloud auth configure-docker <region>-docker.pkg.dev
- With a JSON key:
cat key.json | docker login -u _json_key --password-stdin https://<region>-docker.pkg.dev
3) Validate the work pool config
- Make sure the pool type is Cloud Run and your base job template has correct values for region, project, image URL, service account, etc. You can inspect the pool:
prefect work-pool inspect "<your-cloud-run-pool>"
- If you want the default Cloud Run template for comparison:
uv run --with 'prefect[gcp]' prefect work-pool get-default-base-job-template --type cloud-run
4) Check IAM for the service account used by the worker/job
At minimum, ensure the SA creating the job and pushing images has:
- roles/artifactregistry.writer on the target repo
- roles/run.admin
- roles/iam.serviceAccountUser (on the runtime SA specified in the job)
If log routing is involved, also grant the logging permissions it needs.
5) Isolate outside Prefect to pinpoint the failure
- Try pushing a test image to your Artifact Registry from the same machine:
docker build -t <region>-docker.pkg.dev/<project>/<repo>/prefect-test:tmp - <<'EOF'
FROM gcr.io/distroless/python3
# distroless python3 images already set python3 as the ENTRYPOINT, so CMD only supplies its arguments
CMD ["-c","print('ok')"]
EOF
docker push <region>-docker.pkg.dev/<project>/<repo>/prefect-test:tmp
- Try creating/executing a Cloud Run Job manually:
gcloud run jobs create prefect-test --image <region>-docker.pkg.dev/<project>/<repo>/prefect-test:tmp --region <region> --project <project>
gcloud run jobs execute prefect-test --region <region> --project <project> --wait
If either of these fails, it’s a GCP/Docker/permissions issue rather than Prefect.
6) Make sure you’re looking at the right worker
If you have multiple workers, confirm the one you’re tailing picked up the run (give it a unique name and restrict it to the exact work queue).
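A quick preflight for step 2 can be scripted on the worker host. This is only a minimal sketch: it checks whether the docker and gcloud executables are on the PATH of the machine running it, and it does not verify authentication or IAM permissions. The function name check_tools is made up for this example.

```python
import shutil


def check_tools(names=("docker", "gcloud")):
    """Return {tool_name: bool} telling whether each executable is on PATH."""
    return {name: shutil.which(name) is not None for name in names}


if __name__ == "__main__":
    # Print one line per tool so a missing binary is easy to spot.
    for tool, found in check_tools().items():
        print(f"{tool}: {'found' if found else 'MISSING'}")
```

Run it with the same user and environment as the worker process, since PATH can differ between an interactive shell and a service.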
Notes:
- @flow(log_prints=True) won’t help here because your flow never starts; you need worker logs.
- If you’re running the worker in Docker/K8s/systemd, use docker logs ... / kubectl logs ... / journalctl -u ... to view logs.
Helpful docs:
- GCP worker guide (Cloud Run)
- Troubleshooting deployments/runs
- Logging customization
If you can share:
- How you started the worker (command and where it’s running)
- Your pool type and a redacted base job template (image URL, region, project)
- Any worker logs around the failure
…I can help pinpoint the failing step quickly. Are you on Prefect 3.x, and is your worker running on a machine with Docker installed and GCP auth configured?
Riya Sinha
09/05/2025, 3:12 AM
Marvin
09/05/2025, 3:12 AM
Marvin
09/05/2025, 3:14 AM
deployments:
- name: prod
  entrypoint: flows/my_flow.py:my_flow
  work_pool:
    name: my-cloud-run-pool
    job_variables:
      image: us-central1-docker.pkg.dev/<project>/<repo>/<image>:<tag>
      region: us-central1
      project: <project>
      service_account: my-sa@<project>.iam.gserviceaccount.com
Tip: If you’re using a prebuilt image, don’t include any build/push steps in your project; Prefect will just use the image you set. And if something fails before the Job is created, check the worker logs in DEBUG to see the exact step that’s failing.
Docs:
- GCP worker guide (Cloud Run)
Riya Sinha
09/05/2025, 3:31 AM
{
"image": "***/vsp-flows:latest",
"credentials": {
"$ref": {
"block_document_id": "bca402ce-58e3-4be3-a60a-e774af172c1f"
}
}
}
where the block is a GCP credentials block whose service account info has the value:
{
"type": "********",
"auth_uri": "********",
"client_id": "********",
"token_uri": "********",
"project_id": "********",
"private_key": "********",
"client_email": "********",
"private_key_id": "********",
"universe_domain": "********",
"client_x509_cert_url": "********",
"auth_provider_x509_cert_url": "********"
}
is that right also?
Marvin
09/05/2025, 3:31 AM
Marvin
09/05/2025, 3:32 AM
Set PREFECT_LOGGING_LEVEL=DEBUG in the environment where the worker runs and start the worker in the foreground:
export PREFECT_LOGGING_LEVEL=DEBUG
uv run --with 'prefect[gcp]' prefect worker start -p <your-cloud-run-pool>
- Then re-trigger the flow run and watch the worker output. If the failure is during image resolution, credentials usage, or Cloud Run job creation, you’ll see a more specific error there.
- If your worker is running in a container or another process manager, add PREFECT_LOGGING_LEVEL=DEBUG to that container’s env and check its logs (docker logs, kubectl logs, etc.).
About your job variables and credentials block
- Yes, referencing a GcpCredentials block in job_variables.credentials is correct for the Cloud Run worker. That block should contain valid service account info (what you pasted looks structurally right).
- However, for Cloud Run you typically also need to specify project and region (and often a service_account for the runtime), either in the work pool’s base job template or in the deployment’s job_variables. If those aren’t present anywhere, the worker will fail before creating the job (matching the error you’re seeing).
Sanity check (things to confirm)
1) Image is fully qualified to Artifact Registry (including region):
<region>-docker.pkg.dev/<project>/<repo>/<image>:<tag>
2) Project/region are set (pool template OR deployment job variables).
3) Optional but common: set the runtime service account:
service_account: <sa-name>@<project>.iam.gserviceaccount.com
4) The service account in your GcpCredentials block has:
- roles/run.admin
- roles/iam.serviceAccountUser (on the runtime SA)
- roles/artifactregistry.writer (for the repo you pull/push)
- logging permissions as needed
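Check 1 (and the pinned-tag advice) can be automated with a small script. The sketch below assumes the reference has the <region>-docker.pkg.dev/<project>/<repo>/<image>:<tag> shape shown above and does not handle digest references; the helper names are hypothetical and not part of Prefect or gcloud.

```python
import re

# Assumed shape of an Artifact Registry Docker reference with an optional tag:
#   <region>-docker.pkg.dev/<project>/<repo>/<image>[:<tag>]
# Digest references (image@sha256:...) are deliberately not handled here.
_AR_IMAGE = re.compile(
    r"^(?P<region>[a-z0-9-]+)-docker\.pkg\.dev/"
    r"(?P<project>[^/\s]+)/(?P<repo>[^/\s]+)/"
    r"(?P<image>[^:@\s]+)(?::(?P<tag>[^:@\s]+))?$"
)


def parse_artifact_registry_image(ref):
    """Return the parts of an Artifact Registry image reference, or None."""
    match = _AR_IMAGE.match(ref)
    return match.groupdict() if match else None


def image_warnings(ref):
    """Collect human-readable warnings for an image reference."""
    parts = parse_artifact_registry_image(ref)
    if parts is None:
        return ["not a fully qualified Artifact Registry reference"]
    if parts["tag"] in (None, "latest"):
        return ["tag is missing or 'latest'; prefer a pinned tag"]
    return []


if __name__ == "__main__":
    print(image_warnings("us-central1-docker.pkg.dev/my-proj/my-repo/vsp-flows:1.2.3"))
```

An empty list means the reference passed both checks. It says nothing about whether the image exists or whether the service account can pull it.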
Example deployment snippet using a prebuilt image
deployments:
- name: prod
  entrypoint: flows/my_flow.py:my_flow
  work_pool:
    name: my-cloud-run-pool
    job_variables:
      image: us-central1-docker.pkg.dev/my-proj/my-repo/vsp-flows:1.2.3
      project: my-proj
      region: us-central1
      service_account: my-cloud-run-sa@my-proj.iam.gserviceaccount.com
      credentials:
        $ref:
          block_document_id: bca402ce-58e3-4be3-a60a-e774af172c1f
Note: Prefer a pinned tag over :latest to avoid ambiguity.
Useful CLI checks
- Inspect your pool to see what’s already set in the base job template:
prefect work-pool inspect "<your-cloud-run-pool>"
- Dump the default Cloud Run base job template to see required/optional fields:
uv run --with 'prefect[gcp]' prefect work-pool get-default-base-job-template --type cloud-run
If you can share:
- Your pool’s inspect output (redacted)
- Your worker startup method and any DEBUG logs around the failure
- The fully qualified image name you’re using
…I can spot what’s missing quickly.
Docs:
- GCP worker guide (Cloud Run): https://docs-3.prefect.io/integrations/prefect-gcp/gcp-worker-guide
- Troubleshooting deployments/runs: https://docs-3.prefect.io/v3/how-to-guides/cloud/troubleshoot-cloud
- Logging customization: https://docs-3.prefect.io/advanced/logging-customization
Riya Sinha
09/05/2025, 3:57 AM
{
"cpu": {
"type": "string",
"title": "CPU",
"default": "1000m",
"example": "1000m",
"pattern": "^(\\d*000)m$",
"description": "The amount of compute allocated to the Cloud Run Job. (1000m = 1 CPU)."
},
"env": {
"type": "object",
"title": "Environment Variables",
"default": {
"DEPLOYMENT_TYPE": "cloud",
"PYTHONUNBUFFERED": "1",
"PREFECT_LOGGING_LEVEL": "DEBUG"
},
"description": "Environment variables to set when starting a flow run.",
"additionalProperties": {
"type": "string"
}
},
"name": {
"type": "string",
"title": "Name",
"description": "Name given to created infrastructure."
},
"image": {
"type": "string",
"title": "Image Name",
"default": "docker.io/prefecthq/prefect:3-latest",
"example": "docker.io/prefecthq/prefect:3-latest",
"description": "The image to use for a new Cloud Run Job. If not set, the latest Prefect image will be used."
},
"labels": {
"type": "object",
"title": "Labels",
"description": "Labels applied to created infrastructure.",
"additionalProperties": {
"type": "string"
}
},
"memory": {
"type": "string",
"title": "Memory",
"default": "1Gi",
"example": "512Mi",
"pattern": "^\\d+(?:G|Gi|M|Mi)$",
"description": "The amount of memory allocated to the Cloud Run Job. Must be specified in units of 'G', 'Gi', 'M', or 'Mi'."
},
"region": {
"type": "string",
"title": "Region",
"default": "us-west1",
"example": "us-west1",
"description": "The region where the Cloud Run Job resides."
},
"command": {
"type": "string",
"title": "Command",
"description": "The command to use when starting a flow run. In most cases, this should be left blank and the command will be automatically generated."
},
"timeout": {
"type": "integer",
"title": "Job Timeout",
"default": 600,
"maximum": 3600,
"description": "The length of time that Prefect will wait for Cloud Run Job state changes.",
"exclusiveMinimum": 0
},
"keep_job": {
"type": "boolean",
"title": "Keep Job After Completion",
"default": false,
"description": "Keep the completed Cloud Run Job after it has run."
},
"credentials": {
"title": "GCP Credentials",
"description": "The GCP Credentials used to initiate the Cloud Run Job. `service_account_info` is required.",
"anyOf": [
{
"type": "object",
"title": "GcpCredentials",
"properties": {
"project": {
"type": "string",
"title": "Project",
"description": "The GCP project to use for the client."
},
"service_account_file": {
"type": "string",
"title": "Service Account File",
"format": "path",
"description": "Path to the service account JSON keyfile."
},
"service_account_info": {
"type": "object",
"title": "Service Account Info",
"description": "The contents of the keyfile as a dict."
}
},
"description": "Block used to manage authentication with GCP. Google authentication is handled via the `google.oauth2. See Google's [Authentication documentation](https://cloud.google.com/docs/authentication#service-accounts) for details on inference and recommended authentication patterns.",
"secret_fields": [
"service_account_info.*"
],
"block_type_slug": "gcp-credentials",
"block_schema_references": {}
}
]
},
"service_account_name": {
"type": "string",
"title": "Service Account Name",
"default": "name@project.iam.gserviceaccount.com",
"example": "service-account@example.iam.gserviceaccount.com",
"description": "The name of the service account to use for the task execution of Cloud Run Job. By default Cloud Run jobs run as the default Compute Engine Service Account."
}
}
@Marvin please compare this with the deployment vars I gave and see if they work together or if there's something missing. Remember, this is a Cloud Run PUSH deployment.
Marvin
09/05/2025, 3:58 AM
Marvin
09/05/2025, 3:59 AM
<region>-docker.pkg.dev/<project>/<repo>/vsp-flows:<tag>
- Avoid :latest if you can (pin a tag).
- credentials: Using a GcpCredentials block via $ref is correct. Ensure that either:
- project is set on the block, or
- Prefect/GCP can infer the project from service_account_info.project_id.
Setting project explicitly in the block is safest.
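To make that fallback order concrete, here is a small illustration of "explicit project first, otherwise project_id from the key". It is a sketch for checking your own values locally, not Prefect's actual implementation, and resolve_project is a hypothetical name.

```python
def resolve_project(block_project, service_account_info):
    """Pick the project: explicit block value first, else the key's project_id.

    Returns None when neither source provides a project.
    """
    if block_project:
        return block_project
    return (service_account_info or {}).get("project_id")


if __name__ == "__main__":
    # Placeholder values; substitute the (unredacted) values from your own block.
    print(resolve_project(None, {"project_id": "my-project"}))
```

If this returns None for your values, no project is available from either place, which is the situation the explicit setting avoids.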
What your work pool template needs to provide
- region: Must match the region where you want to create the Cloud Run Job. The schema shows a default of us-west1; confirm that’s actually your region.
- service_account_name: The default in your schema is just a placeholder. Make sure your pool has this set to a real SA email, e.g. my-sa@my-project.iam.gserviceaccount.com.
- image (in pool): Acts as a default. Your deployment’s image will override it, which is what you want.
Precedence summary
- Deployment job_variables override the work pool base job template.
- If you don’t set a field in the deployment, the worker uses the pool’s default for that field.
- In your case, image + credentials come from the deployment; region + service_account_name can come from the pool.
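The precedence rules can be pictured as a dictionary merge. The sketch below uses the defaults from the schema you pasted and a simple shallow merge; that merge strategy is an assumption made for illustration and may not match how Prefect treats nested fields such as env.

```python
def effective_job_config(pool_defaults, job_variables):
    """Overlay deployment job_variables on top of the pool's defaults."""
    merged = dict(pool_defaults)   # start from the pool's base values
    merged.update(job_variables)   # deployment values win on conflicts
    return merged


# Defaults copied from the pasted work pool schema.
pool_defaults = {
    "image": "docker.io/prefecthq/prefect:3-latest",
    "region": "us-west1",
    "service_account_name": "name@project.iam.gserviceaccount.com",
}
# The deployment only overrides the image here (placeholder reference).
job_variables = {"image": "<region>-docker.pkg.dev/<project>/<repo>/vsp-flows:<tag>"}

if __name__ == "__main__":
    for key, value in effective_job_config(pool_defaults, job_variables).items():
        print(f"{key}: {value}")
```

Note that region and service_account_name fall through from the pool untouched, so placeholder defaults there end up in the job request.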
Common pitfalls to avoid
- Using a non-fully-qualified image (e.g., missing <region>-docker.pkg.dev/...) when you mean Artifact Registry.
- Leaving service_account_name as the default placeholder in the pool (it must be a valid SA email).
- Not specifying project anywhere (set it on the GcpCredentials block to be explicit).
- Assuming DEBUG in the job’s env helps here — it only affects logs inside the job container. Since your job isn’t being created, you need DEBUG on the worker process.
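The placeholder pitfall is easy to catch before a run. This is a rough sketch: the placeholder string is the default from your pasted schema, the email pattern is a simplified assumption about service account addresses, and service_account_problem is a hypothetical helper.

```python
import re

# Default value shown for service_account_name in the pasted schema.
PLACEHOLDER_SA = "name@project.iam.gserviceaccount.com"

# Simplified pattern for <name>@<project>.iam.gserviceaccount.com (an assumption).
_SA_EMAIL = re.compile(r"^[a-z0-9-]+@[a-z0-9.-]+\.iam\.gserviceaccount\.com$")


def service_account_problem(value):
    """Return a description of the problem, or None if nothing looks wrong.

    An unset value is treated as fine: per the schema, the job then runs as
    the default Compute Engine service account.
    """
    if not value:
        return None
    if value == PLACEHOLDER_SA:
        return "service_account_name is still the schema placeholder"
    if not _SA_EMAIL.match(value):
        return "service_account_name does not look like a service account email"
    return None


if __name__ == "__main__":
    print(service_account_problem(PLACEHOLDER_SA))
```

Feed it the service_account_name value from the prefect work-pool inspect output.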
How to verify your current config quickly
- Inspect the pool to confirm the actual values the worker will use:
prefect work-pool inspect "<your-cloud-run-pool>"
- If you want to see the Cloud Run base job template shape for reference:
uv run --with 'prefect[gcp]' prefect work-pool get-default-base-job-template --type cloud-run
If you share:
- The fully qualified image name you’re using (redacted is fine),
- The output of prefect work-pool inspect <pool> (redacted),
- And any DEBUG worker logs around the failure,
…I can confirm if anything else is missing or mismatched.
Docs:
- GCP worker guide (Cloud Run): https://docs-3.prefect.io/integrations/prefect-gcp/gcp-worker-guide
- Troubleshooting: https://docs-3.prefect.io/v3/how-to-guides/cloud/troubleshoot-cloud