# ask-marvin
r
@Marvin i’m trying to do a cloud run push flow, and i keep getting:
Flow run infrastructure exited with non-zero status code:
 Task ludicrous-zebu-m6wlm-task0 failed with message: The container exited with an error. (Error Code: 1)
no cloud run job is being created so i cant see any more info about whats actually wrong. how do i debug this
m
thought for 531.4 seconds
Thanks for the details. That error usually means the failure happened before a Cloud Run Job could even be created (during packaging, image build/push, or the API call to create the Job). The best clues live in your worker's logs, not the flow run logs. Try this debugging checklist:
1) Watch the worker logs in DEBUG
- Restart the worker in the foreground with DEBUG logging so you can see packaging/build/push and Cloud Run API calls:
export PREFECT_LOGGING_LEVEL=DEBUG
# ensure the GCP extras are available to the worker
uv run --with 'prefect[gcp]' prefect worker start -p <your-cloud-run-pool>
- Re-trigger the flow. You should see steps like “building image…”, “pushing image…”, or an error from the Cloud Run Jobs API.
2) Verify Docker and registry auth on the worker host
- Confirm Docker is available:
docker version
docker info
- Authenticate Docker to Artifact Registry (pick one):
- With gcloud:
gcloud auth configure-docker <region>-docker.pkg.dev
- With a JSON key:
cat key.json | docker login -u _json_key --password-stdin https://<region>-docker.pkg.dev
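To confirm registry auth actually works for the repo holding your prebuilt flow image, you can try pulling it directly; a quick sketch (the image path is a placeholder):
# pull the prebuilt flow image; replace the placeholders with your real path
docker pull <region>-docker.pkg.dev/<project>/<repo>/<image>:<tag>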
3) Validate the work pool config
- Make sure the pool type is Cloud Run and your base job template has correct values for region, project, image URL, service account, etc. You can inspect the pool:
prefect work-pool inspect "<your-cloud-run-pool>"
- If you want the default Cloud Run template for comparison:
uv run --with 'prefect[gcp]' prefect work-pool get-default-base-job-template --type cloud-run
4) Check IAM for the service account used by the worker/job
At minimum, ensure the SA creating the job and pushing images has:
- roles/artifactregistry.writer on the target repo
- roles/run.admin
- roles/iam.serviceAccountUser (on the runtime SA specified in the job)
If logs routing is involved, ensure logging permissions as needed. (A gcloud sketch for granting these roles follows the manual-job commands below.)
5) Isolate outside Prefect to pinpoint the failure
- Try pushing a test image to your Artifact Registry from the same machine:
docker build -t <region>-docker.pkg.dev/<project>/<repo>/prefect-test:tmp - <<'EOF'
FROM gcr.io/distroless/python3
CMD ["python","-c","print('ok')"]
EOF
docker push <region>-docker.pkg.dev/<project>/<repo>/prefect-test:tmp
- Try creating/executing a Cloud Run Job manually:
gcloud run jobs create prefect-test --image <region>-docker.pkg.dev/<project>/<repo>/prefect-test:tmp --region <region> --project <project>
gcloud run jobs execute prefect-test --region <region> --project <project> --wait
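As referenced in step 4, here's a minimal sketch of granting those IAM roles with gcloud; the project, repo, and service-account names are placeholders to replace with your own:
# let the worker's SA push images to the Artifact Registry repo
gcloud artifacts repositories add-iam-policy-binding <repo> \
  --location <region> --project <project> \
  --member "serviceAccount:<worker-sa>@<project>.iam.gserviceaccount.com" \
  --role roles/artifactregistry.writer
# let the worker's SA create and execute Cloud Run Jobs
gcloud projects add-iam-policy-binding <project> \
  --member "serviceAccount:<worker-sa>@<project>.iam.gserviceaccount.com" \
  --role roles/run.admin
# let the worker's SA act as the runtime SA attached to the job
gcloud iam service-accounts add-iam-policy-binding <runtime-sa>@<project>.iam.gserviceaccount.com \
  --member "serviceAccount:<worker-sa>@<project>.iam.gserviceaccount.com" \
  --role roles/iam.serviceAccountUser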
If either of these fails, it's a GCP/Docker/permissions issue rather than Prefect.
6) Make sure you're looking at the right worker
If you have multiple workers, confirm the one you're tailing picked up the run (give it a unique name and restrict it to the exact work queue).
Notes:
- @flow(log_prints=True) won't help here because your flow never starts; you need worker logs.
- If you're running the worker in Docker/K8s/systemd, use docker logs ... / kubectl logs ... / journalctl -u ... to view logs.
Helpful docs:
- GCP worker guide (Cloud Run)
- Troubleshooting deployments/runs
- Logging customization
If you can share:
- How you started the worker (command and where it's running)
- Your pool type and a redacted base job template (image URL, region, project)
- Any worker logs around the failure
…I can help pinpoint the failing step quickly. Are you on Prefect 3.x, and is your worker running on a machine with Docker installed and GCP auth configured?
r
@Marvin does the workpool image get overridden by the deployment image? or is the work pool image for managing the flow execution, and the deployment image is for the flow code location? remember i’m using google cloud run with a flow-code-containing image in artifact registry that is prebuilt.
m
thought for 123.8 seconds
Great question. Short answer:
- The “image” in your Cloud Run work pool's base job template is just a default runtime image for flow runs.
- A deployment can override that image via job variables. If both specify an image, the deployment's image wins.
- There is no separate “manager” image for the work pool. The worker is a Python process that submits Cloud Run Jobs; the Cloud Run Job runs the image you specify (from the pool template, unless overridden by the deployment).
In your case (prebuilt flow-code image in Artifact Registry):
- Point Prefect to your prebuilt image either in the work pool template or, better, in the deployment's job variables.
- Ensure your image includes Prefect and your flow's dependencies, and that Cloud Run can override the command/args (so Prefect can start the flow runner inside the container).
Example prefect.yaml snippet:
deployments:
- name: prod
  entrypoint: flows/my_flow.py:my_flow
  work_pool:
    name: my-cloud-run-pool
  job_variables:
    image: us-central1-docker.pkg.dev/<project>/<repo>/<image>:<tag>
    region: us-central1
    project: <project>
    service_account: my-sa@<project>.iam.gserviceaccount.com
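To confirm which image (and other job variables) a deployment will actually submit, you can inspect it after deploying; a quick sketch, assuming a flow named my-flow and a deployment named prod:
# prints the deployment's configuration, including job_variables such as image
prefect deployment inspect "my-flow/prod"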
Tip: If you're using a prebuilt image, don't include any build/push steps in your project; Prefect will just use the image you set. And if something fails before the Job is created, check the worker logs in DEBUG to see the exact step that's failing.
Docs:
- GCP worker guide (Cloud Run)
r
how does debug mode work in this case? @Marvin my deployment does have job variables:
{
  "image": "***/vsp-flows:latest",
  "credentials": {
    "$ref": {
      "block_document_id": "bca402ce-58e3-4be3-a60a-e774af172c1f"
    }
  }
}
where the block is gcp credentials service account info with value:
{
  "type": "********",
  "auth_uri": "********",
  "client_id": "********",
  "token_uri": "********",
  "project_id": "********",
  "private_key": "********",
  "client_email": "********",
  "private_key_id": "********",
  "universe_domain": "********",
  "client_x509_cert_url": "********",
  "auth_provider_x509_cert_url": "********"
}
is that right also?
m
thought for 46.4 seconds
Great follow-up.
How “debug mode” works here
- The logs you need are your worker's logs. Setting DEBUG on the worker will surface the packaging/build/push steps and the Cloud Run Jobs API calls that happen before any Job is created.
- Enable it by setting PREFECT_LOGGING_LEVEL=DEBUG in the environment where the worker runs and start the worker in the foreground:
export PREFECT_LOGGING_LEVEL=DEBUG
uv run --with 'prefect[gcp]' prefect worker start -p <your-cloud-run-pool>
- Then re-trigger the flow run and watch the worker output. If the failure is during image resolution, credentials usage, or Cloud Run job creation, you'll see a more specific error there.
- If your worker is running in a container or another process manager, add PREFECT_LOGGING_LEVEL=DEBUG to that container's env and check its logs (docker logs, kubectl logs, etc.).
About your job variables and credentials block
- Yes, referencing a GcpCredentials block in job_variables.credentials is correct for the Cloud Run worker. That block should contain valid service account info (what you pasted looks structurally right).
- However, for Cloud Run you typically also need to specify project and region (and often a service_account for the runtime), either in the work pool's base job template or in the deployment's job_variables. If those aren't present anywhere, the worker will fail before creating the job (matching the error you're seeing).
Sanity check (things to confirm)
1) Image is fully qualified to Artifact Registry (including region):
<region>-docker.pkg.dev/<project>/<repo>/<image>:<tag>
2) Project/region are set (pool template OR deployment job variables).
3) Optional but common: set the runtime service account:
service_account: <sa-name>@<project>.iam.gserviceaccount.com
4) The service account in your GcpCredentials block has:
- roles/run.admin
- roles/iam.serviceAccountUser (on the runtime SA)
- roles/artifactregistry.writer (for the repo you pull/push)
- logging permissions as needed
Example deployment snippet using a prebuilt image:
deployments:
- name: prod
  entrypoint: flows/my_flow.py:my_flow
  work_pool:
    name: my-cloud-run-pool
  job_variables:
    image: us-central1-docker.pkg.dev/my-proj/my-repo/vsp-flows:1.2.3
    project: my-proj
    region: us-central1
    service_account: my-cloud-run-sa@my-proj.iam.gserviceaccount.com
    credentials:
      $ref:
        block_document_id: bca402ce-58e3-4be3-a60a-e774af172c1f
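With that snippet in your project's prefect.yaml, registering the deployment is a one-liner; a sketch, assuming you run it from the project root:
# creates/updates the deployment named "prod" defined in prefect.yaml
prefect deploy --name prod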
Note: Prefer a pinned tag over :latest to avoid ambiguity.
Useful CLI checks
- Inspect your pool to see what's already set in the base job template:
prefect work-pool inspect "<your-cloud-run-pool>"
- Dump the default Cloud Run base job template to see required/optional fields:
uv run --with 'prefect[gcp]' prefect work-pool get-default-base-job-template --type cloud-run
If you can share:
- Your pool's inspect output (redacted)
- Your worker startup method and any DEBUG logs around the failure
- The fully qualified image name you're using
…I can spot what's missing quickly.
Docs:
- GCP worker guide (Cloud Run): https://docs-3.prefect.io/integrations/prefect-gcp/gcp-worker-guide
- Troubleshooting deployments/runs: https://docs-3.prefect.io/v3/how-to-guides/cloud/troubleshoot-cloud
- Logging customization: https://docs-3.prefect.io/advanced/logging-customization
r
i have the service account and region specified in the workpool not the deployment i think, is that ok?:
{
  "cpu": {
    "type": "string",
    "title": "CPU",
    "default": "1000m",
    "example": "1000m",
    "pattern": "^(\\d*000)m$",
    "description": "The amount of compute allocated to the Cloud Run Job. (1000m = 1 CPU)."
  },
  "env": {
    "type": "object",
    "title": "Environment Variables",
    "default": {
      "DEPLOYMENT_TYPE": "cloud",
      "PYTHONUNBUFFERED": "1",
      "PREFECT_LOGGING_LEVEL": "DEBUG"
    },
    "description": "Environment variables to set when starting a flow run.",
    "additionalProperties": {
      "type": "string"
    }
  },
  "name": {
    "type": "string",
    "title": "Name",
    "description": "Name given to created infrastructure."
  },
  "image": {
    "type": "string",
    "title": "Image Name",
    "default": "docker.io/prefecthq/prefect:3-latest",
    "example": "docker.io/prefecthq/prefect:3-latest",
    "description": "The image to use for a new Cloud Run Job. If not set, the latest Prefect image will be used."
  },
  "labels": {
    "type": "object",
    "title": "Labels",
    "description": "Labels applied to created infrastructure.",
    "additionalProperties": {
      "type": "string"
    }
  },
  "memory": {
    "type": "string",
    "title": "Memory",
    "default": "1Gi",
    "example": "512Mi",
    "pattern": "^\\d+(?:G|Gi|M|Mi)$",
    "description": "The amount of memory allocated to the Cloud Run Job. Must be specified in units of 'G', 'Gi', 'M', or 'Mi'."
  },
  "region": {
    "type": "string",
    "title": "Region",
    "default": "us-west1",
    "example": "us-west1",
    "description": "The region where the Cloud Run Job resides."
  },
  "command": {
    "type": "string",
    "title": "Command",
    "description": "The command to use when starting a flow run. In most cases, this should be left blank and the command will be automatically generated."
  },
  "timeout": {
    "type": "integer",
    "title": "Job Timeout",
    "default": 600,
    "maximum": 3600,
    "description": "The length of time that Prefect will wait for Cloud Run Job state changes.",
    "exclusiveMinimum": 0
  },
  "keep_job": {
    "type": "boolean",
    "title": "Keep Job After Completion",
    "default": false,
    "description": "Keep the completed Cloud Run Job after it has run."
  },
  "credentials": {
    "title": "GCP Credentials",
    "description": "The GCP Credentials used to initiate the Cloud Run Job. `service_account_info` is required.",
    "anyOf": [
      {
        "type": "object",
        "title": "GcpCredentials",
        "properties": {
          "project": {
            "type": "string",
            "title": "Project",
            "description": "The GCP project to use for the client."
          },
          "service_account_file": {
            "type": "string",
            "title": "Service Account File",
            "format": "path",
            "description": "Path to the service account JSON keyfile."
          },
          "service_account_info": {
            "type": "object",
            "title": "Service Account Info",
            "description": "The contents of the keyfile as a dict."
          }
        },
        "description": "Block used to manage authentication with GCP. Google authentication is handled via the `google.oauth2. See Google's [Authentication documentation](<https://cloud.google.com/docs/authentication#service-accounts>) for details on inference and recommended authentication patterns.",
        "secret_fields": [
          "service_account_info.*"
        ],
        "block_type_slug": "gcp-credentials",
        "block_schema_references": {}
      }
    ]
  },
  "service_account_name": {
    "type": "string",
    "title": "Service Account Name",
    "default": "name@project.iam.gserviceaccount.com",
    "example": "service-account@example.iam.gserviceaccount.com",
    "description": "The name of the service account to use for the task execution of Cloud Run Job. By default Cloud Run jobs run as the default Compute Engine Service Account."
  }
}
@Marvin please compare this with the deployment vars i gave and see if it works together or there's something missing. remember this is a cloud run PUSH deployment.
m
thought for 44.5 seconds
Short answer: yes, it's fine to put region and service account in the work pool and just set the image + credentials in the deployment. Deployment job variables override the pool's defaults; anything not provided by the deployment falls back to the pool template.
A few important checks based on what you shared:
What your deployment provides
- image: Make sure it's fully qualified for Artifact Registry. It should look like <region>-docker.pkg.dev/<project>/<repo>/vsp-flows:<tag>. Avoid :latest if you can (pin a tag).
- credentials: Using a GcpCredentials block via $ref is correct. Ensure the block has either project set on the block, or that Prefect/GCP can infer project from service_account_info.project_id. Setting project explicitly in the block is safest.
What your work pool template needs to provide
- region: Must match the region where you want to create the Cloud Run Job. The schema shows a default of us-west1; confirm that's actually your region.
- service_account_name: The default in your schema is just a placeholder. Make sure your pool has this set to a real SA email, e.g. my-sa@my-project.iam.gserviceaccount.com.
- image (in pool): Acts as a default. Your deployment's image will override it, which is what you want.
Precedence summary
- Deployment job_variables override the work pool base job template.
- If you don't set a field in the deployment, the worker uses the pool's default for that field.
- In your case, image + credentials come from the deployment; region + service_account_name can come from the pool. (A concrete merge example follows the CLI checks below.)
Common pitfalls to avoid
- Using a non-fully-qualified image (e.g., missing <region>-docker.pkg.dev/...) when you mean Artifact Registry.
- Leaving service_account_name as the default placeholder in the pool (it must be a valid SA email).
- Not specifying project anywhere (set it on the GcpCredentials block to be explicit).
- Assuming DEBUG in the job's env helps here; it only affects logs inside the job container. Since your job isn't being created, you need DEBUG on the worker process.
How to verify your current config quickly
- Inspect the pool to confirm the actual values the worker will use:
prefect work-pool inspect "<your-cloud-run-pool>"
- If you want to see the Cloud Run base job template shape for reference:
uv run --with 'prefect[gcp]' prefect work-pool get-default-base-job-template --type cloud-run
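And to make the precedence summary above concrete, here's how your pool defaults and your deployment's job variables would merge (values taken from what you pasted; angle brackets are placeholders):
- Pool defaults: image docker.io/prefecthq/prefect:3-latest, region us-west1, service_account_name name@project.iam.gserviceaccount.com (a placeholder that must be replaced with a real SA email).
- Deployment job_variables: image <registry-path>/vsp-flows:latest, credentials (GcpCredentials block via $ref).
- Effective Cloud Run Job: image <registry-path>/vsp-flows:latest (deployment wins), region us-west1 (from the pool), service_account_name from the pool, credentials from the deployment's block.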
If you share:
- The fully qualified image name you're using (redacted is fine),
- The output of prefect work-pool inspect <pool> (redacted),
- And any DEBUG worker logs around the failure,
…I can confirm if anything else is missing or mismatched.
Docs:
- GCP worker guide (Cloud Run): https://docs-3.prefect.io/integrations/prefect-gcp/gcp-worker-guide
- Troubleshooting: https://docs-3.prefect.io/v3/how-to-guides/cloud/troubleshoot-cloud