# marvin-ai

Denys Y

09/29/2025, 10:47 AM

@Marvin how can I make a flow run get marked as Crashed or Failed, instead of staying in Pending, when the job/pod hits ImagePullBackOff?

Marvin

09/29/2025, 10:52 AM

Got it: you want flow runs to fail or crash quickly when the Kubernetes pod hits ImagePullBackOff instead of lingering in Pending.

Key points:
- ImagePullBackOff is a Kubernetes pod startup issue. Kubernetes won't mark the Job failed by default, so the Prefect flow run can sit in Pending waiting for the pod to start.
- There isn't a built-in "image pull timeout" in the Prefect 3.x Kubernetes worker. You'll want to either:
  - Fail the Kubernetes Job via its spec (so Prefect marks the flow run failed/crashed), or
  - Use a Prefect Automation to fail runs that are stuck in Pending too long.

Recommended approaches

1) Add a Kubernetes Job deadline (fail fast at the Job level)
- Set `activeDeadlineSeconds` on the Job so Kubernetes marks it Failed once it has been active for more than N seconds. The deadline counts from the Job's start time and covers the flow's run time as well as the image pull, so pick a value longer than your longest expected flow run. Prefect will then mark the flow run as failed/crashed.
- Optionally, set a small `backoffLimit` as well.

Example when deploying a flow (Prefect 3.x), passing job variables to the Kubernetes worker:

from prefect import flow


@flow
def my_flow():
    ...


deployment = (
    my_flow
    .from_source(
        source="https://github.com/owner/repo",
        entrypoint="path/to/flow.py:my_flow",
    )
    .deploy(
        name="k8s-deploy",
        work_pool_name="my-k8s-pool",
        job_variables={
            "image": "myrepo/myimage:tag",
            # NOTE: "job" and "pod" only take effect if the work pool's base job
            # template defines variables with these names and maps them into the
            # Job manifest. Check your template before relying on them.
            # Kubernetes terminates the Job once it has been active for 5 minutes.
            # This caps the whole run, not only the image pull.
            "job": {
                "spec": {
                    "activeDeadlineSeconds": 300,
                    "backoffLimit": 0,
                }
            },
            # Example: set registry credentials so you don't hit ImagePullBackOff
            "pod": {
                "spec": {
                    "imagePullSecrets": [{"name": "my-regcred"}]
                }
            },
        },
    )
)
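If the pool's base job template does not expose `job`/`pod` variables, one alternative is to patch the template itself. The following is a minimal sketch, assuming the template keeps the Job manifest under `job_configuration.job_manifest` (check the layout of your own pool's template). The helper name `add_job_deadline` and the trimmed-down `example_template` are hypothetical and exist only for illustration.

```python
import copy
import json


def add_job_deadline(base_job_template: dict, deadline_seconds: int,
                     backoff_limit: int = 0) -> dict:
    """Return a copy of a base job template with a Job-level deadline set.

    Assumes the Job manifest lives at job_configuration.job_manifest.
    """
    patched = copy.deepcopy(base_job_template)  # leave the input untouched
    job_config = patched.setdefault("job_configuration", {})
    job_manifest = job_config.setdefault("job_manifest", {})
    spec = job_manifest.setdefault("spec", {})
    # Kubernetes fails the Job once it has been active this long,
    # counting run time as well as time spent pulling the image.
    spec["activeDeadlineSeconds"] = deadline_seconds
    spec["backoffLimit"] = backoff_limit
    return patched


# Hypothetical, heavily trimmed stand-in for a real base job template.
example_template = {
    "job_configuration": {
        "job_manifest": {
            "apiVersion": "batch/v1",
            "kind": "Job",
            "spec": {"template": {"spec": {"restartPolicy": "Never"}}},
        }
    },
    "variables": {},
}

patched = add_job_deadline(example_template, deadline_seconds=3600)
print(json.dumps(patched["job_configuration"]["job_manifest"]["spec"], indent=2))
```

In practice you would load the pool's real template from JSON, patch it, and write it back to the work pool (for example via the work pool settings in the UI). Because the deadline applies to every run in the pool, it should be longer than the longest flow you expect to run there.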

Notes:
- The exact shape of `job_variables` depends on how your work pool's base job template exposes the Job/Pod spec. The goal is to surface `spec.activeDeadlineSeconds` (and optionally `backoffLimit`) on the Job that gets created.
- Docs: Kubernetes deployments and Workers

2) Fail runs with a Prefect Automation if they're stuck in Pending
- In Prefect Cloud, create an Automation that triggers when a flow run has been in Pending for more than N minutes and sets the flow run state to Failed (or cancels it).
- This is a good safety net that doesn't rely on Kubernetes marking the Job failed.
- Docs: Automations

Practical mitigations to reduce ImagePullBackOff
- Ensure correct `imagePullSecrets` on the Pod spec (private registries).
- Use the `IfNotPresent` pull policy to avoid unnecessary pulls when the image is cached on nodes.
- Pre-pull your base images on nodes (DaemonSet or node init strategy).

Questions to make sure we land the config correctly:
- Are you on Prefect 3.x with the Kubernetes worker?
- Can you share how you're deploying (Python deploy call vs CLI) and your current `job_variables`, if any?
- Do you prefer failing at the Kubernetes Job level (via `activeDeadlineSeconds`) or via a Prefect Automation?
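If you end up writing your own watchdog instead, image pull failures can be read from the pod's container statuses. This is a minimal sketch, assuming the pod is available as a plain dict in the shape returned by `kubectl get pod <name> -o json`. The helper `image_pull_failures`, the sample pod, and the container name `prefect-job` are illustrative assumptions, not part of Prefect.

```python
# Waiting reasons Kubernetes reports when an image cannot be pulled.
IMAGE_PULL_FAILURE_REASONS = {"ImagePullBackOff", "ErrImagePull", "InvalidImageName"}


def image_pull_failures(pod: dict) -> list[str]:
    """List '<container>: <reason>' for containers stuck on an image pull."""
    status = pod.get("status") or {}
    container_statuses = (status.get("initContainerStatuses") or []) + (
        status.get("containerStatuses") or []
    )
    failures = []
    for container_status in container_statuses:
        waiting = (container_status.get("state") or {}).get("waiting") or {}
        reason = waiting.get("reason")
        if reason in IMAGE_PULL_FAILURE_REASONS:
            failures.append(f"{container_status.get('name')}: {reason}")
    return failures


# Hypothetical pod status for illustration only.
sample_pod = {
    "status": {
        "phase": "Pending",
        "containerStatuses": [
            {
                "name": "prefect-job",
                "state": {"waiting": {"reason": "ImagePullBackOff"}},
            }
        ],
    }
}

print(image_pull_failures(sample_pod))
```

A script built on this could delete the Job or fail the matching flow run once the list is non-empty for longer than you are willing to wait. Kubernetes retries image pulls with backoff, so a short grace period avoids failing runs over a transient registry error.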
