<@ULVA73B9P> I get this error in Prefect 3: `Flow ...
# ask-marvin
n
@Marvin I get this error in Prefect 3:
Flow run infrastructure exited with non-zero status code -1.
What does it mean?
m
thought for 90.4 seconds
Short answer: it means the environment that was running your flow (process/container/pod) terminated abnormally and didn’t report a normal exit code back to Prefect. Prefect marks the run as Crashed with “infrastructure exited with non-zero status code -1”.

What “-1” usually indicates
- It’s a placeholder for “we didn’t get a normal exit code.” On Unix, a negative return code as reported by Python’s subprocess module means the process was killed by a signal (e.g., -9 = SIGKILL, -15 = SIGTERM). Some infrastructures report an “unknown/abrupt termination” as -1, or Prefect couldn’t retrieve a real exit code before the runtime disappeared.
- Common reasons: out-of-memory kills, preemption/scale-down (SIGTERM then SIGKILL), the container/pod crashing early, image/entrypoint misconfigurations, or the worker losing its watch on the job.

How to debug (pick the part that matches your infrastructure)
- General
  - Open the flow run logs to see if a Python exception or import error occurred before the crash.
  - Run the same code locally to confirm it starts up correctly with the same env vars and dependencies.
  - Set `PREFECT_LOGGING_LEVEL=DEBUG` on your worker to get more detail.
- Kubernetes
  - Find the pod/job for the flow run and inspect it:
    kubectl get pods -n <namespace>
    kubectl describe pod <pod-name> -n <namespace>
    kubectl logs <pod-name> -n <namespace> --all-containers
  - Look for events like OOMKilled, ImagePullBackOff, CrashLoopBackOff, or node preemption.
  - Check resource limits/requests; OOM kills often show up as exit code 137 (128 + 9, i.e. SIGKILL).
  - Make sure the worker can reach the API server continuously (network/RBAC issues can cause the watch to drop).
- Docker
  - Inspect the container’s exit code and logs:
    docker ps -a
    docker logs <container-id>
    docker inspect <container-id> --format='{{.State.ExitCode}}'
  - Verify the image exists, the entrypoint/command is correct, and the container has enough memory/CPU.
  - Ensure the container can reach your Prefect API (Cloud or server) and any required external services.
- Subprocess/local
  - Check the worker logs; a negative return code means the process was killed by a signal (e.g., SIGTERM/SIGKILL).
  - Look for earlier exceptions or environment/import issues that terminated the process early.

Helpful references
- Flow and task states
- Docker infrastructure guide
- Related GitHub threads showing this message tied to infrastructure termination/watch issues:
  - Handle failed Kubernetes scheduling events more gracefully
  - Incorrect enforcement of job watch timeout in Kubernetes

If you can share:
- Which worker/infrastructure you’re using (Subprocess, Docker, Kubernetes, ECS, etc.)
- A snippet of the flow run logs around the crash
- Any pod/container exit code/events you see

…I can help pinpoint the root cause and suggest a targeted fix.
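
In the meantime, the exit-code conventions mentioned above can be sketched in a few lines of Python. This is only an illustration and not part of Prefect: `describe_exit_code` is a hypothetical helper name. It assumes a POSIX system, the Python `subprocess` convention of -N for a process killed by signal N, and the shell/container convention of 128 + N. Treating -1 as "unknown" follows the explanation above and is not a documented Prefect behavior.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Hypothetical helper: map an infrastructure exit code to a likely cause."""
    if code == 0:
        return "exited normally"
    if code == -1:
        # Assumption from the explanation above: -1 is a placeholder for
        # "no real exit code was retrieved", not a kill by signal 1.
        return "unknown or abrupt termination (no exit code retrieved)"
    if code < 0:
        # Python's subprocess reports -N when the child was killed by signal N.
        signum = -code
    elif code > 128:
        # Shells and container runtimes report 128 + N for a signal kill.
        signum = code - 128
    else:
        return f"exited with error code {code}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # The number does not correspond to a signal on this platform.
        return f"exited with code {code} (no matching signal)"
    return f"killed by {name}"


if __name__ == "__main__":
    for code in (0, 1, -1, -9, -15, 137, 143):
        print(f"{code:>4}: {describe_exit_code(code)}")
```

Feeding it the value from `docker inspect` or from the pod's terminated container status gives a quick first hint: a result of SIGKILL alongside an OOMKilled event points at memory limits, while SIGTERM usually points at preemption or scale-down.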