Nicholas Pini
10/10/2025, 7:33 AM
Flow run infrastructure exited with non-zero status code -1.
What does it mean?
Marvin
10/10/2025, 7:33 AM
Marvin
10/10/2025, 7:35 AM
Set PREFECT_LOGGING_LEVEL=DEBUG on your worker to get more detail.
- Kubernetes
- Find the pod/job for the flow run and inspect it:
kubectl get pods -n <namespace>
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --all-containers
- Look for events like OOMKilled, ImagePullBackOff, CrashLoopBackOff, or node preemption.
- Check resource requests/limits; a container killed for exceeding its memory limit typically exits with code 137 (128 + 9, i.e. SIGKILL).
- Make sure the worker can reach the API server continuously (network/RBAC issues can cause the watch to drop).
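As a rough sketch of the Kubernetes checks above, the snippet below scans pod JSON (the shape printed by `kubectl get pod <pod-name> -n <namespace> -o json`) for waiting or terminated container states. The function name `summarize_pod`, the container name `flow-container`, and the sample data are hypothetical; it only assumes the standard `status.containerStatuses[].state` and `lastState` fields of a pod.

```python
import json


def summarize_pod(pod: dict) -> list:
    """Return one line per container state that usually explains a crash."""
    findings = []
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for cs in statuses:
        name = cs.get("name", "<unknown>")
        # Check the current state and the previous one (set after a restart).
        for key in ("state", "lastState"):
            state = cs.get(key) or {}
            waiting = state.get("waiting")
            terminated = state.get("terminated")
            if waiting and waiting.get("reason"):
                # e.g. ImagePullBackOff, CrashLoopBackOff
                findings.append(f"{name}: waiting ({waiting['reason']})")
            if terminated:
                reason = terminated.get("reason", "Unknown")
                code = terminated.get("exitCode")
                findings.append(f"{name}: terminated ({reason}, exit code {code})")
    return findings


# Hypothetical sample, trimmed to the fields the function reads.
sample_pod = json.loads("""
{
  "status": {
    "containerStatuses": [
      {
        "name": "flow-container",
        "state": {
          "terminated": {"reason": "OOMKilled", "exitCode": 137}
        },
        "lastState": {}
      }
    ]
  }
}
""")

for line in summarize_pod(sample_pod):
    print(line)
```

In practice you would pipe the real `kubectl ... -o json` output into `json.loads` instead of the sample.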
- Docker
- Inspect the container’s exit code and logs:
docker ps -a
docker logs <container-id>
docker inspect <container-id> --format='{{.State.ExitCode}}'
- Verify the image exists, the entrypoint/command is correct, and the container has enough memory/CPU.
- Ensure the container can reach your Prefect API (Cloud or server) and any required external services.
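If it helps, here is a minimal sketch that reads the full `docker inspect <container-id>` JSON rather than only the exit code, so an out-of-memory kill is distinguished from an ordinary failure. The function name `explain_docker_state` and the sample string are hypothetical; the sketch assumes the `State.ExitCode`, `State.OOMKilled`, and `State.Error` fields that `docker inspect` reports for a container.

```python
import json


def explain_docker_state(inspect_output: str) -> str:
    """Summarize the State section of `docker inspect <container-id>` output."""
    containers = json.loads(inspect_output)  # docker inspect prints a JSON array
    state = containers[0]["State"]
    exit_code = state.get("ExitCode", 0)
    if state.get("OOMKilled"):
        return f"killed by the OOM killer (exit code {exit_code})"
    if state.get("Error"):
        return f"runtime error: {state['Error']} (exit code {exit_code})"
    if exit_code != 0:
        return f"exited with code {exit_code}"
    return "exit code 0"


# Hypothetical sample, trimmed to the fields the function reads.
sample_inspect = (
    '[{"State": {"Status": "exited", "ExitCode": 137, '
    '"OOMKilled": true, "Error": ""}}]'
)
print(explain_docker_state(sample_inspect))
```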
- Subprocess/local
- Check the worker logs; in Python's subprocess convention, a negative return code -N means the process was killed by signal N (e.g., -15 for SIGTERM, -9 for SIGKILL).
- Look for earlier exceptions or environment/import issues that terminated the process early.
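To make the exit-code reasoning concrete, here is a small sketch that maps a code to a likely cause. It assumes a POSIX system and two common conventions: Python's `subprocess` reports death by signal N as `-N`, and shells and container runtimes typically report it as `128 + N`. The function name `describe_exit_code` is hypothetical and not part of Prefect.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Best-effort interpretation of a process or container exit code."""
    if code == 0:
        return "success"
    if code < 0:
        signum = -code       # subprocess convention: -N means signal N
    elif code > 128:
        signum = code - 128  # shell/container convention: 128 + N
    else:
        return f"application error (exit code {code})"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # The number does not correspond to a known signal on this platform.
        return f"exit code {code} (no matching signal {signum})"
    return f"killed by {name} (signal {signum})"


for code in (-9, -15, 137, 1):
    print(code, "->", describe_exit_code(code))
```

This is only a heuristic: a program can also exit with a value above 128 on its own, so treat the signal name as a hint to confirm against the worker logs.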
Helpful references
- Flow and task states
- Docker infrastructure guide
- Related GitHub threads showing this message tied to infrastructure termination/watch issues:
- Handle failed Kubernetes scheduling events more gracefully
- Incorrect enforcement of job watch timeout in Kubernetes
If you can share:
- Which worker/infrastructure you’re using (Subprocess, Docker, Kubernetes, ECS, etc.)
- A snippet of the flow run logs around the crash
- Any pod/container exit code/events you see
…I can help pinpoint the root cause and suggest a targeted fix.