<@ULVA73B9P> Reported flow run '*' as crashed: Flo...
# ask-marvin
j
@Marvin Reported flow run '*' as crashed: Flow run could not be submitted to infrastructure: APIError(HTTPError('500 Server Error: Internal Server Error for url: etc... We recently updated to a new version of Prefect. Is this a known error?
m
Short answer: this isn’t a single “known bug” tied to the latest Prefect 3.x release, but a generic failure class that can surface after upgrades for a few different reasons. The 500 APIError means something returned a server-side error during submission; we need a bit more context to pinpoint it.

A few common causes we’ve seen:
- Version mismatches after upgrade (worker/client vs. API/Server), leading to 500s fetching blocks/deployments during submission
- Infrastructure submission failures from providers (Kubernetes/ECS/Docker, etc.) that bubble up as “could not be submitted to infrastructure”
  - Examples:
    - Kubernetes job name conflict (shows the same error prefix)
    - Docker Hub pull rate limiting
    - ECS task definition registration rate limits
- Self-hosted server migrations out of sync after upgrade, causing 500s on API calls used during submission

Could you share:
1) Are you on Prefect Cloud or self-hosted Server?
2) Exact versions before and after the upgrade (Prefect on workers/clients; if self-hosted, Server too)
3) Worker type (Kubernetes, Docker, ECS, Process) and its logs around the failure
4) The full error, especially the exact URL in the 500 (that tells us which API endpoint failed)
5) Are all flow runs failing or just some deployments?

Quick triage steps:
- Confirm versions on a worker
prefect version
- If self-hosted, ensure the DB is migrated and check server logs
prefect server database upgrade
Then tail your API/server logs for the 500 stack trace.
- Turn on debug logs for the worker and retry:
PREFECT_LOGGING_ROOT_LEVEL=DEBUG prefect worker start -p <your-pool> -q <your-queue>
- Verify your worker can reach the correct API (`PREFECT_API_URL`), and that any integration packages match your worker type (e.g., `prefect-kubernetes`, `prefect-aws`) at compatible versions.
- If you’re on Cloud, check Prefect status for any ongoing incidents.

References:
- Troubleshooting guide: Troubleshoot Prefect Cloud
- Server concepts (self-hosted): Prefect Server

If you can paste the full traceback (with the 500 URL) and confirm your environment (Cloud vs Server, versions, worker type), I can zero in on the root cause and next steps.
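
If it helps with gathering the versions and the failing URL, here is a minimal sketch in plain Python (standard library only, not a Prefect utility). It reports which of a few packages are installed on the machine where you run it, and pulls the URL out of an error string shaped like the one in your message. The package list, the function names `installed_versions` and `failing_url`, and the example URL are my own assumptions for illustration; swap in whatever applies to your setup.

```python
import re
from importlib import metadata
from typing import Dict, Iterable, Optional

# Assumed list of packages worth comparing across workers; adjust as needed.
PACKAGES = ("prefect", "prefect-kubernetes", "prefect-aws", "prefect-docker")


def installed_versions(packages: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Return the installed version of each package, or None if it is absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def failing_url(error_text: str) -> Optional[str]:
    """Extract the URL that follows 'for url:' in an HTTP error message."""
    match = re.search(r"for url:\s*(\S+)", error_text)
    if match is None:
        return None
    # Drop quotes and parentheses left over from the exception repr.
    return match.group(1).rstrip("'\")")


if __name__ == "__main__":
    for package, version in installed_versions().items():
        print(f"{package}: {version or 'not installed'}")

    # Hypothetical error text; paste your real crash message here instead.
    sample = (
        "APIError(HTTPError('500 Server Error: Internal Server Error "
        "for url: http://example.invalid/api/some/endpoint'))"
    )
    print("failing URL:", failing_url(sample))
```

Run it in the same environment as each worker and compare the output across machines. The host in the extracted URL is a quick hint as to whether the 500 came from the Prefect API or from your infrastructure provider's API.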
j
Only some runs are failing, not all of them.