# ask-marvin
w
@Marvin, any reason why my Prefect server causes all of my runs across all my deployments to systematically crash with the error:
Flow run could not be submitted to infrastructure: The service has encountered an internal error. Please try again later
m
thought for 261.4 seconds
Sorry you’re hitting that. That message means the API service returned an HTTP 500 during the handoff to infrastructure. It is therefore typically a server-side issue (database, services, or version) rather than a misconfiguration in a single deployment.

Common root causes in Prefect 3.x
- Server/worker version mismatch after an upgrade
  - If your server and workers are on different minor versions, API contracts can diverge and cause 500s during submission.
- Database migrations not applied (or other DB issues)
  - If the server was upgraded but the DB schema wasn’t, submissions can fail with a 500. SQLite “database is locked” is also a frequent culprit under concurrency.
- Server services not running or crashed
  - Starting the API with services disabled, or a crash in one of the services, can break the submission path.
- Work pool / worker plugin mismatch
  - For example, a Docker/Kubernetes work pool whose worker environment lacks the matching extra. The worker logs will show the real error, but the UI often shows only the generic “internal error”.

Quick triage checklist

1) Check server health
- Hit the health endpoint and confirm it returns a successful (HTTP 200) response:
curl http://<your-server-host>:<port>/api/health
Docs: Server health check

2) Inspect server logs around the failure time
- If running locally, start the server with more verbose logging:
prefect server start --log-level DEBUG
- If running via Docker/Helm, check the API container/pod logs for stack traces, “database is locked”, “no such column”, or IntegrityError.

3) Ensure the database is migrated
- Run:
prefect server database upgrade -y
- If you’re on SQLite and see “database is locked”, reduce concurrent load or move to Postgres for reliability.

4) Verify version parity (server vs. workers and deployments)
- On each machine (server and worker hosts), run:
prefect version
- Keep versions aligned (ideally the same minor and patch version). Release notes: Prefect 3.x release notes

5) Confirm server services are enabled and running
- List services:
prefect server services list-services
- If needed, start them in-process (for debugging/local):
prefect server services start-services

6) Check work pools and workers
- Verify that the work pools exist and that their types are correct:
prefect work-pool ls -v
- Ensure workers are running and polling the right pool:
prefect worker start -p <your-pool-name>
- If you’re using Docker/Kubernetes workers, ensure the worker environment has the required extra installed (e.g., pip install "prefect[docker]" or pip install "prefect[kubernetes]").

Useful docs
- Server CLI: Self-hosted server CLI
- Health check API: Health check

If you can share a bit more, I can be more specific:
- How are you running the server (local prefect server start, Docker, or Helm)? Which database (SQLite or Postgres)?
- Prefect versions on the server and workers (prefect version output)
- Work pool type(s) and how your workers are started
- A few lines of server API logs (around the time a run fails) and any worker logs for a failing run

Those details usually pinpoint whether it’s a DB migration/lock, a services issue, or a version/plugin mismatch.
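To speed up gathering those details, the prefect version outputs and log lines can be pre-processed before sharing. The sketch below is hypothetical and not part of Prefect. It makes three assumptions:
- prefect version prints a line of the form "Version: X.Y.Z".
- Any host whose major.minor differs from the server's counts as a mismatch, matching the "different minor versions" cause above.
- The log substrings worth scanning for are the ones named in the triage list ("database is locked", "no such column", IntegrityError). The missing_extra signature is an illustrative guess, not a known Prefect message.

```python
import re
from typing import Dict, List, Optional, Tuple

# ASSUMPTION: `prefect version` prints a line such as "Version:   3.1.4".
# Adjust the pattern if your output looks different.
VERSION_RE = re.compile(r"^Version:\s+(\d+)\.(\d+)\.(\d+)", re.MULTILINE)

# Lower-cased substrings to look for in server/worker logs. The first three come
# from the triage list; "missing_extra" is a HYPOTHETICAL guess at what a worker
# without the matching prefect[...] extra might log.
LOG_SIGNATURES = {
    "db_locked": "database is locked",
    "schema_not_migrated": "no such column",
    "db_integrity": "integrityerror",
    "missing_extra": "no module named",
}


def parse_version(output: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from `prefect version` output, or None."""
    match = VERSION_RE.search(output)
    if match is None:
        return None
    major, minor, patch = (int(group) for group in match.groups())
    return major, minor, patch


def mismatched_hosts(outputs: Dict[str, str], reference: str) -> List[str]:
    """List hosts whose major.minor differs from the reference host's.

    Hosts whose version cannot be parsed are reported too.
    """
    ref = parse_version(outputs[reference])
    flagged = []
    for host, text in outputs.items():
        version = parse_version(text)
        if ref is None or version is None or version[:2] != ref[:2]:
            flagged.append(host)
    return sorted(flagged)


def classify_log(lines: List[str]) -> Dict[str, int]:
    """Count log lines containing each known failure signature (case-insensitive)."""
    counts = {name: 0 for name in LOG_SIGNATURES}
    for line in lines:
        lowered = line.lower()
        for name, needle in LOG_SIGNATURES.items():
            if needle in lowered:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample inputs: paste real `prefect version` output from each host.
    versions = {
        "server": "Version:             3.1.4",
        "worker-1": "Version:             3.1.4",
        "worker-2": "Version:             3.0.10",
    }
    print(mismatched_hosts(versions, reference="server"))  # ['worker-2']

    # Made-up sample log lines: paste lines from the API/worker logs instead.
    logs = [
        "sqlite3.OperationalError: database is locked",
        "GET /api/health 200",
    ]
    print(classify_log(logs))
```

The host names and sample strings are illustrative only. A non-empty mismatch list points toward step 4, and a nonzero db_locked or schema_not_migrated count points toward step 3. Plain substring matching was chosen over parsing structured logs because the exact log format depends on how the server is deployed.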
w
It was working without any change to the Docker container or the dependency versions, and since 3 AM CET it has stopped working, with no deployment changes.