Oliver Tedder
08/20/2025, 10:11 AM
Backing off due to consecutive errors, using increased interval of 80.0s.
Failed the last 3 attempts. Please check your environment and configuration.
Examples of recent errors:
Traceback (most recent call last):
File "C:\Prefect\venv\Lib\site-packages\prefect\utilities\services.py", line 64, in critical_service_loop
await workload()
File "C:\Prefect\venv\Lib\site-packages\prefect\workers\base.py", line 983, in get_and_submit_flow_runs
runs_response = await self._get_scheduled_flow_runs()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Prefect\venv\Lib\site-packages\prefect\workers\base.py", line 1142, in _get_scheduled_flow_runs
await self.client.get_scheduled_flow_runs_for_work_pool(
File "C:\Prefect\venv\Lib\site-packages\prefect\client\orchestration\_work_pools\client.py", line 586, in get_scheduled_flow_runs_for_work_pool
response = await self.request(
^^^^^^^^^^^^^^^^^^^
File "C:\Prefect\venv\Lib\site-packages\prefect\client\orchestration\base.py", line 53, in request
return await self._client.send(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Prefect\venv\Lib\site-packages\prefect\client\base.py", line 361, in send
response.raise_for_status()
File "C:\Prefect\venv\Lib\site-packages\prefect\client\base.py", line 162, in raise_for_status
raise PrefectHTTPStatusError.from_httpx_error(exc) from exc.__cause__
prefect.exceptions.PrefectHTTPStatusError: Server error '500 Internal Server Error' for url 'http://1.2.3.4:4200/api/work_pools/pool-12/get_scheduled_flow_runs'
Response: {'exception_message': 'Internal Server Error'}
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500
An exception occurred.
Press any key to continue . . .
Oliver Tedder
08/20/2025, 3:02 PM
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Prefect\Server\venv\Lib\site-packages\prefect\server\utilities\messaging\memory.py", line 357, in _consume_loop
await handler(message)
File "C:\Prefect\Server\venv\Lib\site-packages\prefect\server\services\task_run_recorder.py", line 213, in message_handler
await record_task_run_event(event)
File "C:\Prefect\Server\venv\Lib\site-packages\prefect\server\services\task_run_recorder.py", line 172, in record_task_run_event
await _insert_task_run(session, task_run, task_run_attributes)
File "C:\Prefect\Server\venv\Lib\site-packages\prefect\server\services\task_run_recorder.py", line 57, in _insert_task_run
await session.execute(
File "C:\Prefect\Server\venv\Lib\site-packages\sqlalchemy\ext\asyncio\session.py", line 463, in execute
result = await greenlet_spawn(
^^^^^^^^^^^^^^^^^^^^^
File "C:\Prefect\Server\venv\Lib\site-packages\sqlalchemy\util\_concurrency_py3k.py", line 201, in greenlet_spawn
result = context.throw(*sys.exc_info())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Prefect\Server\venv\Lib\site-packages\sqlalchemy\orm\session.py", line 2365, in execute
return self._execute_internal(
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Prefect\Server\venv\Lib\site-packages\sqlalchemy\orm\session.py", line 2251, in _execute_internal
result: Result[Any] = compile_state_cls.orm_execute_statement(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Prefect\Server\venv\Lib\site-packages\sqlalchemy\orm\bulk_persistence.py", line 1294, in orm_execute_statement
result = conn.execute(
^^^^^^^^^^^^^
File "C:\Prefect\Server\venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1415, in execute
return meth(
^^^^^
File "C:\Prefect\Server\venv\Lib\site-packages\sqlalchemy\sql\elements.py", line 523, in _execute_on_connection
return connection._execute_clauseelement(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Prefect\Server\venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1637, in _execute_clauseelement
ret = self._execute_context(
^^^^^^^^^^^^^^^^^^^^^^
File "C:\Prefect\Server\venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1842, in _execute_context
return self._exec_single_context(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Prefect\Server\venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1982, in _exec_single_context
self._handle_dbapi_exception(
File "C:\Prefect\Server\venv\Lib\site-packages\sqlalchemy\engine\base.py", line 2351, in _handle_dbapi_exception
raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
File "C:\Prefect\Server\venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1963, in _exec_single_context
self.dialect.do_execute(
File "C:\Prefect\Server\venv\Lib\site-packages\sqlalchemy\engine\default.py", line 943, in do_execute
cursor.execute(statement, parameters)
File "C:\Prefect\Server\venv\Lib\site-packages\sqlalchemy\dialects\postgresql\asyncpg.py", line 580, in execute
self._adapt_connection.await_(
File "C:\Prefect\Server\venv\Lib\site-packages\sqlalchemy\util\_concurrency_py3k.py", line 132, in await_only
return current.parent.switch(awaitable) # type: ignore[no-any-return,attr-defined] # noqa: E501
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Prefect\Server\venv\Lib\site-packages\sqlalchemy\util\_concurrency_py3k.py", line 196, in greenlet_spawn
value = await result
^^^^^^^^^^^^
File "C:\Prefect\Server\venv\Lib\site-packages\sqlalchemy\dialects\postgresql\asyncpg.py", line 558, in _prepare_and_execute
self._handle_exception(error)
File "C:\Prefect\Server\venv\Lib\site-packages\sqlalchemy\dialects\postgresql\asyncpg.py", line 508, in _handle_exception
self._adapt_connection._handle_exception(error)
File "C:\Prefect\Server\venv\Lib\site-packages\sqlalchemy\dialects\postgresql\asyncpg.py", line 792, in _handle_exception
raise translated_error from error
sqlalchemy.exc.DBAPIError: (sqlalchemy.dialects.postgresql.asyncpg.Error) <class 'asyncpg.exceptions.DeadlockDetectedError'>: deadlock detected
DETAIL: Process 764039 waits for ShareLock on transaction 245134068; blocked by process 764041.
Process 764041 waits for ShareLock on speculative token 2816 of transaction 245134069; blocked by process 764039.
HINT: See server log for query details.
Marvin
08/20/2025, 3:07 PM
This is a Postgres deadlock on the concurrent upsert into task_run from the TaskRunRecorder service:
- INSERT ... ON CONFLICT (id) DO UPDATE ... WHERE task_run.state_timestamp < ...
- Two transactions are waiting on each other (ShareLock + speculative insert), classic concurrent upsert contention.
- The “CancellationCleanup took longer than its loop interval” warnings are a symptom of server services getting slowed by DB contention.
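If you want to confirm this while it is happening, a query like the one below against pg_stat_activity will show whether several backends are running the same task_run upsert at once and what they are waiting on (the ILIKE pattern is a guess at the statement text; adjust it to match what your logs show):
SELECT pid, state, wait_event_type, wait_event, left(query, 80) AS query_snippet
FROM pg_stat_activity
WHERE query ILIKE 'INSERT INTO task_run%'
ORDER BY query_start;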
Known reports and references
- Deadlocks reported after upgrading to 3.3.x: prefect#17767
- Another related performance/locking thread: prefect#17895
- Recent release note touching DB lock behavior (not the same code path, but still worth upgrading): 3.4.13 (see “Fix DB locks during DB clear” PR #18684)
Quick triage questions
- Which Prefect version are you running?
- Postgres version?
- Single API/server instance or multiple replicas?
- Any custom server DB settings (pool size, timeouts) in your profile?
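A quick way to collect those answers (the host/user/database names below are placeholders; adjust to your setup):
prefect version
prefect config view
psql -h <db-host> -U <db-user> -d <prefect-db> -c "SELECT version();"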
Mitigations to try
1) Upgrade Prefect to the latest 3.4.x
- Several DB and service loop tweaks have landed since 3.3.x. It won’t hurt, and it might help.
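For example, assuming the C:\Prefect venvs are pip-managed (run in both the worker and server venvs, then restart them):
python -m pip install --upgrade "prefect>=3.4"
prefect version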
2) Keep a single API server instance
- Prefect Server isn’t HA; running multiple API pods/processes can amplify contention on upserts.
3) Right-size the SQLAlchemy connection pool (too large a pool can make deadlocks worse)
- Start conservative and adjust:
PREFECT_SERVER_DATABASE_SQLALCHEMY__POOL_SIZE=20
PREFECT_SERVER_DATABASE_SQLALCHEMY__MAX_OVERFLOW=10
PREFECT_SERVER_DATABASE_SQLALCHEMY__POOL_TIMEOUT=30
- If you’ve cranked pool_size up (e.g., 60+), dial it back and observe.
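One way to apply these on the Windows host is to set them in the shell (or service wrapper) that launches the server; a PowerShell sketch using the setting names above:
$env:PREFECT_SERVER_DATABASE_SQLALCHEMY__POOL_SIZE = "20"
$env:PREFECT_SERVER_DATABASE_SQLALCHEMY__MAX_OVERFLOW = "10"
$env:PREFECT_SERVER_DATABASE_SQLALCHEMY__POOL_TIMEOUT = "30"
prefect server start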
4) Ensure default isolation level (READ COMMITTED)
- Avoid SERIALIZABLE/REPEATABLE READ on the connection or DB. If you’ve customized isolation, undo it.
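To verify from psql (the ALTER DATABASE line assumes the database is named prefect and is only needed if the default was changed):
SHOW default_transaction_isolation;
ALTER DATABASE prefect SET default_transaction_isolation TO 'read committed';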
5) Postgres hygiene
- Make sure autovacuum is healthy; vacuum/analyze tables (especially task_run) to reduce bloat and lock times (see the check after the lock query below).
- Monitor locking to identify the culprits:
SELECT bl.pid AS blocked_pid, ka.query AS blocking_query, a.query AS blocked_query,
       now() - a.query_start AS blocked_duration
FROM pg_locks bl
JOIN pg_stat_activity a ON a.pid = bl.pid
JOIN pg_locks kl ON kl.transactionid = bl.transactionid AND kl.pid != bl.pid AND kl.granted
JOIN pg_stat_activity ka ON ka.pid = kl.pid
WHERE NOT bl.granted;
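And the autovacuum/bloat check mentioned above, followed by a manual pass if dead tuples are piling up (task_run is the table from the traceback; VACUUM does not block writers but can take a while on a large table):
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'task_run';
VACUUM (VERBOSE, ANALYZE) task_run;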
6) Reduce service churn while debugging
- Increase loop intervals temporarily to lower pressure:
PREFECT_SERVER_SERVICES__CANCELLATION_CLEANUP__LOOP_SECONDS=60
PREFECT_SERVER_SERVICES__RECENT_DEPLOYMENTS_SCHEDULER__LOOP_SECONDS=10
- These warnings are not fatal but indicate the server is lagging.
If the deadlocks persist after upgrading and tuning, we should capture more detail (enable SQLAlchemy echo and Prefect server DEBUG logs) and attach them to the existing issue for visibility.
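On the Postgres side it can also help to log lock waits so the next deadlock report captures both queries (these are standard Postgres settings and a config reload is enough, no restart):
SHOW deadlock_timeout;
ALTER SYSTEM SET log_lock_waits = on;
SELECT pg_reload_conf();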
Want me to suggest a minimal set of environment variables for your setup based on your current profile?