# ask-community
Hi Prefect community, I am looking for help with a DaskTaskRunner in a Prefect flow that crashes when more than 200 tasks are submitted. The Prefect Server runs as a k8s service. The DaskTaskRunner starts an ephemeral Dask KubeCluster with adaptive worker management. After 8-10 minutes of the flow run, with seventy tasks successfully processed, I get the following error:
21:35:02.138 | INFO    | Flow run 'Re-Dump 2024-01-01 - 2024-01-16T12:00' - Submitted task run 'requestlog_writer-4673' for execution.
21:35:02.147 | INFO    | Flow run 'Re-Dump 2024-01-01 - 2024-01-16T12:00' - Created task run 'requestlog_writer-3730' for task 'requestlog_writer'
21:35:02.153 | INFO    | Flow run 'Re-Dump 2024-01-01 - 2024-01-16T12:00' - Submitted task run 'requestlog_writer-3730' for execution.
21:40:53.671 | INFO    | distributed.deploy.adaptive_core - Adaptive stop
21:40:53.673 | ERROR   | Flow run 'Re-Dump 2024-01-01 - 2024-01-16T12:00' - Crash detected! Request to <http://prefect-server.dev-namespace:4200/api/task_runs/> failed: PoolTimeout: .
  File "/opt/venv/lib/python3.10/site-packages/httpcore/_synchronization.py", line 123, in wait
    await self._anyio_event.wait()
  File "/opt/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 1778, in wait
    if await self._event.wait():
  File "/usr/local/lib/python3.10/asyncio/locks.py", line 214, in wait
    await fut
asyncio.exceptions.CancelledError
Does anyone have experience with similar behavior?
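One workaround that might be worth trying, assuming the PoolTimeout comes from too many task-run requests being in flight against the server at once (this is a guess, not confirmed by the log above): submit the tasks in fixed-size batches and wait for each batch before starting the next. The sketch below is plain Python; `batched` and `submit_in_batches` are hypothetical helper names, and `submit` and `wait` are placeholders for whatever the flow actually uses.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
F = TypeVar("F")
R = TypeVar("R")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def submit_in_batches(
    items: Iterable[T],
    submit: Callable[[T], F],
    wait: Callable[[F], R],
    size: int = 50,  # hypothetical default; tune for the actual setup
) -> List[R]:
    """Submit one batch, wait for all of its results, then move on."""
    results: List[R] = []
    for batch in batched(items, size):
        futures = [submit(item) for item in batch]
        results.extend(wait(future) for future in futures)
    return results
```

Inside the flow, `submit` could be something like `requestlog_writer.submit` and `wait` something like `lambda f: f.result()`, so that no more than `size` task runs are pending at any time. The batch size of 50 is arbitrary.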