Arthur
09/11/2024, 9:07 AM
An error occurred while monitoring flow run '1459b4b7-e8b6-434e-99e5-6af5af2de9a2'. The flow run will not be marked as failed, but an issue may have occurred.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/workers/base.py", line 908, in _submit_run_and_capture_errors
    result = await self.run(
  File "/usr/local/lib/python3.10/site-packages/prefect_kubernetes/worker.py", line 634, in run
    status_code = await self._watch_job(
  File "/usr/local/lib/python3.10/site-packages/prefect_kubernetes/worker.py", line 1063, in _watch_job
    pod = await self._get_job_pod(logger, job_name, configuration, client)
  File "/usr/local/lib/python3.10/site-packages/prefect_kubernetes/worker.py", line 1175, in _get_job_pod
    async for event in watch.stream(
  File "/usr/local/lib/python3.10/site-packages/kubernetes_asyncio/watch/watch.py", line 135, in __anext__
    return await self.next()
  File "/usr/local/lib/python3.10/site-packages/kubernetes_asyncio/watch/watch.py", line 162, in next
    line = await self.resp.content.readline()
  File "/usr/local/lib/python3.10/site-packages/aiohttp/streams.py", line 317, in readline
    return await self.readuntil()
  File "/usr/local/lib/python3.10/site-packages/aiohttp/streams.py", line 351, in readuntil
    await self._wait("readuntil")
  File "/usr/local/lib/python3.10/site-packages/aiohttp/streams.py", line 311, in _wait
    with self._timer:
  File "/usr/local/lib/python3.10/site-packages/aiohttp/helpers.py", line 713, in __exit__
    raise asyncio.TimeoutError from None
asyncio.exceptions.TimeoutError
This happens 5 minutes after the job starts. It could be in pending, but I've set PREFECT_TASK_SCHEDULING_PENDING_TASK_TIMEOUT to 2 hours, so I don't think this should be happening. It doesn't cause the job to fail, but it does stop the job from being cleaned up after completion. Can anyone help?
Arthur
09/16/2024, 10:00 AM
Jake Kaplan
09/16/2024, 11:42 AM
PREFECT_TASK_SCHEDULING_PENDING_TASK_TIMEOUT is related to Prefect background tasks and is unrelated to the k8s work pool. Do you have pod_watch_timeout_seconds configured on your work pool? That's what controls when the error you're seeing is raised: https://github.com/PrefectHQ/prefect/blob/2.x/src/integrations/prefect-kubernetes/prefect_kubernetes/worker.py#L1180
Arthur
09/16/2024, 1:24 PM
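For anyone following Jake's pointer, here is a rough sketch of how a pod_watch_timeout_seconds override might be built for a Kubernetes work pool. The helper name k8s_job_variables, the flow object my_flow, the pool name and the image tag are all made up for illustration and do not come from the thread. The two-hour value only mirrors the number mentioned above. Whether raising it resolves the cleanup problem is not confirmed here.

```python
from datetime import timedelta


def k8s_job_variables(pod_watch_timeout: timedelta) -> dict:
    """Build job-variable overrides for a Kubernetes work pool.

    `pod_watch_timeout_seconds` is the work pool variable named in the
    thread; this hypothetical helper converts a timedelta to whole seconds.
    """
    seconds = int(pod_watch_timeout.total_seconds())
    if seconds <= 0:
        raise ValueError("pod watch timeout must be positive")
    return {"pod_watch_timeout_seconds": seconds}


# Two hours, mirroring the value mentioned earlier in the thread.
job_variables = k8s_job_variables(timedelta(hours=2))
print(job_variables)

# Hypothetical usage (assumes a Prefect release whose `Flow.deploy` accepts
# `job_variables`, plus a flow, pool and image that are placeholders):
# my_flow.deploy(
#     name="example",
#     work_pool_name="my-k8s-pool",
#     image="my-image:tag",
#     job_variables=job_variables,
# )
```

The same variable can also be set once on the work pool itself, so that every deployment using the pool picks it up without a per-deployment override.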