Hi, I'm using Prefect 2.20, with a Kubernetes work ...
# ask-community
a
Hi, I'm using Prefect 2.20 with a Kubernetes work pool and Kubernetes workers. I'm seeing the following error increasingly often:
An error occurred while monitoring flow run '1459b4b7-e8b6-434e-99e5-6af5af2de9a2'. The flow run will not be marked as failed, but an issue may have occurred.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/workers/base.py", line 908, in _submit_run_and_capture_errors
    result = await self.run(
  File "/usr/local/lib/python3.10/site-packages/prefect_kubernetes/worker.py", line 634, in run
    status_code = await self._watch_job(
  File "/usr/local/lib/python3.10/site-packages/prefect_kubernetes/worker.py", line 1063, in _watch_job
    pod = await self._get_job_pod(logger, job_name, configuration, client)
  File "/usr/local/lib/python3.10/site-packages/prefect_kubernetes/worker.py", line 1175, in _get_job_pod
    async for event in watch.stream(
  File "/usr/local/lib/python3.10/site-packages/kubernetes_asyncio/watch/watch.py", line 135, in __anext__
    return await self.next()
  File "/usr/local/lib/python3.10/site-packages/kubernetes_asyncio/watch/watch.py", line 162, in next
    line = await self.resp.content.readline()
  File "/usr/local/lib/python3.10/site-packages/aiohttp/streams.py", line 317, in readline
    return await self.readuntil()
  File "/usr/local/lib/python3.10/site-packages/aiohttp/streams.py", line 351, in readuntil
    await self._wait("readuntil")
  File "/usr/local/lib/python3.10/site-packages/aiohttp/streams.py", line 311, in _wait
    with self._timer:
  File "/usr/local/lib/python3.10/site-packages/aiohttp/helpers.py", line 713, in __exit__
    raise asyncio.TimeoutError from None
asyncio.exceptions.TimeoutError
This happens 5 minutes after the job starts. The job could still be pending at that point, but I've set PREFECT_TASK_SCHEDULING_PENDING_TASK_TIMEOUT to 2 hours, so I don't think this should be happening. It doesn't cause the job to fail, but it does stop the job from being cleaned up after completion. Can anyone help?
Hi, I've gone back through the docs and this looks like it should be right, but it still isn't working. The setting is 30s on the server, so I'm unsure where the 5 minutes is coming from.
j
Hey Arthur, the setting PREFECT_TASK_SCHEDULING_PENDING_TASK_TIMEOUT is related to Prefect background tasks and is unrelated to the k8s work pool. Do you have pod_watch_timeout_seconds configured on your work pool? That’s what controls when the error you're seeing is raised: https://github.com/PrefectHQ/prefect/blob/2.x/src/integrations/prefect-kubernetes/prefect_kubernetes/worker.py#L1180
a
Hi Jake, thanks for coming back to me. I do indeed, it's set to 24 hours!
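
One way to double-check which pod_watch_timeout_seconds value the work pool actually carries is to read it out of the pool's base job template. The snippet below is only a rough sketch: it assumes the template is available as a plain dict (for example, copied from the output of `prefect work-pool inspect`) and that the default lives under `variables -> properties -> pod_watch_timeout_seconds -> default`. The helper name `pod_watch_timeout_default` and the trimmed-down `example_template` are hypothetical and not taken from the thread.

```python
from typing import Any, Optional


def pod_watch_timeout_default(base_job_template: dict[str, Any]) -> Optional[int]:
    """Return the default pod_watch_timeout_seconds from a base job template.

    Returns None when the template does not define a default for it.
    The key layout is an assumption about how the template is structured.
    """
    variables = base_job_template.get("variables", {})
    prop = variables.get("properties", {}).get("pod_watch_timeout_seconds", {})
    return prop.get("default")


# Hypothetical, trimmed-down template for illustration only.
example_template = {
    "job_configuration": {
        # The job configuration references the variable by name.
        "pod_watch_timeout_seconds": "{{ pod_watch_timeout_seconds }}",
    },
    "variables": {
        "properties": {
            # 24 hours expressed in seconds.
            "pod_watch_timeout_seconds": {"type": "integer", "default": 86400},
        },
    },
}

timeout = pod_watch_timeout_default(example_template)
print(f"pod_watch_timeout_seconds default: {timeout}")
```

If this returns the expected 24 hours (86400 seconds) for the real template, it may also be worth checking whether a deployment overrides the value through its own job variables, since this sketch only looks at the pool-level default.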