Seems like some of our long running jobs simply crash with e Prefect Community #ask-community

Idan

10/25/2023, 9:33 AM

It seems that some of our long-running jobs simply crash, e.g. with:

01:17:34.163 | ERROR   | prefect.agent - An error occured while monitoring flow run 'b7217f7f-6b8b-4cc4-8b85-b85195f9a78a'. The flow run will not be marked as failed, but an issue may have occurred.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/agent.py", line 490, in _submit_run_and_capture_errors
    result = await infrastructure.run(task_status=task_status)
  File "/usr/local/lib/python3.10/site-packages/prefect/infrastructure/kubernetes.py", line 308, in run
    status_code = await run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/prefect/utilities/asyncutils.py", line 91, in run_sync_in_worker_thread
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/prefect/infrastructure/kubernetes.py", line 672, in _watch_job
    for event in watch.stream(
  File "/usr/local/lib/python3.10/site-packages/kubernetes/watch/watch.py", line 182, in stream
    raise client.rest.ApiException(
kubernetes.client.exceptions.ApiException: (410)
Reason: Expired: too old resource version: 45972664 (46031599)

Any ideas on how to circumvent this? 🤔
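
For context, a 410 "too old resource version" from the Kubernetes API generally means the watch has to be reopened from a fresh state rather than resumed. Below is a minimal sketch of that restart pattern, not Prefect's actual code. `ApiException` here is a local stand-in that only mimics the `.status` attribute of `kubernetes.client.exceptions.ApiException`, and `watch_with_restart`, `open_stream`, and `max_restarts` are hypothetical names made up for illustration.

```python
from typing import Callable, Iterable, Iterator


class ApiException(Exception):
    """Local stand-in for kubernetes.client.exceptions.ApiException."""

    def __init__(self, status: int, reason: str = "") -> None:
        super().__init__(f"({status}) {reason}")
        self.status = status
        self.reason = reason


def watch_with_restart(
    open_stream: Callable[[], Iterable[dict]],
    max_restarts: int = 5,
) -> Iterator[dict]:
    """Yield events from open_stream(), reopening it when a 410 is raised.

    open_stream is a hypothetical zero-argument callable that starts a new
    watch without a stale resource version each time it is called.
    """
    restarts = 0
    while True:
        try:
            for event in open_stream():
                yield event
            return  # the stream ended normally
        except ApiException as exc:
            # Only 410 (Expired) is treated as restartable; anything else,
            # or too many restarts, is re-raised to the caller.
            if exc.status != 410 or restarts >= max_restarts:
                raise
            restarts += 1


def demo() -> tuple[list[dict], int]:
    """Simulate a watch that expires once and then completes."""
    calls = 0

    def fake_stream() -> Iterator[dict]:
        nonlocal calls
        calls += 1
        if calls == 1:
            yield {"type": "MODIFIED", "note": "before expiry"}
            raise ApiException(410, "Expired: too old resource version")
        yield {"type": "MODIFIED", "note": "after restart"}

    events = list(watch_with_restart(fake_stream))
    return events, calls
```

With the real client, `open_stream` would presumably wrap a fresh `watch.stream(...)` call. Whether this can be plugged into the agent's `_watch_job` without patching Prefect is a separate question that this sketch does not answer.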
