Scott Pedersen
09/29/2025, 12:03 AM
kubernetes.client.exceptions.ApiException: (410) Reason: Expired: too old resource version: 6563880465 (6565397753)
Marvin
09/29/2025, 12:03 AM

Marvin
09/29/2025, 12:04 AM

Scott Pedersen
09/29/2025, 12:07 AM
An error occurred while monitoring flow run. The flow run will not be marked as failed, but an issue may have occurred.
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/prefect/workers/base.py", line 896, in _submit_run_and_capture_errors
result = await self.run(
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/prefect_kubernetes/worker.py", line 599, in run
status_code = await run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/prefect/utilities/asyncutils.py", line 95, in run_sync_in_worker_thread
return await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/prefect_kubernetes/worker.py", line 1028, in _watch_job
for event in self._job_events(
File "/usr/local/lib/python3.11/site-packages/kubernetes/watch/watch.py", line 195, in stream
raise client.rest.ApiException(
kubernetes.client.exceptions.ApiException: (410)
Reason: Expired: too old resource version: 6563880465 (6565397753)
Marvin
09/29/2025, 12:07 AM

Marvin
09/29/2025, 12:07 AM
What's happening:
- The worker's Kubernetes watch tried to resume from a stored `resourceVersion`, but the API server has already compacted past that version (etcd compaction / high update churn), so it returns 410 Gone: “Expired: too old resource version”.
- This is a normal Kubernetes watch behavior and not a failure of your flow/pod.
What it means for your run:
- The flow/pod keeps running; this only affects the worker’s ability to monitor via the current watch stream.
- Prefect logs “An error occurred while monitoring flow run… not marked as failed”, and monitoring, log streaming, and cancellation visibility may be briefly affected until the client re-lists and re-establishes the watch (see the sketch below).
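For reference, this is roughly the recover-from-410 cycle a watch client goes through: catch the 410, re-list to get a fresh `resourceVersion`, and restart the watch from there. A minimal sketch with the `kubernetes` Python client, watching Jobs directly rather than going through the Prefect worker; the namespace is a placeholder, not what the worker actually uses:

```python
from kubernetes import client, config, watch

config.load_kube_config()  # or config.load_incluster_config() inside a pod
batch = client.BatchV1Api()
namespace = "prefect"  # placeholder namespace

resource_version = None
while True:
    try:
        w = watch.Watch()
        # Resume from the last known resourceVersion; None means "start fresh".
        for event in w.stream(
            batch.list_namespaced_job,
            namespace=namespace,
            resource_version=resource_version,
            timeout_seconds=300,
        ):
            job = event["object"]
            resource_version = job.metadata.resource_version
            print(event["type"], job.metadata.name)
    except client.exceptions.ApiException as exc:
        if exc.status == 410:
            # 410 Gone: our resourceVersion was compacted away. Re-list to get
            # a fresh resourceVersion, then re-establish the watch from there.
            jobs = batch.list_namespaced_job(namespace=namespace)
            resource_version = jobs.metadata.resource_version
            continue
        raise
```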
Why it happens more often:
- High churn in the namespace (many Pod/Job updates/deletes)
- Long-lived watches or brief network interruptions causing the client to fall behind
- API server/etcd compaction advancing the watch cache beyond the stored resourceVersion
Mitigations you can try:
- Narrow the worker’s watch scope with specific label selectors in your work pool’s job template so it only watches the Jobs it creates (see the deployment sketch after this list).
- Reduce churn in the namespace (separate busy system components into different namespaces or tune controllers that generate lots of events).
- Ensure stable connectivity from the worker to the API server (avoid network interruptions, long GC pauses, etc.).
- Upgrade to the latest `prefect` and `prefect-kubernetes`; there have been improvements in watch handling over time.
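As one illustration of the label-selector idea above, here is a hedged sketch of attaching extra labels to the Jobs a deployment creates via `job_variables`. It assumes your Kubernetes work pool's base job template exposes the standard `labels` variable; the pool name, image, and label value are made up for the example:

```python
from prefect import flow


@flow
def my_flow():
    print("hello from kubernetes")


if __name__ == "__main__":
    my_flow.deploy(
        name="k8s-deployment",
        work_pool_name="my-k8s-pool",        # hypothetical work pool name
        image="my-registry/my-flow:latest",  # hypothetical, pre-built image
        build=False,
        push=False,
        # Extra labels land on the Job metadata, so watches and queries can be
        # scoped with a matching label selector instead of the whole namespace.
        job_variables={"labels": {"app.kubernetes.io/part-of": "prefect-flows"}},
    )
```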
References:
- Kubernetes: Efficient detection of changes
- Kubernetes: Watching resources
- Related Prefect discussion (2.x but same underlying cause): Prefect issue: 410 Expired watch behavior
If you’re seeing this frequently, share your Prefect and `prefect-kubernetes` versions and a rough sense of namespace churn; we can suggest more targeted tweaks.
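A quick way to grab those versions from the environment the worker runs in (assuming both packages are installed there):

```python
from importlib.metadata import version

# Report the exact installed versions of the two packages involved.
print("prefect:", version("prefect"))
print("prefect-kubernetes:", version("prefect-kubernetes"))
```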
Scott Pedersen
09/29/2025, 1:09 AM

Marvin
09/29/2025, 1:09 AM

Marvin
09/29/2025, 1:11 AM
The 410 in your traceback is raised from `prefect_kubernetes.worker._watch_job` while iterating `self._job_events`, which uses a `Watch().stream(...)` against the Job resource. The watch is resuming with an old `resourceVersion` for that Job list, so the API server returns 410 Gone.
Notes:
- Prefect may also interact with Pods for log streaming, but this particular 410 is from the Job watch.
- Job docs: Kubernetes Job
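If you want to see the same failure mode outside of Prefect, here is a hedged repro sketch: start a Job watch from a deliberately ancient `resourceVersion`. On most clusters that version has long been compacted away, so the stream raises the same 410:

```python
from kubernetes import client, config, watch

config.load_kube_config()
batch = client.BatchV1Api()

w = watch.Watch()
try:
    # "1" is almost certainly older than the API server's compaction window,
    # so the server should answer the watch with 410 Gone.
    for event in w.stream(
        batch.list_namespaced_job,
        namespace="default",
        resource_version="1",
        timeout_seconds=10,
    ):
        print(event["type"], event["object"].metadata.name)
except client.exceptions.ApiException as exc:
    print(exc.status, exc.reason)  # expected: 410, Expired
```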