Joish
12/01/2023, 11:37 AM
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 2467, in <module>
    enter_flow_run_engine_from_subprocess(flow_run_id)
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 297, in enter_flow_run_engine_from_subprocess
    state = from_sync.wait_for_call_in_loop_thread(
  File "/usr/local/lib/python3.10/site-packages/prefect/_internal/concurrency/api.py", line 242, in wait_for_call_in_loop_thread
    waiter.wait()
  File "/usr/local/lib/python3.10/site-packages/prefect/_internal/concurrency/waiters.py", line 152, in wait
    self._handle_waiting_callbacks()
  File "/usr/local/lib/python3.10/site-packages/prefect/_internal/concurrency/waiters.py", line 126, in _handle_waiting_callbacks
    callback.run()
  File "/usr/local/lib/python3.10/site-packages/prefect/_internal/concurrency/calls.py", line 273, in run
    return self.context.run(asyncio.run, self._run_async(coro))
  File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 1871, in _run_once
    event_list = self._selector.select(timeout)
  File "/usr/local/lib/python3.10/selectors.py", line 469, in select
    fd_event_list = self._selector.poll(timeout, max_ev)
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 1866, in cancel_flow_run
    raise TerminationSignal(signal=signal.SIGTERM)
prefect.exceptions.TerminationSignal
rpc error: code = NotFound desc = an error occurred when try to find container "5742094a832ae44a72aef45eaccf8421c33577d21e0631ae96cdd58acbbe96bb": not found
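Marvin
12/01/2023, 11:37 AM
The traceback ends in a `TerminationSignal` exception, which Prefect raises when the flow run process receives SIGTERM. This usually means something outside your flow code stopped the run, for example a cancellation request or the infrastructure (such as a Kubernetes pod or container) being shut down. The trailing `rpc error ... not found` line suggests the container was already gone when it was looked up.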
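To make the last two traceback frames easier to read: the exception is raised by a signal handler while the event loop is blocked in `poll`, so it surfaces at whatever line the main thread happens to be on. Here is a minimal stdlib-only sketch of that mechanism, assuming a POSIX system. `FakeTerminationSignal`, `raise_on_sigterm`, and `run_until_terminated` are hypothetical names for illustration and are not Prefect's implementation.

```python
import os
import signal
import time


class FakeTerminationSignal(BaseException):
    """Hypothetical stand-in for an exception raised on SIGTERM (not Prefect's class)."""

    def __init__(self, signum: signal.Signals):
        super().__init__(signum)
        self.signal = signum


def raise_on_sigterm(signum, frame):
    # Python runs signal handlers in the main thread, so this exception
    # appears at whatever line the main thread is currently executing.
    raise FakeTerminationSignal(signal.Signals(signum))


def run_until_terminated() -> str:
    # Install the handler, remembering the previous one so it can be restored.
    previous = signal.signal(signal.SIGTERM, raise_on_sigterm)
    try:
        # Simulate an external SIGTERM (e.g. a container being stopped)
        # by sending the signal to our own process.
        os.kill(os.getpid(), signal.SIGTERM)
        time.sleep(1)  # stand-in for blocking work; interrupted by the handler
        return "finished"
    except FakeTerminationSignal as exc:
        return f"terminated by {exc.signal.name}"
    finally:
        signal.signal(signal.SIGTERM, previous)


if __name__ == "__main__":
    print(run_until_terminated())
```

If a run like this reports termination without anyone cancelling it, the sender of the SIGTERM is usually the place to look (pod eviction, node scale-down, or a worker cancelling the job), rather than the flow code itself.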
Based on the issues I found on Prefect's GitHub repository, this error could be related to several issues:
1. Flow hooks `on_crashed` / `on_cancellation` not executing correctly with KubernetesWorker (issue #10826). This issue is about the `on_crashed` and `on_cancellation` hooks not executing as expected when using a Kubernetes worker.
2. A bug where SIGTERM was not properly captured as a flow run crash (pull request #9498). This pull request fixed a bug where the SIGTERM signal wasn't being properly captured, which could lead to incorrect reports of cancellation.
3. No Prefect logger available by default in a vanilla Python function, even a callable on_cancellation hook (issue #10948). This issue is about the Prefect logger not being available in a Python function, which could cause issues when trying to log information during a cancellation hook.
4. Cancellation not consistent between CLI and UI (issue #9801). This issue is about inconsistencies between cancelling a flow run via the Prefect UI and cancelling it via the command line interface.
These issues might not directly cause the error you're seeing, but they could be related. I recommend checking your flow and tasks for any issues that could cause a termination signal to be sent, and making sure your cancellation hooks are set up correctly. If you're using Kubernetes, you might also want to check for any issues with your worker setup.
Joish
12/01/2023, 11:40 AM
Marvin
12/01/2023, 11:41 AM
Here are the GitHub issues and pull requests that may be related to the `TerminationSignal` error you're seeing:
1. Flow hooks `on_crashed` / `on_cancellation` not executing correctly with KubernetesWorker: This issue is about the `on_crashed` and `on_cancellation` hooks not executing as expected when using a Kubernetes worker.
2. No Prefect logger available by default in a vanilla Python function, even a callable on_cancellation hook: This issue is about the Prefect logger not being available in a Python function, which could cause issues when trying to log information during a cancellation hook.
3. Fix bug where SIGTERM was not properly captured as a flow run crash: This pull request fixed a bug where the SIGTERM signal wasn't being properly captured, which could lead to incorrect reports of cancellation.
4. Add `on_cancellation` flow run state change hook: This pull request added the ability to execute arbitrary code when a flow enters a `CANCELLING` state.
5. Cancellation not consistent between CLI and UI: This issue is about inconsistencies between cancelling a flow run via the Prefect UI and cancelling it via the command line interface.
These issues might not directly cause the error you're seeing, but they could be related. I recommend checking your flow and tasks for any issues that could cause a termination signal to be sent, and making sure your cancellation hooks are set up correctly. If you're using Kubernetes, you might also want to check for any issues with your worker setup.