redsquare

08/02/2023, 11:42 AM
Hi - I have a flow that runs in k8s and uses the SequentialTaskRunner to call about 3k tasks. About 90% of the time the flow fails with the error below; the other 10% of the time it completes successfully. Any ideas on where to start debugging this issue?
Encountered exception during execution:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 1871, in _run_once
    event_list = self._selector.select(timeout)
  File "/usr/local/lib/python3.10/selectors.py", line 469, in select
    fd_event_list = self._selector.poll(timeout, max_ev)
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 1670, in cancel_flow_run
    raise TerminationSignal(signal=signal.SIGTERM)
prefect.exceptions.TerminationSignal

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 665, in orchestrate_flow_run
    result = await run_sync(flow_call)
  File "/usr/local/lib/python3.10/site-packages/prefect/utilities/asyncutils.py", line 165, in run_sync_in_interruptible_worker_thread
    assert result is not NotSet
AssertionError
The failures seem random: there is no commonality with a particular problematic record or running time. To me that points to a Prefect issue rather than anything in the flow, which is not changing at all.

Deceivious

08/02/2023, 12:04 PM
I don't have big brains, but I'd start with the following, I guess.
1. Are the tasks in a subflow?
2. What do the tasks return? I am thinking maybe memory issues?
3. Have you tried running in a "Process agent" on your local machine?
4. What are the logs in the Kubernetes agent pods?
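
One way to check the memory question in point 2 is to log the process's peak memory while the tasks run. This is a minimal sketch using only the Python standard library (Unix only). `peak_rss_mib` and `report_every` are hypothetical helper names, not Prefect APIs, and the reporting interval is an arbitrary assumption.

```python
import resource
import sys


def peak_rss_mib():
    """Return the peak resident set size of this process so far, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kibibytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def report_every(items, every=500):
    """Yield items unchanged, printing peak memory every `every` items."""
    for index, item in enumerate(items, start=1):
        if index % every == 0:
            print(f"after {index} items: peak RSS {peak_rss_mib():.1f} MiB")
        yield item
```

Wrapping the loop that submits the tasks in `report_every(...)` would show whether memory climbs steadily toward the pod's limit or stays flat.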

redsquare

08/02/2023, 12:09 PM
The logs from the agent are above. No subflows, the pods do not OOM, and the tasks are marked with cache_result_in_memory=False.
It runs fine locally when I use VS Code to run the flow.

Tim Galvin

08/02/2023, 12:21 PM
Where is the database? Have you set up a Postgres database, or is Prefect using a SQLite database?

redsquare

08/02/2023, 12:22 PM
We use Prefect Cloud.

Tim Galvin

08/02/2023, 12:23 PM
Ahhh ok. My guess was going to be that SQLite was not fit for that many tasks.

redsquare

08/02/2023, 12:29 PM
Going to upgrade the flow and agent to 2.11.2 and try again, but it has been like this for weeks.
no joy
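
For context on the first traceback in the thread: it ends inside a signal handler (`cancel_flow_run`) that raises `TerminationSignal` for `SIGTERM`. That suggests the process was sent SIGTERM from outside, which would fit failures that do not line up with any particular record. In Kubernetes, that signal is sent to a container when its pod is being terminated. Below is a minimal standard-library sketch of the mechanism, assuming nothing about Prefect's internals beyond what the traceback shows. `StandInTerminationSignal`, `run_with_sigterm`, and `long_job` are hypothetical names.

```python
import signal
import time


class StandInTerminationSignal(BaseException):
    """Hypothetical stand-in for an exception raised from a SIGTERM handler."""

    def __init__(self, signum):
        super().__init__(signum)
        self.signum = signum


def _raise_on_sigterm(signum, frame):
    # Runs in the main thread, on top of whatever frame was executing,
    # which is why the traceback shows unrelated event-loop frames.
    raise StandInTerminationSignal(signum)


def run_with_sigterm(work):
    """Run work() with the handler installed and report how it ended."""
    previous = signal.signal(signal.SIGTERM, _raise_on_sigterm)
    try:
        work()
        return "completed"
    except StandInTerminationSignal as exc:
        return f"terminated by {signal.Signals(exc.signum).name}"
    finally:
        # Restore the earlier handler (fall back to the default if unknown).
        signal.signal(
            signal.SIGTERM,
            previous if previous is not None else signal.SIG_DFL,
        )


def long_job():
    # Simulate an external SIGTERM arriving partway through the job.
    signal.raise_signal(signal.SIGTERM)
    time.sleep(0.1)  # the handler fires no later than here


if __name__ == "__main__":
    print(run_with_sigterm(long_job))
    print(run_with_sigterm(lambda: None))
```

If this is what is happening, the pod events and node activity around the failure time (for example evictions or node scale-down) may be more informative than the flow's own logs.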