# prefect-community
a
Hello, I have a really weird error. When I run 3-5 flows at the same time, there isn’t any problem. But when I try to run over 50 flows, I get this error:
concurrent.futures._base.CancelledError
I can see that my cluster has no problem creating over 50 pods simultaneously. The queue concurrency limit is 500. Stack trace in thread. PS: Could it be related to the fact that I used a free cloud account during the transition phase to Prefect 2? (If yes, where can I see the limits?)
Encountered exception during execution:
Traceback (most recent call last):
  File "/opt/pysetup/.venv/lib/python3.8/site-packages/prefect/engine.py", line 637, in orchestrate_flow_run
    result = await run_sync(flow_call)
  File "/opt/pysetup/.venv/lib/python3.8/site-packages/prefect/utilities/asyncutils.py", line 91, in run_sync_in_worker_thread
    return await anyio.to_thread.run_sync(
  File "/opt/pysetup/.venv/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/opt/pysetup/.venv/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/opt/pysetup/.venv/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/app/flows_2/main_flow.py", line 129, in main_flow
    features_execution_flow_results = add_task_runner_to_features_execution_flow(
  File "/opt/pysetup/.venv/lib/python3.8/site-packages/prefect/flows.py", line 448, in __call__
    return enter_flow_run_engine_from_flow_call(
  File "/opt/pysetup/.venv/lib/python3.8/site-packages/prefect/engine.py", line 168, in enter_flow_run_engine_from_flow_call
    return run_async_from_worker_thread(begin_run)
  File "/opt/pysetup/.venv/lib/python3.8/site-packages/prefect/utilities/asyncutils.py", line 177, in run_async_from_worker_thread
    return anyio.from_thread.run(call)
  File "/opt/pysetup/.venv/lib/python3.8/site-packages/anyio/from_thread.py", line 49, in run
    return asynclib.run_async_from_thread(func, *args)
  File "/opt/pysetup/.venv/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 970, in run_async_from_thread
    return f.result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 442, in result
    raise CancelledError()
concurrent.futures._base.CancelledError
Crash detected! Execution was interrupted by an unexpected exception: Traceback (most recent call last):
  File "/usr/lib/python3.8/contextlib.py", line 189, in __aexit__
    await self.gen.athrow(typ, value, traceback)
  File "/opt/pysetup/.venv/lib/python3.8/site-packages/prefect/task_runners.py", line 163, in start
    yield self
  File "/opt/pysetup/.venv/lib/python3.8/site-packages/prefect/engine.py", line 372, in begin_flow_run
    terminal_or_paused_state = await orchestrate_flow_run(
  File "/opt/pysetup/.venv/lib/python3.8/site-packages/prefect/engine.py", line 637, in orchestrate_flow_run
    result = await run_sync(flow_call)
  File "/opt/pysetup/.venv/lib/python3.8/site-packages/prefect/utilities/asyncutils.py", line 91, in run_sync_in_worker_thread
    return await anyio.to_thread.run_sync(
  File "/opt/pysetup/.venv/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/opt/pysetup/.venv/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

concurrent.futures._base.CancelledError: objects_blacklist-03eab60f-0-ccdd4886963f49fda76ea528fb1fcd96-1
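A minimal sketch for double-checking that 500 queue concurrency limit through the Prefect 2 Python client; the queue name below is just a placeholder:
import asyncio

from prefect.client import get_client


async def show_queue_limit(queue_name: str) -> None:
    # Read the work queue by name and print its configured concurrency limit.
    async with get_client() as client:
        queue = await client.read_work_queue_by_name(queue_name)
        print(f"{queue.name}: concurrency_limit={queue.concurrency_limit}")


if __name__ == "__main__":
    asyncio.run(show_queue_limit("kubernetes"))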
w
Just out of curiosity, where is it running? GKE, AWS, Azure? Is this log from the parent flow creating the 50 flows?
a
It’s running on GKE; the log is from the parent flow, which runs 2 subflows. So 50 flow runs = 50*3 = 150 flows executing at the same moment.
One of the subflows runs on a Dask cluster.
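For context, a minimal sketch of that layout; only main_flow and add_task_runner_to_features_execution_flow appear in the traceback, so every other name, the example tasks, and the Dask scheduler address are assumptions:
from typing import List

from prefect import flow, task
from prefect_dask import DaskTaskRunner


@task
def compute_feature(name: str) -> str:
    # Placeholder task standing in for the real feature computation.
    return f"{name}-done"


@flow
def features_execution_flow(features: List[str]) -> List[str]:
    # Subflow whose tasks should run on the Dask cluster.
    return [compute_feature.submit(f).result() for f in features]


def add_task_runner_to_features_execution_flow(features: List[str]) -> List[str]:
    # Attach the Dask task runner to the subflow, then call it (a blocking
    # subflow call, as at main_flow.py line 129 in the traceback).
    dask_subflow = features_execution_flow.with_options(
        task_runner=DaskTaskRunner(address="tcp://dask-scheduler:8786")
    )
    return dask_subflow(features)


@flow
def other_subflow():
    # Placeholder for the second subflow mentioned above.
    ...


@flow
def main_flow():
    other_subflow()
    features_execution_flow_results = add_task_runner_to_features_execution_flow(
        ["feature_a", "feature_b"]
    )
    return features_execution_flow_results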
@Christopher Boyd @Ryan Peden maybe you have an idea of what I should look into?
w
Are you running on an Autopilot or a standard cluster?
I'm just asking because Crashed usually means it's the infrastructure (or failed network/Orion calls). If you are running on GKE Autopilot it could be because pods are getting evicted to scale down the nodes (a few other people have had this issue). It might even happen with a regular GKE cluster but I'm not sure.
a
@Walter Cavinaw thanks for your questions. We have a regular GKE cluster. The problem here seems to me to come from Dask: https://github.com/dask/distributed/issues/4612 But I only observe it if I try to run more than 10 flows.