# prefect-aws
I'm in an ECS worker + work pool setup, and all my runs started crashing with the following error:
```
Failed to submit flow run '5035828c-12ea-48c9-a73a-9f4e8441c8b6' to infrastructure.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/prefect_aws/workers/ecs_worker.py", line 1557, in _create_task_run
    return ecs_client.run_task(**task_run_request)["tasks"][0]
IndexError: list index out of range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/workers/base.py", line 896, in _submit_run_and_capture_errors
    result = await self.run(
  File "/usr/local/lib/python3.10/site-packages/prefect_aws/workers/ecs_worker.py", line 598, in run
    ) = await run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/prefect/utilities/asyncutils.py", line 91, in run_sync_in_worker_thread
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/prefect_aws/workers/ecs_worker.py", line 761, in _create_task_and_wait_for_start
    self._report_task_run_creation_failure(configuration, task_run_request, exc)
  File "/usr/local/lib/python3.10/site-packages/prefect_aws/workers/ecs_worker.py", line 757, in _create_task_and_wait_for_start
    task = self._create_task_run(ecs_client, task_run_request)
  File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0xffff50f7d3c0 state=finished raised IndexError>]
```
Has anyone encountered a similar situation? The error message is not very explicit.
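The failing line in the traceback indexes `["tasks"][0]` on the `run_task` response. When ECS cannot start the task, the response can come back with an empty `tasks` list and the explanation in a separate `failures` list, so the indexing raises a bare `IndexError` and the real cause is lost. A minimal sketch, assuming you call boto3's `ecs_client.run_task` yourself with the same request, that surfaces the `failures` entries instead. The names `first_task_or_raise` and `TaskSubmissionError` are hypothetical and not part of prefect-aws.

```python
from typing import Any, Dict, List


class TaskSubmissionError(RuntimeError):
    """Hypothetical error that keeps the `failures` list from the RunTask response."""

    def __init__(self, failures: List[Dict[str, Any]]) -> None:
        self.failures = failures
        # Each failure entry may carry `arn`, `reason` and an optional `detail`.
        parts = []
        for failure in failures:
            text = f"{failure.get('arn', '?')}: {failure.get('reason', 'unknown reason')}"
            if failure.get("detail"):
                text += f" ({failure['detail']})"
            parts.append(text)
        summary = "; ".join(parts) or "no failures reported"
        super().__init__(f"ECS RunTask started no tasks: {summary}")


def first_task_or_raise(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the first started task from a boto3 `run_task` response.

    Instead of letting `response["tasks"][0]` fail with a bare IndexError,
    raise an error that includes whatever ECS reported in `failures`.
    """
    tasks = response.get("tasks") or []
    if not tasks:
        raise TaskSubmissionError(response.get("failures") or [])
    return tasks[0]


# Hypothetical usage with a real client and the same request the worker builds:
# import boto3
# ecs_client = boto3.client("ecs")
# task = first_task_or_raise(ecs_client.run_task(**task_run_request))
```

Logging the `failures` entries this way usually tells you directly whether the problem is capacity, quotas, or configuration.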
EDIT: not all runs are crashing with this error, only 9 out of 10. Is it possible that tasks cannot be submitted because the resources for the ECS tasks are not available in my AWS region at the time of the request? I've noticed this happens more often with large ECS tasks than with small ones.
EDIT 2: it seems that the worker service talking to my work pool died at some point. I'm trying to understand why.
EDIT 3: for other poor souls finding this, here is the related GitHub issue
šŸ‘ 1
e
@Romain Vincent thanks for the GitHub link, it helped me solve the issue. In my case, Fargate was exceeding the vCPU limit.
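A vCPU limit would also explain why large tasks failed more often: Fargate On-Demand tasks in a region share one account-level vCPU quota, so a task requesting more vCPUs needs more free headroom. A rough sketch of that arithmetic, assuming (as a simplification) that the quota is counted as the sum of vCPUs of concurrently running tasks, and that task sizes are given in ECS CPU units (1024 units = 1 vCPU). The quota value and task sizes below are made-up numbers; check your real quota in the Service Quotas console.

```python
from typing import Iterable

# ECS task definitions express CPU in units: 1024 units = 1 vCPU.
CPU_UNITS_PER_VCPU = 1024


def units_to_vcpus(cpu_units: int) -> float:
    """Convert ECS CPU units to vCPUs."""
    return cpu_units / CPU_UNITS_PER_VCPU


def fits_under_vcpu_quota(
    running_task_cpu_units: Iterable[int],
    new_task_cpu_units: int,
    quota_vcpus: float,
) -> bool:
    """Return True if starting one more task keeps total vCPUs within the quota.

    Assumes (simplification) that the quota is the sum of vCPUs of all
    concurrently running Fargate On-Demand tasks in the region.
    """
    in_use = sum(units_to_vcpus(units) for units in running_task_cpu_units)
    return in_use + units_to_vcpus(new_task_cpu_units) <= quota_vcpus


# Hypothetical numbers: a 6 vCPU quota with 5 vCPUs already in use.
running = [4096, 1024]
print(fits_under_vcpu_quota(running, 512, quota_vcpus=6))   # small task: True
print(fits_under_vcpu_quota(running, 4096, quota_vcpus=6))  # large task: False
```

With the same remaining headroom, the small request fits and the large one does not, which matches the observation that large ECS tasks failed more often than small ones.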