# ask-community
I'm having difficulty using push work pools with EC2 instances that have GPUs. I'm using Prefect Cloud, and I want a work pool that submits flows to GPU-backed EC2 instances. I either run into issues where the flow can't be submitted to the infrastructure, or the flow never actually kicks off. Is there a standard way to do this (or a Terraform template to set it up)?
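Roughly, the kind of deployment I mean looks like the sketch below. The pool name, image, and flow are placeholders rather than my actual config; it assumes an ECS-type work pool whose launch_type job variable is set to EC2, since Fargate doesn't offer GPUs:

```
# Sketch only: placeholder pool name, image, and flow body.
from prefect import flow

@flow(log_prints=True)
def gpu_flow():
    print("running on a GPU-backed container instance")

if __name__ == "__main__":
    gpu_flow.deploy(
        name="gpu-flow-deployment",
        work_pool_name="ecs-gpu-pool",          # placeholder work pool name
        image="my-registry/gpu-image:latest",   # placeholder image with GPU libraries
        build=False,                            # assume the image is already in the registry
        job_variables={
            "launch_type": "EC2",  # EC2 launch type, since Fargate tasks can't use GPUs
        },
    )
```

The work pool itself is created separately, e.g. in the UI or with prefect work-pool create --type ecs.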
In particular, I am able to start a worker in my cluster, but I am getting this error:
Failed to submit flow run '37e40179-6e41-4415-afd4-8043318bad80' to infrastructure.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect_aws/workers/ecs_worker.py", line 748, in _create_task_and_wait_for_start
    task = self._create_task_run(ecs_client, task_run_request)
  File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 330, in wrapped_f
    return self(f, *args, **kw)
  File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 467, in __call__
    do = self.iter(retry_state=retry_state)
  File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 368, in iter
    result = action(retry_state)
  File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 410, in exc_check
    raise retry_exc.reraise()
  File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 183, in reraise
    raise self.last_attempt.result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 470, in __call__
    result = fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/prefect_aws/workers/ecs_worker.py", line 1633, in _create_task_run
    task = ecs_client.run_task(**task_run_request)
  File "/usr/local/lib/python3.10/site-packages/botocore/client.py", line 565, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.10/site-packages/botocore/client.py", line 1021, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidParameterException: An error occurred (InvalidParameterException) when calling the RunTask operation: No Container Instances were found in your cluster.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/workers/base.py", line 908, in _submit_run_and_capture_errors
    result = await self.run(
  File "/usr/local/lib/python3.10/site-packages/prefect_aws/workers/ecs_worker.py", line 640, in run
    ) = await run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/prefect/utilities/asyncutils.py", line 136, in run_sync_in_worker_thread
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/prefect_aws/workers/ecs_worker.py", line 752, in _create_task_and_wait_for_start
    self._report_task_run_creation_failure(configuration, task_run_request, exc)
  File "/usr/local/lib/python3.10/site-packages/prefect_aws/workers/ecs_worker.py", line 907, in _report_task_run_creation_failure
    raise RuntimeError(
RuntimeError: Failed to run ECS task, cluster 'ecs-gpu-cluster' does not appear to have any container instances associated with it. Confirm that you have EC2 container instances available.
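The RunTask error above is ECS saying it can see zero container instances registered to 'ecs-gpu-cluster'. A quick way to confirm that from the AWS side (a minimal sketch, assuming boto3 is installed and AWS credentials/region are already configured):

```
# List the container instances registered to the cluster named in the traceback.
# An empty list reproduces exactly what RunTask is complaining about.
import boto3

ecs = boto3.client("ecs")
resp = ecs.list_container_instances(cluster="ecs-gpu-cluster")
print(resp["containerInstanceArns"])
```

If that list comes back empty, the usual cause is that the GPU instances never join the cluster, e.g. the ECS agent isn't running on them or ECS_CLUSTER in /etc/ecs/ecs.config doesn't match 'ecs-gpu-cluster'.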
@Dev Dabke did you ever make headway on this? I am having the exact same issue.