# ask-community
Hi, I'm having issues running an ECS task from a worker pool in an ECS service.
```python
from prefect import flow, get_run_logger

@flow
def my_flow():
    logger = get_run_logger()
    logger.info("Hello from ECS!!")
```
My deployment configuration looks like:
```yaml
###
### A complete description of a Prefect Deployment for flow 'my-flow'
###
name: my-flow
description: null
version: c11a68e0ce6739c76dc99de079edeb29
# The work queue that will handle this deployment's runs
work_queue_name: default
work_pool_name: ecs-pool
tags: []
parameters: {}
schedule: null
is_schedule_active: true
infra_overrides: {}

###
### DO NOT EDIT BELOW THIS LINE
###
flow_name: my-flow
manifest_path: null
infrastructure:
  type: ecs-task
  env: {}
  labels: {}
  name: null
  command: null
  aws_credentials:
    aws_access_key_id: null
    aws_secret_access_key: null
    aws_session_token: null
    profile_name: null
    region_name: null
    aws_client_parameters:
      api_version: null
      use_ssl: true
      verify: true
      verify_cert_path: null
      endpoint_url: null
      config: null
    block_type_slug: aws-credentials
  task_definition_arn: null
  task_definition: null
  family: null
  image: prefecthq/prefect:2-python3.10
  auto_deregister_task_definition: true
  cpu: null
  memory: null
  execution_role_arn: some-execution-role-arn
  configure_cloudwatch_logs: true
  cloudwatch_logs_options: {}
  stream_output: null
  launch_type: FARGATE
  vpc_id: some-vpc-id
  cluster: cluster-worker-arn
  task_role_arn: null
  task_customizations:
  - op: add
    path: /networkConfiguration/awsvpcConfiguration/securityGroups
    value:
    - some_security_group
  task_start_timeout_seconds: 120
  task_watch_poll_interval: 5.0
  _block_document_id: 99c69452-38cb-4ce0-a0f9-65035b2db175
  _block_document_name: default-ecs-job
  _is_anonymous: false
  block_type_slug: ecs-task
  _block_type_slug: ecs-task
storage: null
path: /opt/prefect/flows
entrypoint: minerva/test.py:my_flow
parameter_openapi_schema:
  title: Parameters
  type: object
  properties: {}
  required: null
  definitions: null
timestamp: '2023-09-27T14:08:05.642381+00:00'
triggers: []
enforce_parameter_schema: null
```
The error I’m getting:
```
Failed to submit flow run '180f4415-b052-40d4-9f7c-21c1b6220835' to infrastructure.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/tenacity/_init_.py", line 382, in _call_
    result = fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/prefect_aws/workers/ecs_worker.py", line 1524, in _create_task_run
    return ecs_client.run_task(**task_run_request)["tasks"][0]
  File "/usr/local/lib/python3.10/site-packages/botocore/client.py", line 535, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.10/site-packages/botocore/client.py", line 980, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ClusterNotFoundException: An error occurred (ClusterNotFoundException) when calling the RunTask operation: Cluster not found.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect_aws/workers/ecs_worker.py", line 724, in _create_task_and_wait_for_start
    task = self._create_task_run(ecs_client, task_run_request)
  File "/usr/local/lib/python3.10/site-packages/tenacity/_init_.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/usr/local/lib/python3.10/site-packages/tenacity/_init_.py", line 379, in _call_
    do = self.iter(retry_state=retry_state)
  File "/usr/local/lib/python3.10/site-packages/tenacity/_init_.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7f997b4445b0 state=finished raised ClusterNotFoundException>]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/workers/base.py", line 843, in _submit_run_and_capture_errors
    result = await self.run(
  File "/usr/local/lib/python3.10/site-packages/prefect_aws/workers/ecs_worker.py", line 567, in run
    ) = await run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/prefect/utilities/asyncutils.py", line 91, in run_sync_in_worker_thread
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/prefect_aws/workers/ecs_worker.py", line 728, in _create_task_and_wait_for_start
    self._report_task_run_creation_failure(configuration, task_run_request, exc)
  File "/usr/local/lib/python3.10/site-packages/prefect_aws/workers/ecs_worker.py", line 819, in _report_task_run_creation_failure
    raise RuntimeError(
RuntimeError: Failed to run ECS task, cluster 'default' not found. Confirm that the cluster is configured in your region.
```
What am I missing? It seems like the task is being launched into the `default` cluster, which doesn't exist, even though I've specified the cluster I want it deployed into.
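For what it's worth, the fallback behaviour matches the traceback: when no cluster reaches the ECS `RunTask` call, AWS looks for a cluster literally named `default`. A quick boto3 sanity check (region and cluster name are placeholders):

```python
import boto3

# Placeholder region and cluster name -- substitute your own.
ecs = boto3.client("ecs", region_name="eu-west-1")

# describe_clusters does not raise for a missing cluster; it reports it
# under "failures", which makes it handy for checking what the worker sees.
resp = ecs.describe_clusters(clusters=["my-cluster"])
print(resp["clusters"])   # populated if the cluster exists in this region
print(resp["failures"])   # [{'arn': ..., 'reason': 'MISSING'}] otherwise
```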
I have now added the cluster name to the work pool template, but it is still not finding the cluster.
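For reference, a minimal sketch of the relevant fragment of an ECS work pool's base job template, assuming the default template shape shipped by prefect-aws (the cluster ARN below is a placeholder):

```python
# Sketch of the two places the cluster appears in the base job template,
# expressed as a Python dict. Assumes the prefect-aws ECS worker template
# shape; the ARN is a placeholder.
base_job_template_fragment = {
    "job_configuration": {
        # The worker passes this value to ECS RunTask; if it resolves to
        # nothing, boto3 falls back to the cluster literally named "default".
        "cluster": "{{ cluster }}",
    },
    "variables": {
        "properties": {
            "cluster": {
                "type": "string",
                # Pin the real cluster name/ARN as the default value:
                "default": "arn:aws:ecs:eu-west-1:123456789012:cluster/my-cluster",
            },
        },
    },
}
```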
Hi Oscar, I was going to suggest exactly what you just did here ^. I assume you can see the cluster just fine with `aws ecs describe-clusters --cluster <your-cluster>`. Maybe also try setting the region in the YAML, under `infrastructure` > `aws_credentials` > `region_name`?
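If it helps, a minimal sketch of pinning the region on the infrastructure block via the prefect-aws API (all names here are placeholders):

```python
from prefect_aws import AwsCredentials
from prefect_aws.ecs import ECSTask

# Placeholder names throughout -- substitute your own.
credentials = AwsCredentials(region_name="eu-west-1")

ecs_task = ECSTask(
    aws_credentials=credentials,  # pins region_name for all AWS calls
    cluster="my-cluster",         # the cluster the task should land in
    launch_type="FARGATE",
)
ecs_task.save("default-ecs-job", overwrite=True)
```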
Sorry for going AWOL on this. My previous message was unfortunately a slight red herring: due to a redeployment, my cluster name had changed, but I didn't spot it immediately. At the time, the name was still hardcoded!
But adding the cluster name to the work pool template solved it!
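For anyone hitting the same thing: a redeployment can silently rename the cluster, so a quick check of what actually exists in the region can save time (region is a placeholder):

```python
import boto3

# Placeholder region -- substitute your own.
ecs = boto3.client("ecs", region_name="eu-west-1")

# List the clusters that actually exist before pointing a work pool at one.
print(ecs.list_clusters()["clusterArns"])
```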
Glad to hear it!