# ask-marvin
r
@Marvin if an ECS worker running in one ECS cluster submits an ECS task into another cluster, then when the task runs it tries to connect to the Prefect API, and this fails with a timeout error
m
thought for 336.8 seconds
It sounds like the ECS task that runs your flow can’t reach the Prefect API. When a worker submits a run to another ECS cluster, the task uses that target cluster’s networking, so this usually comes down to networking or API URL/auth config inside the task container.

Most common causes:
- No outbound internet from the target cluster’s subnets (no NAT gateway, or no public IP/IGW)
- Security group/NACL blocking egress on 443
- Wrong `PREFECT_API_URL` or missing `PREFECT_API_KEY`
- Corporate proxy required but not configured (`HTTPS_PROXY`/`NO_PROXY`)
- DNS resolution blocked

What to verify inside the flow-run container:

1) Confirm Prefect API settings are present:
```shell
prefect config view --show-secrets
```
You should see:
- `PREFECT_API_URL` = `https://api.prefect.cloud/api/accounts/<ACCOUNT_ID>/workspaces/<WORKSPACE_ID>`
- `PREFECT_API_KEY` = a valid key

Docs: Connect to Prefect Cloud

2) Quick connectivity checks
- Internet/DNS check:
```shell
curl -I https://api.prefect.cloud/api/health
```
- Workspace-scoped API check (requires both env vars):
```shell
python - <<'PY'
import os, httpx

# Public health endpoint: checks DNS and internet egress, no auth needed
print('Health:', httpx.get('https://api.prefect.cloud/api/health', timeout=10).status_code)

# Workspace-scoped endpoint: requires both PREFECT_API_URL and PREFECT_API_KEY
url = os.environ['PREFECT_API_URL']
headers = {'Authorization': f'Bearer {os.environ.get("PREFECT_API_KEY", "")}'}
r = httpx.post(url + '/work_queues/filter', headers=headers, json={}, timeout=10)
print('Workspace endpoint:', r.status_code, r.text[:200])
PY
```
If either request times out, it’s networking. If you get 401/403, it’s auth.

ECS networking checklist for the target cluster/subnets:
- Subnets used by the task must have outbound internet:
  - Private subnets: route to a NAT Gateway
  - Public subnets: assign a public IP and route to an Internet Gateway
- Security group attached to the task should allow egress to 0.0.0.0/0 on TCP 443 (or to your egress proxy)
- NACLs allow ephemeral outbound and return traffic
- VPC DNS hostnames/resolution enabled; the task can resolve `api.prefect.cloud`
- If you require a proxy, set `HTTPS_PROXY` and optionally `NO_PROXY` in the task environment

Prefect specifics:
- The ECS worker injects `PREFECT_API_URL` and `PREFECT_API_KEY` into the task based on your work pool/worker environment. If you override the environment in the work pool’s base job template, make sure you aren’t unsetting these.
- Enable more logs by adding `PREFECT_LOGGING_LEVEL=DEBUG` to the task env to get clearer error messages.

Docs:
- ECS worker guide (config, networking considerations): Prefect AWS ECS guide
- Cloud troubleshooting: Troubleshoot Prefect Cloud

A few questions to pinpoint the issue:
- Are you using Prefect Cloud or a self-hosted API? What’s the exact `PREFECT_API_URL` inside the failing task?
- Does the target cluster’s task run in private subnets without a NAT, or in public subnets without a public IP?
- Do you require an outbound proxy?
- Can you share the timeout stack trace and the subnets/SG settings used by the work pool’s base job template for that cluster?

If you share the work pool’s ECS settings (subnets, security groups, assign public IP), I can help spot the likely egress gap.
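As a minimal sketch of the networking-vs-auth split above, a plain TCP check from inside the flow-run container separates egress/DNS failures from Prefect-level auth errors. The helper below is mine, not a Prefect API:

```python
import socket

def can_reach(host: str = "api.prefect.cloud", port: int = 443, timeout: float = 5.0) -> bool:
    """True if DNS resolves and a TCP connection to host:port succeeds.

    If this returns False from inside the flow-run container, the problem is
    the networking layer (subnets/NAT, security groups, NACLs, DNS), not
    Prefect credentials; auth problems only surface after the TCP connect.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failure, connection refused, and timeout
        return False
```

If `can_reach()` is False in the task but the same check succeeds from the worker, the gap is specific to the subnets/security group the task is launched with.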
r
@Marvin it has to do with network mode. the worker was in `bridge` mode while the job task was in `awsvpc` mode. reverting the job task to `bridge` mode runs the task. why would this happen?
m
thought for 38.4 seconds
Short answer: `awsvpc` gives each ECS task its own ENI, subnets, and security group, while `bridge` shares the EC2 host’s network. Your `awsvpc` task likely had no valid egress path (no NAT/public IP, restrictive SG/NACL, or DNS), so it couldn’t reach Prefect Cloud. In bridge mode the task “rides” the host’s working internet path, so it succeeds.

Why this happens:
- `awsvpc` (per-task ENI):
  - You must explicitly provide subnets and a security group; the task’s outbound traffic uses that ENI. If the chosen subnets don’t have a NAT gateway (private) or the task isn’t assigned a public IP in a public subnet, there’s no internet egress.
  - The task’s security group needs egress on TCP 443 to 0.0.0.0/0 (or to your proxy).
  - VPC DNS must be enabled and NACLs must allow return traffic.
- `bridge` (host network namespace):
  - Outbound uses the EC2 instance’s ENI, security group, and route table. If the instance can get out, your task can too, even if the task’s own SG/subnet settings would have blocked it in `awsvpc`.

How to fix `awsvpc` so your task can reach Prefect Cloud:
- In the Prefect ECS work pool’s base job template, set the network configuration explicitly:
  - Subnets:
    - Private subnets: must route 0.0.0.0/0 to a NAT Gateway, or
    - Public subnets: enable `assignPublicIp = ENABLED`
  - Security groups: allow egress to 0.0.0.0/0 on TCP 443
- Ensure NACLs allow ephemeral outbound and return traffic
- VPC DNS support/hostnames enabled; the task must resolve `api.prefect.cloud`
- Confirm Prefect env vars inside the task:
  - `PREFECT_API_URL` = `https://api.prefect.cloud/api/accounts/<ACCOUNT_ID>/workspaces/<WORKSPACE_ID>`
  - `PREFECT_API_KEY` = a valid key
- Test from inside an `awsvpc` task container:
```shell
prefect config view --show-secrets
curl -I https://api.prefect.cloud/api/health
```
- If you require an egress proxy, add `HTTPS_PROXY` (and `NO_PROXY` if needed) to the task environment.

Notes:
- Fargate only supports `awsvpc`, so getting `awsvpc` right is important if you ever switch from EC2 to Fargate.
- Different network modes for the worker vs. the run task is fine; they are separate tasks. The failure indicates the `awsvpc` task’s network config lacks egress.

Docs:
- ECS worker guide: Prefect AWS ECS guide
- Cloud connectivity/troubleshooting: Troubleshoot Prefect Cloud
- Cloud connection details: Connect to Prefect Cloud

If you can share the `awsvpc` subnets, security groups, and whether `assignPublicIp` is enabled in your work pool’s base job template, I can point out the exact egress gap.
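For concreteness, the `awsvpc` settings described above follow the shape of the ECS RunTask `networkConfiguration` block. The subnet and security-group IDs below are hypothetical placeholders; substitute your own and check how your work pool’s base job template exposes these fields:

```json
{
  "networkConfiguration": {
    "awsvpcConfiguration": {
      "subnets": ["subnet-0a1b2c3d4e5f60789"],
      "securityGroups": ["sg-0a1b2c3d4e5f60789"],
      "assignPublicIp": "ENABLED"
    }
  }
}
```

Note that `assignPublicIp: ENABLED` only provides egress in a public subnet (one whose route table points at an Internet Gateway); in a private subnet leave it `DISABLED` and rely on a NAT Gateway instead.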