# ask-marvin
r
@Marvin if an ECS worker running in one ECS cluster submits an ECS task into another cluster, then when the task runs it tries to connect to the Prefect API, and this fails with a timeout error
m
thought for 336.8 seconds
It sounds like the ECS task that runs your flow can’t reach the Prefect API. When a worker submits a run to another ECS cluster, the task uses that target cluster’s networking, so this usually comes down to networking or API URL/auth config inside the task container.

Most common causes:
- No outbound internet from the target cluster’s subnets (no NAT gateway, or no public IP/IGW)
- Security group/NACL blocking egress on 443
- Wrong `PREFECT_API_URL` or missing `PREFECT_API_KEY`
- Corporate proxy required but not configured (`HTTPS_PROXY`/`NO_PROXY`)
- DNS resolution blocked

What to verify inside the flow-run container:

1) Confirm Prefect API settings are present:
```shell
prefect config view --show-secrets
```
You should see:
- `PREFECT_API_URL` = `https://api.prefect.cloud/api/accounts/<ACCOUNT_ID>/workspaces/<WORKSPACE_ID>`
- `PREFECT_API_KEY` = a valid key

Docs: Connect to Prefect Cloud

2) Quick connectivity checks
- Internet/DNS check:
```shell
curl -I https://api.prefect.cloud/api/health
```
- Workspace-scoped API check (requires both env vars):
```shell
python - <<'PY'
import os, httpx

# Public health endpoint: checks DNS and internet egress, no auth needed
print('Health:', httpx.get('https://api.prefect.cloud/api/health', timeout=10).status_code)

# Workspace-scoped endpoint: requires both PREFECT_API_URL and PREFECT_API_KEY
url = os.environ['PREFECT_API_URL']
headers = {'Authorization': f'Bearer {os.environ.get("PREFECT_API_KEY", "")}'}
r = httpx.post(url + '/work_queues/filter', headers=headers, json={}, timeout=10)
print('Workspace endpoint:', r.status_code, r.text[:200])
PY
```
If either request times out, it’s networking. If you get 401/403, it’s auth.

ECS networking checklist for the target cluster/subnets:
- Subnets used by the task must have outbound internet:
  - Private subnets: route to a NAT Gateway
  - Public subnets: assign a public IP and route to an Internet Gateway
- Security group attached to the task should allow egress to 0.0.0.0/0 on TCP 443 (or to your egress proxy)
- NACLs allow ephemeral outbound and return traffic
- VPC DNS hostnames/resolution enabled; the task can resolve `api.prefect.cloud`
- If you require a proxy, set `HTTPS_PROXY` and optionally `NO_PROXY` in the task environment

Prefect specifics:
- The ECS worker injects `PREFECT_API_URL` and `PREFECT_API_KEY` into the task based on your work pool/worker environment. If you override the environment in the work pool’s base job template, make sure you aren’t unsetting these.
- Enable more logs by adding `PREFECT_LOGGING_LEVEL=DEBUG` to the task env to get clearer error messages.

Docs:
- ECS worker guide (config, networking considerations): Prefect AWS ECS guide
- Cloud troubleshooting: Troubleshoot Prefect Cloud

A few questions to pinpoint the issue:
- Are you using Prefect Cloud or a self-hosted API? What’s the exact `PREFECT_API_URL` inside the failing task?
- Does the target cluster’s task run in private subnets without a NAT, or in public subnets without a public IP?
- Do you require an outbound proxy?
- Can you share the timeout stack trace and the subnets/SG settings used by the work pool’s base job template for that cluster?

If you share the work pool’s ECS settings (subnets, security groups, assign public IP), I can help spot the likely egress gap.
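As a minimal sketch of the networking-vs-auth split above, a plain TCP check from inside the flow-run container separates egress/DNS failures from Prefect-level auth errors. The helper below is mine, not a Prefect API:

```python
import socket

def can_reach(host: str = "api.prefect.cloud", port: int = 443, timeout: float = 5.0) -> bool:
    """True if DNS resolves and a TCP connection to host:port succeeds.

    If this returns False from inside the flow-run container, the problem is
    the networking layer (subnets/NAT, security groups, NACLs, DNS), not
    Prefect credentials; auth problems only surface after the TCP connect.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failure, connection refused, and timeout
        return False
```

If `can_reach()` is False in the task but the same check succeeds from the worker, the gap is specific to the subnets/security group the task is launched with.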
r
@Marvin it has to do with network mode. the worker was in `bridge` mode while the job task was in `awsvpc` mode. reverting the job task to `bridge` mode runs the task. why would this happen?
m
thought for 38.4 seconds
Short answer: `awsvpc` gives each ECS task its own ENI, subnets, and security group, while `bridge` shares the EC2 host’s network. Your `awsvpc` task likely had no valid egress path (no NAT/public IP, restrictive SG/NACL, or DNS), so it couldn’t reach Prefect Cloud. In bridge mode the task “rides” the host’s working internet path, so it succeeds.

Why this happens:
- `awsvpc` (per-task ENI):
  - You must explicitly provide subnets and a security group; the task’s outbound traffic uses that ENI. If the chosen subnets don’t have a NAT gateway (private) or the task isn’t assigned a public IP in a public subnet, there’s no internet egress.
  - The task’s security group needs egress on TCP 443 to 0.0.0.0/0 (or to your proxy).
  - VPC DNS must be enabled and NACLs must allow return traffic.
- `bridge` (host network namespace):
  - Outbound uses the EC2 instance’s ENI, security group, and route table. If the instance can get out, your task can too, even if the task’s own SG/subnet settings would have blocked it in `awsvpc`.

How to fix `awsvpc` so your task can reach Prefect Cloud:
- In the Prefect ECS work pool’s base job template, set the network configuration explicitly:
  - Subnets:
    - Private subnets: must route 0.0.0.0/0 to a NAT Gateway, or
    - Public subnets: enable `assignPublicIp = ENABLED`
  - Security groups: allow egress to 0.0.0.0/0 on TCP 443
- Ensure NACLs allow ephemeral outbound and return traffic
- VPC DNS support/hostnames enabled; the task must resolve `api.prefect.cloud`
- Confirm Prefect env vars inside the task:
  - `PREFECT_API_URL` = `https://api.prefect.cloud/api/accounts/<ACCOUNT_ID>/workspaces/<WORKSPACE_ID>`
  - `PREFECT_API_KEY` = a valid key
- Test from inside an `awsvpc` task container:
```shell
prefect config view --show-secrets
curl -I https://api.prefect.cloud/api/health
```
- If you require an egress proxy, add `HTTPS_PROXY` (and `NO_PROXY` if needed) to the task environment.

Notes:
- Fargate only supports `awsvpc`, so getting `awsvpc` right is important if you ever switch from EC2 to Fargate.
- Different network modes for the worker vs. the run task is fine; they are separate tasks. The failure indicates the `awsvpc` task’s network config lacks egress.

Docs:
- ECS worker guide: Prefect AWS ECS guide
- Cloud connectivity/troubleshooting: Troubleshoot Prefect Cloud
- Cloud connection details: Connect to Prefect Cloud

If you can share the `awsvpc` subnets, security groups, and whether `assignPublicIp` is enabled in your work pool’s base job template, I can point out the exact egress gap.
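For concreteness, the `awsvpc` settings described above follow the shape of the ECS RunTask `networkConfiguration` block. The subnet and security-group IDs below are hypothetical placeholders; substitute your own and check how your work pool’s base job template exposes these fields:

```json
{
  "networkConfiguration": {
    "awsvpcConfiguration": {
      "subnets": ["subnet-0a1b2c3d4e5f60789"],
      "securityGroups": ["sg-0a1b2c3d4e5f60789"],
      "assignPublicIp": "ENABLED"
    }
  }
}
```

Note that `assignPublicIp: ENABLED` only provides egress in a public subnet (one whose route table points at an Internet Gateway); in a private subnet leave it `DISABLED` and rely on a NAT Gateway instead.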