# ask-marvin
g
@Marvin [push work pool] [ecs] [aws] Is it possible to set up a push work pool on ECS using an existing ECS cluster set up with EC2 providers? I want to cache the Docker images to reduce flow run start latency. Explain your thought process, the differences, and whether it's possible or not.
m
thought for 389.0 seconds
Short answer: Yes. In Prefect 3.x you can run a push work pool against an existing ECS cluster that uses EC2 capacity providers. This is a good way to reduce cold-start latency because EC2 instances can cache your Docker images locally between runs.

How to think about it
- ECS launch types:
  - Fargate: fully managed, but tasks generally pull images each run; you cannot control caching on the host, so large images increase cold-start time.
  - EC2: tasks run on your EC2 container instances. Those instances keep Docker image layers on disk, so repeated runs with the same image are much faster, especially if you keep a baseline of instances warm and pre-pull images.
- Push work pool mechanics:
  - Prefect "push" calls ECS directly to register/run a task using the pool's base job template plus job variables.
  - For EC2-backed clusters you can either:
    1. Set `launch_type` to `EC2` (simple, no capacity providers), or
    2. Provide a `capacityProviderStrategy` that points at your EC2 capacity provider(s) and omit `launchType` entirely.
  - Key detail: AWS does not allow specifying `launchType` and `capacityProviderStrategy` at the same time on `RunTask`. Pick one approach.

What this looks like with Prefect

Prefect ships an ECS base job template that includes placeholders for both `launchType` and `capacityProviderStrategy`. For EC2 you have two viable configurations:

Option A: Plain EC2 (no capacity providers)
- Set `launch_type` to `EC2`
- Do not set `capacity_provider_strategy`
- This is the simplest way to use your existing EC2 cluster; image caching works because your instances keep layers locally.

Option B: Capacity providers (preferred if you want auto scaling via ASGs)
- Remove the `launchType` field from the work pool's base job template (or ensure the `launch_type` variable is not rendered)
- Set `capacity_provider_strategy` to your provider(s), e.g. `[{ "capacityProvider": "MyEC2Provider", "base": 0, "weight": 1 }]` (see the lookup sketch below if you need to confirm the provider names)
Getting the starting template and updating it
- Get the default ECS template:
```
prefect work-pool get-default-base-job-template --type ecs
```
- For Option B (capacity providers), edit the `task_run_request` section to drop `launchType` and keep `capacityProviderStrategy`, for example:
```
"task_run_request": {
  "cluster": "{{ cluster }}",
  "overrides": { ... },
  "tags": "{{ labels }}",
  "taskDefinition": "{{ task_definition_arn }}",
  "capacityProviderStrategy": "{{ capacity_provider_strategy }}"
}
```
- Create/update the pool:
```
prefect work-pool create my-ecs-pool --type ecs --base-job-template ./ecs.json
# or
prefect work-pool update my-ecs-pool --base-job-template ./ecs.json
```
- Then deploy flows with job variables pointing at your cluster, image, roles, networking, and either:
  - Option A: `{"launch_type": "EC2"}`
  - Option B: `{"capacity_provider_strategy": [{"capacityProvider": "MyEC2Provider", "base": 0, "weight": 1}]}`
Notes, caveats, and performance tips
- If you set a `capacityProviderStrategy`, ensure `launchType` is omitted; specifying both will cause an AWS error. There was a historical bug where `launchType` defaulted to `FARGATE` even when using capacity providers; if you see that behavior, upgrade to the latest `prefect-aws` and explicitly remove the `launchType` key from the base job template. Related issue for context: "Setting capacityProviderStrategy not working in Push Work Pool".
- Pre-pull your images on EC2 instances to maximize cache hits: add a user-data script or ASG lifecycle hook that runs `docker pull <your-ecr-image:tag>` on instance launch (see the sketch after this list).
- Keep a small warm pool or a minimum ASG size > 0 to retain cached layers.
- Use smaller, layered images and reuse a stable "runtime" image while loading your flow code from source; that way the image changes less frequently and caches better.
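For the pre-pull step, a minimal user-data sketch is shown below, assuming your images live in ECR; the region, account ID, and image name are placeholders, and the instance profile needs ECR read permissions for the login and pull to succeed:

```bash
#!/bin/bash
# EC2 user data: pre-pull the flow image so ECS tasks start from the local layer cache.
# Placeholders: region, account ID, and image/tag must match your setup.
REGION=us-east-1
REGISTRY=123456789012.dkr.ecr.${REGION}.amazonaws.com
IMAGE=${REGISTRY}/my-prefect-flows:latest

# Authenticate the Docker daemon against ECR, then warm the cache.
aws ecr get-login-password --region "${REGION}" \
  | docker login --username AWS --password-stdin "${REGISTRY}"
docker pull "${IMAGE}"
```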
Networking/roles
- Ensure your task definition uses an execution role with ECR pull permissions and that your instances can reach ECR (VPC endpoints or NAT).
- For `awsvpc` network mode, provide subnets/security groups in `network_configuration`.

Docs
- Prefect ECS integration guide: ECS guide
- Capacity provider issue reference: GitHub issue #13030

If you'd like, I can generate a minimal base job template JSON for your chosen approach (plain EC2 vs capacity providers) and a checklist of the job variables to set. What's your cluster name/ARN, and are you already using capacity providers on it?
g
@Marvin In this setup, what is the difference with an ECS work pool (not a push pool)?
m
thought for 72.3 seconds
Great question. In your EC2-backed ECS setup, the core difference is "who talks to AWS and where that logic runs."

High-level
- ECS push work pool (no worker):
  - Prefect Cloud/Server initiates the ECS `RunTask` call using the pool's base job template and stored AWS credentials.
  - No worker process to run or maintain. Prefect "pushes" runs straight to ECS.
  - Requirements: you must allow Prefect to use IAM credentials that can call public AWS ECS/ECR APIs. If your org only exposes AWS APIs via private VPC endpoints, push may not work.
- ECS work pool with a worker (pull model):
  - You run `prefect worker start -p <your-ecs-pool>` somewhere in your environment (see the sketch after this list).
  - The worker polls Prefect for runs, then it calls ECS `RunTask` from inside your network using its local AWS credentials.
  - Requirements: you manage that always-on worker compute; useful when AWS API access is restricted to your VPC/private endpoints or you need tighter network control.
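For reference, starting such a worker is typically just the following (a minimal sketch; the pool name is a placeholder, and the machine running it needs AWS credentials allowed to call `RunTask`):

```bash
# The ECS worker type is provided by the prefect-aws integration
pip install -U prefect prefect-aws

# Long-running process: polls the work pool and submits flow runs to your ECS cluster
prefect worker start --pool my-ecs-pool
```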
What stays the same
- Same ECS cluster, same task definitions, same ability to use EC2 capacity providers for image caching.
- Same base job template and job variables. Both support either:
  - `launch_type: EC2`, or
  - `capacityProviderStrategy: [...]` (omit `launchType` if you set this)
- Start latency on ECS with EC2 image caching is identical across push vs worker; caching happens on your EC2 instances, not in Prefect.

Practical differences
- Ops overhead: push has no worker to run; a worker requires an always-on process.
- Network/control:
  - Push: Prefect must reach AWS ECS/ECR over public endpoints with provided IAM creds.
  - Worker: runs inside your environment; can use private VPC endpoints, SSO, or enterprise network constraints.
- Submission timing:
  - Push: immediate submission when a run is scheduled (no poll interval).
  - Worker: polls for runs; negligible in most cases, but it's an extra moving part.
- Upgrades/behavior:
  - Push uses Prefect's managed integration. If we add fields (e.g., `capacityProviderStrategy`), you'll see them as soon as you update the pool template.
  - Worker behavior depends on your installed `prefect-aws` version. If you use capacity providers, ensure you're on a recent version and remove `launchType` when setting `capacityProviderStrategy`. Older behavior is noted in issue #13030.

When to choose which
- Choose push if you can let Prefect call AWS directly and want the simplest, scale-to-zero control plane.
- Choose a worker if your AWS API access is private/restricted, you need to run everything inside your network, or you want to inject custom operational logic around submissions.

Docs
- ECS integration guide: Prefect ECS guide
- Capacity provider reference: GitHub issue #13030

If you share any network/IAM constraints, I can recommend which model (push vs worker) is the better fit for your environment.