# ask-marvin
g
@Marvin [push work pool] [ecs] [aws] Is it possible to set up a push work pool on ECS using an existing ECS cluster set up with EC2 providers? I want to cache the Docker images to reduce flow run start latency. Explain your thought process, the differences, and whether it's possible or not.
m
thought for 389.0 seconds
Short answer: Yes. In Prefect 3.x you can run a push work pool against an existing ECS cluster that uses EC2 capacity providers. This is a good way to reduce cold-start latency because EC2 instances can cache your Docker images locally between runs.

How to think about it
- ECS launch types:
  - Fargate: fully managed, but tasks generally pull images each run; you cannot control caching on the host, so large images increase cold-start time.
  - EC2: tasks run on your EC2 container instances. Those instances keep Docker image layers on disk, so repeated runs with the same image are much faster, especially if you keep a baseline of instances warm and pre-pull images.
- Push work pool mechanics:
  - Prefect "push" calls ECS directly to register/run a task using the pool's base job template plus job variables.
  - For EC2-backed clusters you can either:
    1. Set `launch_type` to `EC2` (simple, no capacity providers), or
    2. Provide a `capacityProviderStrategy` that points at your EC2 capacity provider(s) and omit `launchType` entirely.
  - Key detail: AWS does not allow specifying `launchType` and `capacityProviderStrategy` at the same time on `RunTask`. Pick one approach.

What this looks like with Prefect

Prefect ships an ECS base job template that includes placeholders for both `launchType` and `capacityProviderStrategy`. For EC2 you have two viable configurations:

Option A: Plain EC2 (no capacity providers)
- Set `launch_type` to `EC2`
- Do not set `capacity_provider_strategy`
- This is the simplest way to use your existing EC2 cluster; image caching works because your instances keep layers locally.

Option B: Capacity providers (preferred if you want auto scaling via ASGs)
- Remove the `launchType` field from the work pool's base job template (or ensure the `launch_type` variable is not rendered)
- Set `capacity_provider_strategy` to your provider(s), e.g. `[{ "capacityProvider": "MyEC2Provider", "base": 0, "weight": 1 }]` (see the lookup sketch below if you need to confirm the provider names)
Getting the starting template and updating it
- Get the default ECS template:
```
prefect work-pool get-default-base-job-template --type ecs
```
- For Option B (capacity providers), edit the `task_run_request` section to drop `launchType` and keep `capacityProviderStrategy`, for example:
```
"task_run_request": {
  "cluster": "{{ cluster }}",
  "overrides": { ... },
  "tags": "{{ labels }}",
  "taskDefinition": "{{ task_definition_arn }}",
  "capacityProviderStrategy": "{{ capacity_provider_strategy }}"
}
```
- Create/update the pool:
```
prefect work-pool create my-ecs-pool --type ecs --base-job-template ./ecs.json
# or
prefect work-pool update my-ecs-pool --base-job-template ./ecs.json
```
- Then deploy flows with job variables pointing at your cluster, image, roles, networking, and either:
  - Option A: `{"launch_type": "EC2"}`
  - Option B: `{"capacity_provider_strategy": [{"capacityProvider": "MyEC2Provider", "base": 0, "weight": 1}]}`
Notes, caveats, and performance tips
- If you set a `capacityProviderStrategy`, ensure `launchType` is omitted; specifying both will cause an AWS error. There was a historical bug where `launchType` defaulted to `FARGATE` even when using capacity providers; if you see that behavior, upgrade to the latest `prefect-aws` and explicitly remove the `launchType` key from the base job template. Related issue for context: "Setting capacityProviderStrategy not working in Push Work Pool".
- Pre-pull your images on EC2 instances to maximize cache hits: add a user-data script or ASG lifecycle hook that runs `docker pull <your-ecr-image:tag>` on instance launch (see the sketch after this list).
- Keep a small warm pool or a minimum ASG size > 0 to retain cached layers.
- Use smaller, layered images and reuse a stable "runtime" image while loading your flow code from source; that way the image changes less frequently and caches better.
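For the pre-pull step, a minimal user-data sketch is shown below, assuming your images live in ECR; the region, account ID, and image name are placeholders, and the instance profile needs ECR read permissions for the login and pull to succeed:

```bash
#!/bin/bash
# EC2 user data: pre-pull the flow image so ECS tasks start from the local layer cache.
# Placeholders: region, account ID, and image/tag must match your setup.
REGION=us-east-1
REGISTRY=123456789012.dkr.ecr.${REGION}.amazonaws.com
IMAGE=${REGISTRY}/my-prefect-flows:latest

# Authenticate the Docker daemon against ECR, then warm the cache.
aws ecr get-login-password --region "${REGION}" \
  | docker login --username AWS --password-stdin "${REGISTRY}"
docker pull "${IMAGE}"
```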
Networking/roles
- Ensure your task definition uses an execution role with ECR pull permissions and that your instances can reach ECR (VPC endpoints or NAT).
- For `awsvpc` network mode, provide subnets/security groups in `network_configuration`.

Docs
- Prefect ECS integration guide: ECS guide
- Capacity provider issue reference: GitHub issue #13030

If you'd like, I can generate a minimal base job template JSON for your chosen approach (plain EC2 vs capacity providers) and a checklist of the job variables to set. What's your cluster name/ARN, and are you already using capacity providers on it?
g
@Marvin In this setup, what is the difference with an ECS work pool (not a push pool)?
m
thought for 72.3 seconds
Great question. In your EC2-backed ECS setup, the core difference is "who talks to AWS and where that logic runs."

High-level
- ECS push work pool (no worker):
  - Prefect Cloud/Server initiates the ECS `RunTask` call using the pool's base job template and stored AWS credentials.
  - No worker process to run or maintain. Prefect "pushes" runs straight to ECS.
  - Requirements: you must allow Prefect to use IAM credentials that can call public AWS ECS/ECR APIs. If your org only exposes AWS APIs via private VPC endpoints, push may not work.
- ECS work pool with a worker (pull model):
  - You run `prefect worker start -p <your-ecs-pool>` somewhere in your environment (see the sketch after this list).
  - The worker polls Prefect for runs, then it calls ECS `RunTask` from inside your network using its local AWS credentials.
  - Requirements: you manage that always-on worker compute; useful when AWS API access is restricted to your VPC/private endpoints or you need tighter network control.
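For reference, starting such a worker is typically just the following (a minimal sketch; the pool name is a placeholder, and the machine running it needs AWS credentials allowed to call `RunTask`):

```bash
# The ECS worker type is provided by the prefect-aws integration
pip install -U prefect prefect-aws

# Long-running process: polls the work pool and submits flow runs to your ECS cluster
prefect worker start --pool my-ecs-pool
```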
What stays the same
- Same ECS cluster, same task definitions, same ability to use EC2 capacity providers for image caching.
- Same base job template and job variables. Both support either:
  - `launch_type: EC2`, or
  - `capacityProviderStrategy: [...]` (omit `launchType` if you set this)
- Start latency on ECS with EC2 image caching is identical across push vs worker; caching happens on your EC2 instances, not in Prefect.

Practical differences
- Ops overhead: push has no worker to run; a worker requires an always-on process.
- Network/control:
  - Push: Prefect must reach AWS ECS/ECR over public endpoints with provided IAM creds.
  - Worker: runs inside your environment; can use private VPC endpoints, SSO, or enterprise network constraints.
- Submission timing:
  - Push: immediate submission when a run is scheduled (no poll interval).
  - Worker: polls for runs; negligible in most cases, but it's an extra moving part.
- Upgrades/behavior:
  - Push uses Prefect's managed integration. If we add fields (e.g., `capacityProviderStrategy`), you'll see them as soon as you update the pool template.
  - Worker behavior depends on your installed `prefect-aws` version. If you use capacity providers, ensure you're on a recent version and remove `launchType` when setting `capacityProviderStrategy`. Older behavior is noted in issue #13030.

When to choose which
- Choose push if you can let Prefect call AWS directly and want the simplest, scale-to-zero control plane.
- Choose a worker if your AWS API access is private/restricted, you need to run everything inside your network, or you want to inject custom operational logic around submissions.

Docs
- ECS integration guide: Prefect ECS guide
- Capacity provider reference: GitHub issue #13030

If you share any network/IAM constraints, I can recommend which model (push vs worker) is the better fit for your environment.