Nico Neumann (03/03/2022, 12:37 PM)
Noah Holm (03/03/2022, 1:09 PM)
Nico Neumann (03/03/2022, 1:20 PM)
Noah Holm (03/03/2022, 1:23 PM)
Anna Geller (03/03/2022, 1:27 PM)
ShellTask? If so, you could consider moving those into a separate ECS task to isolate those dependencies and potentially mitigate the issue, especially if you run flows with this non-Python dependency infrequently. You could e.g. use this unofficial ecsruntask, and here is a usage example for this task in a flow: ecsruntask_usage.py
#2 Latency between serverless and non-serverless ECS data plane
The fact that you need to wait a couple of minutes until your ECS container starts is completely normal. You need to consider all the work that AWS is doing here:
• checking your workload’s requirements
• identifying an EC2 instance that satisfies those requirements (enough CPU/memory, the right region and AZ)
• pulling the image for your container
• starting the container
• sending logs to CloudWatch
• and then Prefect also needs to do a lot of work to communicate with this container to manage the task and flow run states, pull the logs, etc.
If this latency is not acceptable for your workflows, you should consider either:
• switching from Fargate to ECS on a self-managed EC2 data plane with instances that are always on → you’re spot on that this doesn’t scale that well, since you would need to manage that compute yourself
• switching to a Kubernetes agent on AWS EKS with managed node groups, which gives you almost all the benefits of a fully managed service that Fargate does, but without the “serverless latency” (at a slightly higher price) → regarding your worry about scale, EKS makes it incredibly easy to add managed nodes to your cluster as you need, or you could even combine it with EKS on Fargate - I discussed this in detail in this blog post.
#3 Consider AWS EKS with managed node groups instead of ECS Fargate for latency-sensitive workloads
If you want to spin up an EKS cluster instead, you can do that using eksctl. To spin up a cluster with a single node you need a single command which under the hood triggers a CloudFormation stack handling all the infrastructure details:
eksctl create cluster --name=prefect-eks --nodes=1
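If you prefer a declarative setup, eksctl also accepts a cluster config file. A minimal sketch equivalent to the one-liner above (the region and node group name here are assumptions, adjust them for your account):

```yaml
# cluster.yaml - hypothetical eksctl config; region and node group name are assumptions
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: prefect-eks
  region: us-east-1
managedNodeGroups:
  - name: prefect-ng
    desiredCapacity: 1
```

You would then create the cluster with eksctl create cluster -f cluster.yaml, which makes the node group easy to version-control and scale later.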
This blog post discusses the topic in much more detail, incl. a full walkthrough and even a CI/CD pipeline for that.
An additional benefit of using AWS EKS with managed node groups is that your instances are always on, so you don’t have to pull the Docker image if it already exists on the machine. You can control this by setting the imagePullPolicy in your Kubernetes job template (see the example in src/prefect/agent/kubernetes/job_template.yaml) that you can pass to a KubernetesRun, or you can set it directly when starting the agent:
prefect agent kubernetes install --key YOUR_API_KEY --label eks
This will return a Kubernetes manifest for the KubernetesAgent that contains `imagePullPolicy`:
...
image: prefecthq/prefect:1.0.0-python3.7
imagePullPolicy: Always
...
You can see that by default it’s set to “Always”, but you can change it to IfNotPresent:
image: prefecthq/prefect:1.0.0-python3.7
imagePullPolicy: IfNotPresent
You could also set it directly on your `KubernetesRun`:
flow.run_config = KubernetesRun(image_pull_policy="IfNotPresent")
More on that policy here.
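Putting the pieces together, here is a minimal Prefect 1.x run-config sketch; the flow name, image tag, and agent label are assumptions for illustration:

```python
from prefect import Flow
from prefect.run_configs import KubernetesRun

# hypothetical flow; replace the body with your own tasks
with Flow("eks-flow") as flow:
    pass

flow.run_config = KubernetesRun(
    image="prefecthq/prefect:1.0.0-python3.7",
    image_pull_policy="IfNotPresent",  # skip the pull when the image is already cached on the node
    labels=["eks"],  # must match the label your Kubernetes agent was started with
)
```

With this in place, flow runs picked up by the eks-labeled agent reuse the cached image on always-on managed nodes instead of pulling it on every run.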
#Conclusion
So overall, I totally understand your concerns and your frustration with ECS Fargate latency. Out of curiosity, I once benchmarked EKS on Fargate vs. EKS on managed node groups, and Fargate was 12 times slower than managed node groups. So serverless is great and all, but you need to be patient with it :) Perhaps you can do a small PoC with the setup from the dbt part 2 article, compare it with the ECS Fargate setup, and decide which one works better for you.
Nico Neumann (03/04/2022, 12:55 PM)