hey so thus far we are successfully running flows ...
# ask-community
m
hey so thus far we are successfully running flows with our own base image (about 1.5 GB) on EKS with the Kubernetes agent. Is there any way to speed up the pod spin-up? (We are using S3 storage, with flows stored as script.)
a
@Mike Lev I assume you are running AWS EKS with a self-managed EC2 data plane? Since pulling the image from ECR is the most time-consuming step during pod startup, you could set the image pull policy so that the image is pulled only if it is not already present on the worker. You could specify this either in your Kubernetes job template, or directly on KubernetesRun:
KubernetesRun(labels=["your-agent-label"], image="ecr-url", image_pull_policy="IfNotPresent")
You could also specify a `nodeSelector` to ensure that your flow pod always runs on a specific node, so that you only need to pull the image once: https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes/#create-a-pod-that-gets-scheduled-to-your-chosen-node
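For the job template route, here is a rough sketch of what the two settings could look like together. It only builds the manifest as a plain Python dict and prints it. The node name `my-always-on-node` and the helper `build_job_template` are made up for illustration, and the container name `flow` is an assumption about what the agent's default job template expects, so check it against your own template before using it.

```python
import json


def build_job_template(image, node_name, pull_policy="IfNotPresent"):
    """Return a minimal Kubernetes Job manifest as a plain dict.

    Hypothetical helper for illustration; it does not talk to a cluster.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "spec": {
            "template": {
                "spec": {
                    # Pin the pod to one node so the image is cached there.
                    # kubernetes.io/hostname is a standard node label.
                    "nodeSelector": {"kubernetes.io/hostname": node_name},
                    "containers": [
                        {
                            # Assumed container name; verify in your template.
                            "name": "flow",
                            "image": image,
                            # Skip the pull when the image is already on the node.
                            "imagePullPolicy": pull_policy,
                        }
                    ],
                    "restartPolicy": "Never",
                }
            }
        },
    }


# "ecr-url" is the same placeholder used in the message above.
job_template = build_job_template("ecr-url", "my-always-on-node")
print(json.dumps(job_template, indent=2))
```

A dict like this could then be passed as `job_template` to KubernetesRun instead of setting `image_pull_policy` directly. Pinning to a single node trades scheduling flexibility for a warm image cache, so it mostly makes sense for a small always-on node.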
m
hey @Anna Geller currently we are running on Fargate nodes, so it will spin up a Fargate worker and then spin down… wondering if `imagePullPolicy` is relevant here
a
Correct, there is only a very small chance that a second run of your flow on Fargate lands on the same instance, so the image will almost always have to be pulled again. In general, with Fargate you need to account for the added latency of provisioning a micro VM for the container and pulling the image onto that micro VM. There are two options you can consider:
1. You could add a self-managed always-on node to your cluster (and add it to a different namespace) for low-latency flows. I actually wrote about it a year ago here.
2. You could add a nodegroup with cluster autoscaler to keep the benefit of scalability and low maintenance: https://eksctl.io/usage/autoscaling/
@Mike Lev this may be really interesting for you: I just tested it in a flow-of-flows where the parent flow additionally spins up 3 child flows, each running in its own container. With Fargate having to spin up a micro VM for the parent flow and for each child flow, it took 12 minutes to complete, while running the same on a managed node took just 1 minute! So it definitely can make sense to spin up your own nodegroup if you don’t want this latency with serverless.