Matias Godoy

    1 year ago
    Hi guys! We have successfully deployed a Kubernetes Agent in Amazon EKS following your great tutorial on Medium. I ran some stress tests by starting around 30 flow runs for the new k8s agent, and I noticed that it picked up 6 runs at a time (I don't know whether 6 is a setting or whether the agent picks this number based on node capacity) and queued the rest. It's not bad! But since our flow runs can take as long as 10 hours each, I'd now like to set up an autoscaler so that runs don't sit in a long queue for that long. As you might have already inferred, I'm new to Kubernetes. My guess is that HPA (HorizontalPodAutoscaler) is not the right answer here; maybe a cluster-autoscaler is the way to go. Could you give me a hint on where I should start with setting up an autoscaler on EKS that works with the Kubernetes Agent? Thanks!
    Jim Crist-Harif

    1 year ago
    Hi Matias, it sounds like your k8s cluster is capped out on resources: the agent creates a k8s job for each flow run, but there isn't room in the cluster to run them all, so the extra jobs' pods stay Pending until capacity frees up. You'd want to look at creating an autoscaling node pool, or simply scaling up your cluster to have more nodes.
    At the agent level there are no limits on the number of active flow runs, so you shouldn't need to start multiple agents to increase concurrency.
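Jim's explanation can be made concrete with a toy model. The sketch below is a simplified first-fit placement, not the actual Kubernetes scheduler (which filters and then scores nodes), and every number in it (node allocatable CPU and memory, a 1 vCPU / 2 GiB request per flow run) is a hypothetical assumption rather than Matias's real setup. It shows how a fixed-looking cap such as 6 concurrent runs can fall out of node capacity and per-pod resource requests alone.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical worker node: allocatable CPU (millicores) and memory (MiB)."""
    cpu_m: int
    mem_mib: int


@dataclass
class JobRequest:
    """Hypothetical resource requests of one flow-run pod."""
    cpu_m: int
    mem_mib: int


def schedule(nodes, jobs):
    """Simplified first-fit placement (a stand-in for the real kube-scheduler).

    Returns (running, pending): jobs that fit on some node vs. jobs left Pending.
    """
    free = [[n.cpu_m, n.mem_mib] for n in nodes]  # remaining capacity per node
    running = pending = 0
    for job in jobs:
        for slot in free:
            if slot[0] >= job.cpu_m and slot[1] >= job.mem_mib:
                slot[0] -= job.cpu_m
                slot[1] -= job.mem_mib
                running += 1
                break
        else:  # no node had room: this pod would stay Pending
            pending += 1
    return running, pending


# Two hypothetical nodes, 30 flow runs each requesting 1 vCPU and 2 GiB.
nodes = [Node(3800, 15000), Node(3800, 15000)]
jobs = [JobRequest(1000, 2048) for _ in range(30)]
running, pending = schedule(nodes, jobs)
print(running, pending)  # 6 24

# With 10-hour runs and 6 slots, the queue drains in ceil(30 / 6) = 5 waves.
print(math.ceil(len(jobs) / running) * 10, "hours")  # 50 hours
```

In this made-up example CPU is the binding resource (3 pods per node), so adding nodes, not agents, is what raises concurrency.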

    Joe Schmid

    1 year ago
    @Matias Godoy We've used the cluster-autoscaler successfully with our k8s cluster on EKS for Prefect Flows for about a year now. You'll want to think a bit about resource requirements for your Flows and whether you're best served by a single auto-scaling group (ASG) or multiple. As with everything, I'd recommend starting on the simple side with a single ASG and getting that working before adding more complexity. FWIW, we used eksctl to capture our EKS cluster config in a YAML file and have that tool create all required resources.
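For the single-ASG starting point Joe suggests, a rough capacity-planning sketch may help. It is not how the cluster-autoscaler works internally (roughly, it simulates scheduling Pending pods against node-group templates); it only estimates how many nodes a target number of concurrent flow runs needs, clamped to the group's minimum and maximum size (for example, the `minSize`/`maxSize` of a node group in an eksctl config). The function names and all numbers are hypothetical. Keep in mind that a node's allocatable resources are lower than the instance's raw size because of system reservations and DaemonSet pods.

```python
import math


def pods_per_node(node_cpu_m, node_mem_mib, pod_cpu_m, pod_mem_mib):
    """Identical pods that fit on one node, limited by the scarcer resource."""
    return min(node_cpu_m // pod_cpu_m, node_mem_mib // pod_mem_mib)


def nodes_needed(concurrent_runs, per_node, min_size, max_size):
    """Nodes required for a target concurrency, clamped to the ASG's size bounds."""
    if per_node < 1:
        raise ValueError("a single flow-run pod does not fit on one node")
    wanted = math.ceil(concurrent_runs / per_node)
    return max(min_size, min(wanted, max_size))


# Hypothetical sizing: 3800m CPU / 15000 MiB nodes, 1000m / 2048 MiB per flow run.
per_node = pods_per_node(3800, 15000, 1000, 2048)  # 3
print(nodes_needed(30, per_node, min_size=1, max_size=10))  # 10 -> all 30 run at once
print(nodes_needed(30, per_node, min_size=1, max_size=6))   # 6 -> only 18 run at once
```

The group's max size effectively becomes the concurrency ceiling: pods beyond `max_size * per_node` stay Pending, just as in the original stress test.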
    Matias Godoy

    1 year ago
    Great, thank you guys for your answers! I'll look into that.
    Just came back to let you know that today I set up the ASG and ran a new stress test. It works beautifully! So satisfying... Thanks again, guys!