Hi Prefect community,
Can we somehow have one agent interact with two different node pools in a k8s cluster? Or do we need to deploy a second agent if we want a different node configuration?
06/01/2022, 1:01 PM
What are you trying to do? Can you explain your use case a bit more?
generally (it depends on your use case) you could have a separate agent for each of those node pools, purely to make this easier to manage. Each of those agents would have a different label, and you would assign the same label to your flow's run config, so that flows are matched with the agent bound to the respective node pool
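A minimal sketch of the label matching described above, in plain Python rather than Prefect itself. It assumes the Prefect 1.x rule that an agent picks up a flow run only when the agent's labels are a superset of the flow run's labels. The agent names and label values are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def agent_can_run(agent_labels: Iterable[str], flow_labels: Iterable[str]) -> bool:
    """Return True when every flow label is also present on the agent."""
    return set(flow_labels) <= set(agent_labels)


# Hypothetical setup: one agent per node pool, each with its own label.
AGENTS: Dict[str, Set[str]] = {
    "agent-small": {"small-pool"},
    "agent-highmem": {"highmem-pool"},
}


def matching_agents(flow_labels: Iterable[str]) -> List[str]:
    """List the agents (sorted by name) that would pick up a flow run."""
    labels = list(flow_labels)
    return sorted(name for name, have in AGENTS.items() if agent_can_run(have, labels))


print(matching_agents(["highmem-pool"]))
```

In Prefect 1.x the flow side of this would be set through the run config, for example `KubernetesRun(labels=["highmem-pool"])`, and the agent side through the labels given when the agent is started.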
06/01/2022, 1:08 PM
thanks Anna, that would work. The underlying question was also whether the agent can live in a different node pool than the jobs. I do not want to pay for a machine all day long just to host the agent.
The best setup would be a lightweight node pool that hosts the agent, which sends the jobs to another node pool with the configuration needed for the flows
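One way this could look, sketched under assumptions: the flow's Kubernetes job carries a `nodeSelector` that pins it to the large node pool, while the agent pod is scheduled elsewhere. The label key below is the one GKE puts on its nodes; other providers use different keys. The pool name, the memory value, and the helper `job_template_for_pool` are hypothetical.

```python
import json

# Assumption: node pools are distinguished by a node label. This key is the
# one GKE sets on its nodes; other providers use a different key.
NODE_POOL_LABEL_KEY = "cloud.google.com/gke-nodepool"


def job_template_for_pool(pool_name: str, memory_request: str) -> dict:
    """Build a minimal Kubernetes Job manifest pinned to one node pool."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "spec": {
            "template": {
                "spec": {
                    # Schedule the flow's pod only on nodes of this pool.
                    "nodeSelector": {NODE_POOL_LABEL_KEY: pool_name},
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            # Assumption: "flow" is the container name used by
                            # Prefect 1.x's default job template.
                            "name": "flow",
                            "resources": {"requests": {"memory": memory_request}},
                        }
                    ],
                }
            }
        },
    }


template = job_template_for_pool("highmem-pool", "48Gi")
print(json.dumps(template, indent=2))

# With Prefect 1.x installed, the dict could be attached to a flow like this:
#   from prefect.run_configs import KubernetesRun
#   flow.run_config = KubernetesRun(job_template=template)
```

If the large pool is allowed to autoscale down to zero nodes, it would only cost money while a flow's job is actually scheduled there. Whether that is available depends on the cluster and provider.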
06/01/2022, 1:16 PM
good point, and honestly this is a hard problem because those two objectives contradict each other:
1. minimize costs and avoid idle compute
2. scale to extremely high capacity when needed with minimal latency
It's generally not possible to have both and you need to accept some trade-offs here
I totally understand the problem, but it's not easy to solve and the right setup depends a lot on which trade-offs you are willing to accept
a good place to get started would be:
• have one agent pod in each node pool
• all node pools start small
• you have some scaling policy and monitoring to manage scale-out and scale-in
06/01/2022, 1:25 PM
Thanks Anna, this is very helpful. So far I have not seen any automatic scale-up setting that I could use. So I am in a situation where I need a machine with more memory (64 GB) to run my flow. Hence I need to create a new node pool with a 64 GB machine that would handle the flow. And the agent would need to be hosted on this 64 GB VM even though it requires a tiny fraction of it.
I am going to dig into the configuration to see if we can scale up, and how it works
I actually read your last message too fast; the problem is not really scaling out but scaling up
06/01/2022, 1:44 PM
to scale up, you need a bigger box (a VM with more memory) with one agent running there - potentially a local agent - and a way to either:
• batch your flows so that they are queued rather than all running at once and consuming all resources - for that you can use Prefect concurrency limits
• find a way to shut down the VM when it's no longer needed - on AWS you can use something like Instance Scheduler, and to ensure your agent starts on VM boot, check this part of the tutorial
06/01/2022, 1:54 PM
Thanks again Anna, these are interesting resources.
The second point is interesting, but then I will need to start the VM again when I need to run the flows, so that the agent can pick up the job request.
So I would need to build an additional flow to start the VM, I guess.
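A rough sketch of what such a "start the VM" step could look like on AWS, assuming a boto3 EC2 client is passed in. `start_instances` and the `instance_running` waiter are boto3 EC2 client calls. The function name `ensure_vm_running` and the instance ID are hypothetical. This step would have to run somewhere other than the stopped VM, for example from an always-on lightweight agent or a scheduled job.

```python
from typing import Any


def ensure_vm_running(ec2_client: Any, instance_id: str) -> None:
    """Start an EC2 instance and block until AWS reports it as running.

    ec2_client is expected to behave like boto3.client("ec2").
    """
    # Request the start; this is a no-op if the instance is already running.
    ec2_client.start_instances(InstanceIds=[instance_id])
    # Wait until the instance state is "running". This does not guarantee
    # that the Prefect agent on the VM has finished starting up.
    ec2_client.get_waiter("instance_running").wait(InstanceIds=[instance_id])


# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   ensure_vm_running(boto3.client("ec2"), "i-0123456789abcdef0")  # hypothetical ID
```

Since the waiter only covers the instance state, flows scheduled right after this step may still wait a little until the agent on the VM has booted and started polling.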