I have a quick question I was thinking about using preemptib Prefect Community #ask-community

Damien Ramunno-Johnson

05/20/2021, 11:04 PM

I have a quick question: I was thinking about using preemptible machines. If I'm using checkpointing and caching the results, I guess a preempted flow would just be retried?

Kevin Kho

05/20/2021, 11:07 PM

Hi @Damien Ramunno-Johnson, I guess if you use LocalResult, a retry on another machine won't be able to get the results, since they live on the disk of the machine that was preempted. If you're doing something like this, a remote result like S3Result is probably an option.

Damien Ramunno-Johnson

05/20/2021, 11:08 PM

Yeah, we are normally using GCS for the results/cache
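
To make the point about result location concrete, here is a toy model in plain Python. It is not Prefect's API: `run_flow`, `remote_store`, and `local_disk` are hypothetical names, and the dictionaries only stand in for a GCS bucket and a machine's local disk. It assumes a rerun skips any task whose result is already checkpointed, which in Prefect depends on how caching is configured.

```python
from typing import Callable, Dict, List, Tuple

Store = Dict[str, int]  # stand-in for a result location (bucket or local disk)


def run_flow(tasks: List[Tuple[str, Callable[[], int]]], store: Store) -> List[str]:
    """Run tasks in order, skipping any whose result is already in the store.

    Returns the names of the tasks that actually executed.
    """
    executed = []
    for name, fn in tasks:
        if name in store:
            continue  # result already checkpointed, no need to recompute
        store[name] = fn()
        executed.append(name)
    return executed


tasks = [("extract", lambda: 1), ("transform", lambda: 2), ("load", lambda: 3)]

# First attempt: the machine finishes two tasks and is then preempted.
remote_store: Store = {}  # survives preemption (like GCS or S3)
local_disk: Store = {}    # lives on the preempted machine
run_flow(tasks[:2], remote_store)
run_flow(tasks[:2], local_disk)

# Preemption: the replacement machine starts with an empty local disk.
local_disk = {}

rerun_remote = run_flow(tasks, remote_store)  # only the unfinished task runs
rerun_local = run_flow(tasks, local_disk)     # everything runs again
```

With the remote store only `load` runs on the second attempt, while with the wiped local disk all three tasks run again.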

Damien Ramunno-Johnson

05/20/2021, 11:14 PM

Big cost savings potentially

Dylan

05/21/2021, 1:27 AM

Just be careful to exempt your agent node from preemption

Damien Ramunno-Johnson

05/21/2021, 6:33 PM

Okay, in our case the flows themselves would be running with the Docker agent. I was thinking we would have several agents running so that a flow could retry on a different agent

Damien Ramunno-Johnson

05/21/2021, 6:35 PM

But I guess retrying the flow on a new agent like that isn’t supported? That also raises my second question: if an agent fails (let's say it went OOM, or something like that), would flows try to retry on a new agent?

Damien Ramunno-Johnson

05/21/2021, 7:03 PM

I was thinking we could use a managed instance group that would make sure there was always at least one agent running (and use autoscaling when needed)

Kevin Kho

05/21/2021, 7:04 PM

I think the flows will not retry on a new agent automatically, but if a flow were re-run, a new agent can pick it up, yeah

Kevin Kho

05/21/2021, 7:05 PM

Agents with shared settings are assigned the same ID, so you can have two instances registered as the same agent

Dylan

05/21/2021, 8:14 PM

Any agent that can pick up a Flow Run (including one that previously was in a Failed state and is now in a Scheduled state) will attempt to do so. The way that you can configure which agents attempt to pull work for which Flows is through labels
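
As a rough sketch of how labels narrow down which agents pull which flows, here is a plain-Python model. It is not Prefect's code: `eligible_agents` and the agent names are hypothetical, and the rule it encodes (an agent is eligible when the flow's labels are a subset of the agent's labels) is an assumption based on the agent documentation, so check it against your Prefect version.

```python
from typing import Dict, List, Set


def eligible_agents(flow_labels: Set[str], agents: Dict[str, Set[str]]) -> List[str]:
    """Return the agents whose labels cover every label on the flow (assumed rule)."""
    return sorted(
        name for name, labels in agents.items() if flow_labels <= labels
    )


# Hypothetical setup: two interchangeable agents on preemptible nodes, one elsewhere.
agents = {
    "agent-a": {"docker", "preemptible"},
    "agent-b": {"docker", "preemptible"},
    "agent-c": {"gpu"},
}

candidates = eligible_agents({"preemptible"}, agents)
```

Here `candidates` contains `agent-a` and `agent-b`, so if one of them is preempted the other can still pick up a run that returns to a Scheduled state.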

Dylan

05/21/2021, 8:15 PM

You can have multiple agents in this scenario and put them on preemptible nodes

Tyler Wanner

05/21/2021, 9:22 PM

Sounds like sound architecture, Damien. This is a good usage of Prefect retries. There are some trip-ups doing this at scale with k8s (because k8s will also try to retry your jobs, less smartly), but if you're not using k8s, it should be straightforward

Tyler Wanner

05/21/2021, 9:27 PM

You may not need 24/7 uptime on your agent; that's up to you. If you don't mind a little disruption or some late flows, you should be able to run without multiple agents

Damien Ramunno-Johnson

05/22/2021, 2:55 AM

Ah interesting, I got a managed instance cluster with autoscaling set up so that if a node goes down it will spin up again (if needed). I will try it with the retries. Thanks
