Hey, I have a failed flow run on EKS - i’m trying ...
# prefect-community
a
Hey, I have a failed flow run on EKS. I'm trying to restart it, but for some reason when it spins up the new pod, it requests a CPU amount that the original run config did not set. From the run config on the flow run:
"cpu_request": "4",
Output of describe pod for the pod that is spun up:
Requests:
      cpu:     6
      memory:  16Gi
Why is this happening on the flow run that I'm trying to restart?
This stops the pod from starting, because Fargate cannot provision the requested CPU. The original flow run ran fine until we hit an error, and now I need to restart it.
a
You're a bit too fast 😄 Can we start by providing some basic info? 1. Are you on Cloud or Server? 2. You mentioned EKS and Fargate - does it mean you are using KubernetesAgent spun up on EKS on Fargate? (i.e. serverless Kubernetes data plane) 3. How did you start your agent? 4. What's your run configuration?
i’m trying to restart but for some reason when its spinning up the new pod its requesting CPU that the original run config did not set.
Restarts normally don't create a new flow run; instead, they update the task run history for the failed task. So if the Kubernetes pod defined for this flow run is no longer around, Prefect probably does its best to infer the configuration from the defaults set on the agent. That's probably why you see the config that is used by default when you don't specify any custom CPU request on your KubernetesRun.
now I need to restart it
This is tricky for the reasons I mentioned above. Maybe you can temporarily comment out the tasks you don't want to run, register your flow, and trigger a flow run with only the task that you need to rerun (2 CLI commands will do). Then you can uncomment the tasks and reregister the flow. That's probably the fastest way. Alternatively, with Prefect you can always (re)run your flow locally, so that's also an option.
a
1. We are on Cloud
2. Yes, we are using the Kubernetes agent, spun up on EKS Fargate
3. I think we started our agent a while ago, based on a tutorial we found online. I can't remember exactly how, but it has been working fine
4. KubernetesRun(cpu_request=4, memory_request="16Gi")
Hmm, how do I update the agent's defaults? I did define a KubernetesRun CPU request, I think,
in the run config, as shown above
and it works normally
a
You can restart your agent by stopping the current agent process and starting a new one. It may actually be quite beneficial if you started it a long time ago, as it allows you to upgrade to the latest Prefect version. By now we are on Prefect 1.1! 🙂 https://discourse.prefect.io/t/prefect-1-1-0-is-out-introducing-pipe-operator-for-funct[…]ra-zombie-processes-and-many-task-library-enhancements/475
a
To restart my agent, do I just delete the pod and then run prefect agent install kubernetes etc. with the proper arguments?
a
With Kubernetes deployments, you can either delete the current deployment and create a new one, or you can change the existing deployment manifest and reapply it, which replaces the agent pod with a new, updated one: kubectl apply -f yourfile.yaml