Marwan Sarieddine

12/04/2022, 2:26 PM
Hi folks, I am facing an issue: I am unable to remove an agent in Prefect 1 (Cloud). I tried doing so via GraphQL and the UI, to no avail.
1️⃣ 1

Mason Menges

12/05/2022, 7:10 PM
Hey @Marwan Sarieddine, could you elaborate a bit on what you're seeing? Is the infrastructure that was running the agent actually shut down? Are you seeing any error when trying to remove it from the UI?

Marwan Sarieddine

12/05/2022, 7:19 PM
@Mason Menges thank you for following up on this. This is a Kubernetes agent that I took down from the infrastructure that was running it. It has been lingering for more than a week on the Prefect Cloud UI (under https://cloud.prefect.io/tenant/agent) as an unhealthy agent, and every time I try to remove it I get the error shown in the screenshot:
We had a problem removing your Agent. Please try again.
When I attempt to run this GraphQL mutation:
mutation {
  delete_agent(input: {
    agent_id: "e2b38599-a3e5-4b9d-b1c5-a3ee6076c9d8"
  }) {
    success
    error
  }
}
I get this error:
{
  "errors": [
    {
      "path": [
        "delete_agent"
      ],
      "message": "Operation timed out",
      "extensions": {
        "code": "API_ERROR"
      }
    }
  ],
  "data": {
    "delete_agent": null
  }
}
The unfortunate part is that I believe this agent is still being taken into consideration when submitting runs. It shares labels with another agent, but its labels are only a subset of the other agent's, and this results in our flows getting stuck in a Scheduled state.
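For context, the lingering agent still shows up alongside the healthy one when listing agents via GraphQL, with a query like this (just a sketch; I'm assuming the agent query exposes labels and last_queried, which is what the agents page reads from):
query {
  agent {
    id
    name
    labels
    last_queried
  }
}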

Mason Menges

12/05/2022, 7:27 PM
Thanks for the additional details. I'll look around a bit and see what I can dig up. We're doing some maintenance on our backend this weekend that might help with this as well, but I'll see what I can find in the meantime.

Marwan Sarieddine

12/05/2022, 7:27 PM
Thank you

Mason Menges

12/07/2022, 4:30 PM
Hey @Marwan Sarieddine, after digging around a bit: it's possible that there are still flow runs referencing this agent. This can happen when the agent was responsible for submitting a particularly large number of flow runs. You might try filtering for flow runs submitted by this agent, checking whether any are stuck in a non-final state, and if so, manually updating their state.
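A query along these lines should surface them (just a sketch; the Hasura-style where filter is how the Prefect 1 API is queried, but you may need to extend the list of final states to exclude):
query {
  flow_run(where: { state: { _nin: ["Success", "Failed", "Cancelled"] } }) {
    id
    name
    state
  }
}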

Marwan Sarieddine

12/07/2022, 4:31 PM
Thanks for the suggestion - so if I stop all the stuck runs, this might unblock the agent?
I will confirm if this is the case and if it resolves things

Mason Menges

12/07/2022, 4:34 PM
In theory yes 😅
🤞 1

Marwan Sarieddine

12/21/2022, 9:51 PM
@Mason Menges this past weekend I stopped all running flow runs and attempted to delete the agent. It did not work - I'm still getting the same error; please see the attached screenshots.
Attempting the delete via the GraphQL mutation still returns an API timeout error.

Mason Menges

12/21/2022, 9:57 PM
Hmm, have you tried turning off the schedules for the flows that are submitting work to this agent? That should remove any future scheduled runs, as well as any currently submittable runs, from the agent, and should hopefully allow you to delete it.
The build-up of scheduled runs on it might also be what's preventing it from being removed.
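In Prefect 1 the schedule is toggled per flow; a minimal sketch using the set_schedule_inactive mutation (the flow_id here is a placeholder, and I'm assuming the payload follows the same success/error shape as the other mutations):
mutation {
  set_schedule_inactive(input: { flow_id: "<your-flow-id>" }) {
    success
    error
  }
}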

Marwan Sarieddine

12/21/2022, 9:57 PM
Oh, I did not turn off the schedules; ok, I will attempt to do so this weekend.
thanks for the quick follow up
So I removed all the scheduled flow runs - I'm still getting the same timeout error and am not able to delete the agent.
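(For anyone following along: scheduled runs can be cleared with a mutation along these lines - a sketch, assuming delete_flow_run takes the same input/payload shape as the delete_agent mutation above; the flow_run_id is a placeholder:)
mutation {
  delete_flow_run(input: { flow_run_id: "<scheduled-run-id>" }) {
    success
    error
  }
}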
@Mason Menges
What's interesting is that your suggestion partially resolved the issue - i.e. there were three agents that were stuck, and I am now down to only one…