# ask-community
We are using Kubernetes a lot in our flows (creating services, pods and jobs), and sometimes this yields the following error. This usually happens when we burst kubernetes operations.
```json
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Operation cannot be fulfilled on resourcequotas \"gke-resource-quotas\": the object has been modified; please apply your changes to the latest version and try again",
  "reason": "Conflict",
  "details": {
    "name": "gke-resource-quotas",
    "kind": "resourcequotas"
  },
  "code": 409
}
```
Googling didn’t yield much, but it looks like an internal Kubernetes issue: the 409 Conflict means the `gke-resource-quotas` object was modified concurrently, so the write lost the optimistic-concurrency race. To fix this, I see three options:
1. Use Prefect’s retry mechanism.
2. Modify the Prefect Kubernetes tasks to retry on this error (willing to contribute myself).
3. Modify my own code to retry.

My question: what would be the best approach here?
- With 1), it might still fail because the same burst exists when the retry recreates the Kubernetes objects (or can I add a random delay?). Also, in a resource manager I may create multiple resources, and retries don’t exist there (AFAIK); if they did, how would I track which resources already exist?
- With 2), I don’t know if this is the right methodology; I can imagine that retrying inside the task library is an anti-pattern.
- With 3), there are no downsides except that I have to implement this everywhere.
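For option 3, a minimal sketch of what the retry could look like: a decorator that retries only on 409 Conflict, with jittered exponential backoff so bursty callers desynchronize instead of colliding again on retry. This is an illustration, not Prefect’s or the Kubernetes client’s API; it assumes the raised exception exposes the HTTP status as a `status` attribute, the way `kubernetes.client.rest.ApiException` does.

```python
import random
import time
from functools import wraps


def retry_on_conflict(max_attempts=5, base_delay=0.5):
    """Retry the wrapped call when it raises an exception whose
    ``status`` attribute is 409 (Conflict).

    Uses exponential backoff plus random jitter so that many workers
    bursting at once do not all retry at the same instant.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    # Re-raise anything that is not a 409, and give up
                    # once the attempt budget is exhausted.
                    if getattr(exc, "status", None) != 409 or attempt == max_attempts - 1:
                        raise
                    # Backoff: base_delay, 2x, 4x, ... plus jitter.
                    time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
        return wrapper
    return decorator
```

You would then wrap each bursty call site, e.g. a (hypothetical) helper around `CoreV1Api().create_namespaced_pod(...)`, with `@retry_on_conflict()`, which is exactly the "implement this everywhere" cost mentioned above.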