# prefect-community
t
My flows use KubernetesJobEnvironment and I specify a custom job_spec, but I'm noticing that I need to manually delete the k8s job between flow runs or else subsequent runs will fail to create the k8s job due to the job already existing.
For instance, I have a job that loads Google Analytics data to Snowflake, and here's the error it throws when I don't manually delete the k8s job:
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "jobs.batch \"load-google-analytics-data-to-snowflake\" already exists",
  "reason": "AlreadyExists",
  "details": {
    "name": "load-google-analytics-data-to-snowflake",
    "group": "batch",
    "kind": "jobs"
  },
  "code": 409
}
so every time I run the job, I do this:
kubectl --namespace=prefect delete job load-google-analytics-data-to-snowflake
I assume that once I put this job into production and have it run on a schedule, it will immediately break?
My main question is this: I must be missing something, right? How can I make the flow work without manual intervention?
Perhaps I just have to put a TTL directly in my job spec?
spec:
  ttlSecondsAfterFinished: 60
...
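(Illustrative sketch, not from the thread: one way a TTL-bearing job spec might get attached, assuming the Prefect 0.x-era KubernetesJobEnvironment API; the file name job_spec.yaml and the flow body are placeholders.)

# Sketch only (Prefect 0.x-style API): the TTL lives in the custom job spec
# file that the flow's KubernetesJobEnvironment points at.
from prefect import Flow, task
from prefect.environments import KubernetesJobEnvironment

@task
def load_google_analytics_data_to_snowflake():
    ...  # actual extract/load logic would go here

with Flow(
    "load-google-analytics-data-to-snowflake",
    # job_spec.yaml is assumed to contain, among other things:
    #   spec:
    #     ttlSecondsAfterFinished: 60
    environment=KubernetesJobEnvironment(job_spec_file="job_spec.yaml"),
) as flow:
    load_google_analytics_data_to_snowflake()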
n
Hi @Troy Sankey! The task library has a task that can help with this called DeleteNamespacedJob, which you can build into your flow so that it cleans up after itself on every run. You could also check whether the job exists and remove it before trying to recreate it. Otherwise, I think this is a problem with the KubernetesJobEnvironment that @josh is going to try to PR a fix for ASAP.
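(Illustrative sketch of the clean-up pattern described above, assuming the Prefect 0.x task library's DeleteNamespacedJob; the constructor arguments, auth configuration, and the load_data placeholder task are assumptions, not details from the thread.)

# Sketch: delete the flow's k8s job as the last step of every run so the
# next run can recreate it. Names mirror the thread; auth details omitted.
from prefect import Flow, task
from prefect.tasks.kubernetes import DeleteNamespacedJob

delete_job = DeleteNamespacedJob(
    job_name="load-google-analytics-data-to-snowflake",
    namespace="prefect",
)

@task
def load_data():
    ...  # placeholder for the real Google Analytics to Snowflake load

with Flow("load-google-analytics-data-to-snowflake") as flow:
    load = load_data()
    # only delete the job after the main work has finished
    delete_job(upstream_tasks=[load])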
t
cool, I'll take a look at DeleteNamespacedJob, but I should clarify that this seems to happen at a layer before my first task can even run (i.e. my code to check if the job exists would be running inside the job), so I think checking for the existence of the job would not work.
n
Ah, that's a good point! I'll let you know when that PR gets in, which should fix this by appending a run ID to the job (so there should be no collisions).
t
thank you!
n
j
Jinx
n
😂 @josh is too fast
j
@Troy Sankey also noting here that in the meantime you could use the Kubernetes agent’s resource manager to clean up completed jobs 🙂 https://docs.prefect.io/orchestration/agents/kubernetes.html#resource-manager
t
holy crap, that was fast
thanks for the pointer! I have noticed that the "first" job that the Prefect agent creates for a given flow run just lingers around, and since the job spec for those is completely hard-coded, I can't add something like a TTL to them.