Tom Klein

04/13/2022, 3:31 PM
Hello again 😄 Regarding the RunNamespacedJob example (from https://github.com/anna-geller/packaging-prefect-flows/blob/master/flows_task_library/s3_kubernetes_run_RunNamespacedJob_and_get_logs.py ): we implemented it and got it to work, but it now seems to be failing on:
VALIDATIONFAIL signal raised: VALIDATIONFAIL('More than one dummy pod')
because there seem to be many pod “residues” from previous runs:
['prefect-agent-7745fb9694-6fwk4', 'prefect-job-47d072a8-4pbsf', 'seg-pred-test-cm54l', 'seg-pred-test-doron', 'seg-pred-test-l2j5l', 'seg-pred-test-zvwld']
So wouldn't k8s keep the pods around, given that we set “delete_job_after_completion” = False? And even if the job is deleted successfully, wouldn't it keep the pods around? Or are the pods supposed to be deleted automatically when the job is deleted…?

Kevin Kho

04/13/2022, 3:48 PM
I think this is a question for Anna but she’s away from the computer so we’ll wait for her

Anna Geller

04/13/2022, 6:01 PM
Tag me next time, Kevin 😛 Looking now.
@Tom Klein delete_job_after_completion is for the Kubernetes job, not for a pod. A single job can result in many pods, afaik. In general, it's all configurable; you need to dig deeper into those Kubernetes tasks, Tom 🙂 But happy to help if you have trouble understanding them. Can you share the flow code for the Kubernetes tasks that seem confusing to you?
I believe only the seg-pred-test-* pods are the ones spun up by those Kubernetes tasks.
• prefect-agent-7745fb9694-6fwk4 is for the agent itself
• prefect-job-47d072a8-4pbsf is the flow run pod
• the rest are those from your Kubernetes-task-Kubernetes-jobs
I think the pods are not deleted; that's why, in the example you mentioned, I used the DeleteNamespacedPod task to clean them up. What's the end goal you're trying to achieve here? Do you want to keep those pods or delete them?

Tom Klein

04/13/2022, 9:35 PM
Our goal is basically to launch a job and then clean up after it, but bring the logs back into the Prefect UI so they're more easily visible to our data scientists. We're not interested in keeping the pods around other than to extract the logs from them. But if, for whatever reason, we ever run into a situation where there are pods remaining, then:
1. The condition will fail
2. There's no easy way to clean it up from within Prefect anymore, nor to know which pod belongs to the job we just created (right?)

Anna Geller

04/14/2022, 12:30 AM
Would you be able to share an example I could reproduce on my end? It's quite likely that just some tiny piece in your tasks configuration is missing. The use case you described is 100% doable with the existing Kubernetes tasks

Tom Klein

04/14/2022, 10:58 AM
@Anna Geller hmm, I can definitely share what we have now, but since we constantly change the code (and since the current state of the k8s cluster is the result of multiple runs, some of which failed, some for unrelated reasons), I'm not sure the exact situation is easily reproducible just from code. Basically, if you run your own (example) code multiple times but omit the last delete step, you will find yourself in the state we are in now, right? You'd have multiple pods that have not been deleted, and the validation (which checks whether there are other pods that start with the same prefix) would fail if you tried to run it again… right?
I'm basically wondering what the right way to cope with this situation is. We can't just omit the validation (since then the logic of returning the “first” pod that is found would make no sense; nothing guarantees it's the one that was just created), and I couldn't really find other ways to get the pod name for the job that was just created (it's not returned by the ReadNamespacedJob task, for example, nor by RunNamespacedJob itself).

Anna Geller

04/14/2022, 11:03 AM
Ok, so the actual problem you see is that when RunNamespacedJob fails, the DeleteNamespacedJob doesn't run, leaving a zombie pod undeleted, correct?

Tom Klein

04/14/2022, 11:06 AM
There is no ReadNamespacedJob (in your example); there's a ListNamespacedPod to list all the pods and then filter for the ones whose name starts with the name of our job. We tried ReadNamespacedJob as an alternative way of maybe getting the pod name directly… 😕
But yes, what you wrote is correct and seems to be what actually happened. For whatever reason (it doesn't even really matter why) there was some initial zombie pod, after which the process didn't stop generating more of them. Each run begins with a “delete_if_exists” for the job, but that doesn't actually remove the pod. I'm not a k8s expert, but it seems possible for the job to no longer exist even though the pod does. And when the validation fails, the pod that was just created for the newly created job still exists.
Or in other words:
• a single “zombie” pod exists, with no job to delete
• trying to delete the job (via DeleteNamespacedJob) fails since it doesn't exist
• create and run a new job
• list pods
• validation fails since there are now two pods with that name
• the final delete step (DeleteNamespacedPod) is not reached, so a new zombie is created. Even if it were reached, it would only take care of the current run, not the other zombie, and it can't even do that, since it's unclear which of the two pods is “our” pod (this run's pod, that is)
Now the process begins again with 2 zombies instead of 1, and so on.
Maybe we need an initial cleanup step that also tries to delete all zombie pods with that name? My worry is that this is low-level fiddling with other runs… what if someone legitimately ran this flow more than once in parallel? All I want is for a single job to act as an “atomic” operation (one that cleans up only after itself). I don't mind if multiple such jobs run simultaneously.
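
The failure loop described above can be reproduced in a few lines of plain Python. This is only a sketch: `find_job_pod` is a hypothetical stand-in for the example flow's prefix filter and validation (the real flow raises a Prefect VALIDATIONFAIL signal rather than a ValueError), and the pod names are the ones listed earlier in the thread.

```python
from typing import List


def find_job_pod(pod_names: List[str], job_prefix: str) -> str:
    """Return the single pod whose name starts with job_prefix.

    Hypothetical stand-in for the prefix-based validation discussed above:
    it refuses to guess when zero or several pods match.
    """
    matches = [name for name in pod_names if name.startswith(job_prefix)]
    if len(matches) != 1:
        raise ValueError(
            f"Expected exactly one pod for {job_prefix!r}, found {len(matches)}"
        )
    return matches[0]


# Pod names reported in the thread: several leftovers share the job's prefix.
pods = [
    "prefect-agent-7745fb9694-6fwk4",
    "prefect-job-47d072a8-4pbsf",
    "seg-pred-test-cm54l",
    "seg-pred-test-doron",
    "seg-pred-test-l2j5l",
    "seg-pred-test-zvwld",
]

try:
    find_job_pod(pods, "seg-pred-test")
except ValueError as exc:
    print(exc)
```

Because the check only looks at names, every leftover pod makes the next run fail as well, which is why the prefix alone cannot identify this run's pod.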

Anna Geller

04/14/2022, 11:11 AM
This was a typo, I meant RunNamespacedJob.
“what you wrote is correct and seems to be what actually happened”
You can solve it using triggers: adding trigger=all_finished should ensure that the pod gets deleted even if RunNamespacedJob fails. Add the same line to the delete task.
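
As a plain-Python analogy (this is not Prefect's API), an all_finished trigger on the delete task behaves roughly like a `finally` block: cleanup runs whether or not the job step succeeded. `run_job` and `cleanup` below are hypothetical callables standing in for the RunNamespacedJob and delete tasks.

```python
from typing import Callable, List


def run_with_cleanup(run_job: Callable[[], str], cleanup: Callable[[], None]) -> str:
    """Run the job step, then always run cleanup, even if the job step raised."""
    try:
        return run_job()
    finally:
        cleanup()


events: List[str] = []


def failing_job() -> str:
    events.append("job started")
    raise RuntimeError("job failed")


def delete_pod() -> None:
    events.append("pod deleted")


try:
    run_with_cleanup(failing_job, delete_pod)
except RuntimeError:
    pass

print(events)
```

This guarantees that the cleanup step runs, but it does not by itself tell the cleanup step which pod to delete.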

Tom Klein

04/14/2022, 11:13 AM
but if we get to that line and there were two pods with that name to begin with, we wouldn’t know which one to delete, right?

Anna Geller

04/14/2022, 11:14 AM
then delete_if_exists? 😄

Tom Klein

04/14/2022, 11:14 AM
that step only deletes the job, not the pod
the point is that you have to know the specific name of the pod in order to delete it, and you can’t know which is “your” pod

Anna Geller

04/14/2022, 11:15 AM
I meant - using the same logic - first delete if exists, then try to run the pod

Tom Klein

04/14/2022, 11:16 AM
Right, but you wouldn't know which pod to try and delete. Let's say there's seg-pred-123 and seg-pred-456 as pods when you start to run. Which one do you delete?
and in fact, i don’t want to delete existing jobs/pods. maybe someone is legitimately running a similar job in parallel? i just want this run to clean after itself, i don’t care about other runs
Basically, what we're really missing is a way to identify the pod of the job we just created via RunNamespacedJob, without relying on there being only a single pod with that name prefix. In kubectl this is apparently achieved with describe job, or something. It doesn't seem possible via Prefect (or maybe it is and I'm missing something; that's what I'm asking).

Anna Geller

04/14/2022, 11:30 AM
This seems to be a cluster administration problem. Perhaps you can have a separate job deleting pods that start with a given name? I can totally understand why it's beneficial to clean those up, but I don't have a clear recipe other than this code example and using triggers. I would recommend using triggers to solve this problem. This way, the case of 2 pods with the same name shouldn't happen in the first place:
“if we get to that line and there were two pods with that name to begin with, we wouldn't know which one to delete, right?”
The all_finished trigger is the most reliable and cleanest approach I can recommend at this time.

Tom Klein

04/14/2022, 11:32 AM
all_finished will make sure the step runs, but if there's more than one pod with that name, we wouldn't know which of the two (or three, or four) is “our” pod (the one generated by this specific Prefect flow run). We definitely want to allow more than one instance of the job to run in parallel. Maybe the solution is to give the job a unique name per run…? (That way there's a 1:1 relation between “jobs with that name” and “corresponding pods”.) The case of multiple pods can definitely happen if more than one person runs an instance of this flow.

Anna Geller

04/14/2022, 11:34 AM
if there’s more than one pod with that name
why would it be?

Tom Klein

04/14/2022, 11:34 AM
for example if there was a scheduled run of the flow, and someone else wanted to also run a manual run (with different parameters) of this flow — while the scheduled run was still running

Anna Geller

04/14/2022, 11:35 AM
you could set a concurrency limit of 1 to avoid that if this is an actual problem

Tom Klein

04/14/2022, 11:35 AM
but we don’t want to limit. we want it to be possible to run this multiple times in parallel
I think we're kind of walking around the problem here… The problem isn't solved by limiting concurrency (especially when such concurrency is needed), but by being able to identify, 1:1, the pod that was just created by the job we just ran. I guess giving the job a unique name (e.g. my-cool-job-467fdfg5a) solves that problem (since there would only ever be one pod that matches that unique name); I just don't know if that's a “best practice”. It seems to me more like a workaround.
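
A minimal sketch of the unique-name idea, assuming a random suffix is acceptable. `unique_job_name` is a hypothetical helper, not part of Prefect or the Kubernetes client, and the 63-character cap is a conservative assumption based on the Kubernetes DNS-label length limit.

```python
import uuid


def unique_job_name(base: str, max_len: int = 63) -> str:
    """Append a short random suffix so each run gets its own job name."""
    suffix = uuid.uuid4().hex[:8]  # lowercase hex, valid in Kubernetes names
    # Trim the base so that base + "-" + suffix fits within max_len.
    trimmed = base[: max_len - len(suffix) - 1].rstrip("-")
    return f"{trimmed}-{suffix}"


print(unique_job_name("seg-pred-test"))
```

With a per-run name, filtering pods by that name matches only the pods of this run, so parallel runs of the same flow no longer collide.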

Anna Geller

04/14/2022, 11:38 AM
Gotcha. I don't think that giving unique names to pods violates any best practices. And even if it did, if this solves your problem, go for it 🙂

Tom Klein

04/14/2022, 11:41 AM
The pods have unique names by default, I think; it's the job that I don't know if we want to give a unique name to… My knowledge of k8s is too limited to know whether there are important reasons for jobs of a similar “nature” (i.e. image, code, whatever) to share the same name… I'll ask our devops anyway. It all would have been solved if we could somehow just get the name of the pod related to the job we just created via RunNamespacedJob. I tried the Read task, and the only thing that looked like an identifier was the controller-uid or something, but I don't know if that should/could be used as an indirect identifier for the pod:
{'api_version': 'batch/v1',
 'kind': 'Job',
 'metadata': {'annotations': None,
              'cluster_name': None,
              'creation_timestamp': datetime.datetime(2022, 4, 13, 16, 1, 56, tzinfo=tzlocal()),
              'deletion_grace_period_seconds': None,
              'deletion_timestamp': None,
              'finalizers': None,
              'generate_name': None,
              'generation': None,
              'labels': {'controller-uid': '35ca5ffe-8583-42bb-8c98-ec1d413bf7cc',
                         'job-name': 'seg-pred-test'},
              'managed_fields': [{'api_version': 'batch/v1',
                                  'fields_type': 'FieldsV1',
Alternatively, we would need to give up our desire to “pull back” the logs into the Prefect UI and just rely on our own internal logging mechanism (and then we can set the job to delete resources after it's finished). This is what I'm trying right now:
    # del_job = delete_if_exists()
    k8s_job = create_and_run_job()
    # del_job.set_downstream(k8s_job)

    # Read the job back once it has run, to get at its labels
    v1job = read_job()
    k8s_job.set_downstream(v1job)
    print_job_output = print_job(v1job)

    # Select only the pods that carry this job's controller-uid label
    controller_uid = v1job['metadata']['labels']['controller-uid']
    pods = list_pods(kube_kwargs={"label_selector": {"controller-uid": controller_uid}})
    list_of_pods = get_pod_ids(pods)
    pod_name = get_our_pod_name(pods)
    delete_pod(pod_name)
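
One caveat on the attempt above: the Kubernetes API takes label_selector as a string of the form key=value, not as a dict. A minimal sketch of building that string follows, assuming the job is available as a plain dict shaped like the output shown earlier (for a V1Job object from the Kubernetes Python client, `to_dict()` gives that shape). `pod_selector_for_job` is a hypothetical helper; in a Prefect flow it would presumably have to run inside a task, since the job metadata only exists at run time.

```python
from typing import Any, Dict


def pod_selector_for_job(job: Dict[str, Any]) -> str:
    """Build a label selector string matching only the pods created by this job."""
    controller_uid = job["metadata"]["labels"]["controller-uid"]
    return f"controller-uid={controller_uid}"


# Trimmed copy of the job metadata shown in the thread.
job = {
    "api_version": "batch/v1",
    "kind": "Job",
    "metadata": {
        "labels": {
            "controller-uid": "35ca5ffe-8583-42bb-8c98-ec1d413bf7cc",
            "job-name": "seg-pred-test",
        }
    },
}

selector = pod_selector_for_job(job)
print(selector)
# Assumed usage: pass it on as kube_kwargs={"label_selector": selector}
```

Since the controller-uid differs for every job, this selector keeps working when several jobs with the same name prefix exist side by side.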

Anna Geller

04/14/2022, 12:04 PM
Thanks for explaining the problem in more detail. This is something that would need to be investigated further, and perhaps some engineers from the Integrations team can chime in to help (maybe there's some tweak that can be added to those tasks to get the info you need). I'll be OOO for Easter so I can't dive deeper now, but let me open a GitHub issue for now. Feel free to continue adding notes or continue the discussion on the GitHub issue.
@Marvin open "How to get the name of the pod related to the job created via RunNamespacedJob?"

Tom Klein

04/14/2022, 12:06 PM
Thanks @Anna Geller - i’m also going on vacation for passover 😄