Hi how do i debug async flow randomly hangs in kub...
# prefect-kubernetes
y
Hi, how do I debug an async flow that randomly hangs in a Kubernetes worker job pod? I have an async task in the flow that downloads files from S3 asynchronously, say 1000 of them. For the first 20 or so the task completes successfully, and the rest just keep running forever until the error message
Crash detected! Execution was cancelled by the runtime environment.
turns up. Even then it stays in the running state, until it finally crashes with an httpcore.pool timeout. When I was running this with a Prefect agent, I didn't have this issue. Any idea where I should start looking?
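For reference, the fan-out described here can be sketched roughly as below, with a semaphore added to cap how many downloads are in flight at once. Capping concurrency is only one way to test whether the hang is related to connection-pool exhaustion; it is not a confirmed cause. `fetch_one` is a hypothetical stand-in for the real S3 download, and the limit of 20 is an arbitrary assumption.

```python
import asyncio

MAX_IN_FLIGHT = 20  # arbitrary cap, an assumption for this sketch


async def fetch_one(key: str) -> str:
    """Hypothetical stand-in for a real async S3 download."""
    await asyncio.sleep(0.001)
    return key


async def fetch_all(keys: list[str], limit: int = MAX_IN_FLIGHT) -> list[str]:
    """Download every key, never running more than `limit` at once."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(key: str) -> str:
        # Each download waits here until one of the `limit` slots is free.
        async with semaphore:
            return await fetch_one(key)

    # gather preserves the order of the input keys in its result list.
    return await asyncio.gather(*(bounded(k) for k in keys))


if __name__ == "__main__":
    results = asyncio.run(fetch_all([f"file-{i}" for i in range(1000)]))
    print(len(results))
```

If the flow completes with a low limit but hangs without one, that points at resource or connection limits rather than at the download logic itself.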
r
I would first look at the pod: is it running out of RAM/CPU and getting OOM-killed? kubectl describe pod <pod-name>
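The same check can be scripted against the JSON form of the pod. This is a minimal sketch, assuming `kubectl` is on the PATH and the pod still exists; the helper names `oom_killed_containers` and `load_pod` are made up for illustration. It looks for a terminated container state whose reason is `OOMKilled`.

```python
import json
import subprocess


def oom_killed_containers(pod: dict) -> list[str]:
    """Return names of containers whose current or last state is OOMKilled."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        for key in ("state", "lastState"):
            terminated = (status.get(key) or {}).get("terminated")
            if terminated and terminated.get("reason") == "OOMKilled":
                names.append(status["name"])
                break  # do not list the same container twice
    return names


def load_pod(pod_name: str) -> dict:
    """Fetch the pod as JSON via kubectl (requires cluster access)."""
    out = subprocess.run(
        ["kubectl", "get", "pod", pod_name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)
```

Calling `oom_killed_containers(load_pod("<pod-name>"))` returns an empty list when no container was OOM-killed.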
y
Nope, it shouldn't be. I allocated quite a lot of resources, like 10 GB of RAM. And the pod is no longer to be found once it crashes.
r
should be able to find it with kubectl get pods -A
y
No access to other namespaces.
r
kubectl get pods -o wide
"get pods in current namespace"
y
Yeah, it's no longer there, because the job crashed and is considered completed.
r
You need some injection hackery on the job pod then, to keep it around so you can inspect it: https://stackoverflow.com/a/40093356/6440
y
@redsquare any idea how I do that for Prefect job pods? We don't directly control the pod with a manifest file; it is built and launched by the Prefect worker.
r
I would just copy the pod YAML while it's up, tweak it to add the sleep, then manually create a pod from it.
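That tweak can be sketched in Python, working on the JSON form of the manifest instead of YAML so only the standard library is needed. The function name `make_debug_pod`, the `-debug` name suffix, and the one-hour sleep are assumptions for illustration, and the sketch assumes the image ships a `sleep` binary.

```python
import copy


def make_debug_pod(pod: dict, sleep_seconds: int = 3600) -> dict:
    """Turn a copied job pod manifest into a standalone pod that just sleeps."""
    debug = copy.deepcopy(pod)
    debug.pop("status", None)  # server-populated, not part of a new pod

    # Keep only name and namespace; dropping labels and ownerReferences
    # keeps the copy detached from the original Job.
    meta = pod.get("metadata", {})
    debug["metadata"] = {
        "name": meta.get("name", "flow-run") + "-debug",
        "namespace": meta.get("namespace", "default"),
    }

    spec = debug["spec"]
    spec.pop("nodeName", None)  # let the scheduler place the copy
    spec["restartPolicy"] = "Never"
    for container in spec.get("containers", []):
        # Replace the flow-run entrypoint so the pod stays up for inspection.
        container["command"] = ["sleep", str(sleep_seconds)]
        container.pop("args", None)
    return debug
```

One possible workflow: save the running pod with kubectl get pod <pod-name> -o json, pass the parsed JSON through `make_debug_pod`, write the result to a file, create it with kubectl apply -f, and then open a shell in it with kubectl exec -it <pod-name>-debug -- sh.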