
David Elliott

11/03/2022, 2:17 PM
Hey folks, bit of a niche one - what’s the intention / purpose of job_watch_timeout_seconds in the KubernetesJob infra? I’m finding that the agent creates the job + pod just fine (and the flow + pod run through to completion), but after X seconds (per that timeout parameter) the agent logs Job 'xxxxx': Job did not complete. per this, even though the job is mid-way through running? ie it doesn’t seem to have any negative effect on the flow, it’s just telling me the job didn’t complete even when the job is very much still running..? Feels like something’s not quite right, just wanting to understand what the intention is…
Note I’ve got stream_output=False set because I was finding the pod (flow) logs were being intermittently streamed back to the agent, which felt dodgy, and stream_output=False appears to have stopped that — but now I have this error…
Think I’m better acquainted with this logic now:
• essentially, by setting stream_output = False (so as not to stream the logs back to the agent) it skips the part where it follows the pod logs (presumably until the flow is completed)
• in skipping that part, it jumps right to the wait-for-job-to-complete section
• however, it appears to be exiting per the else clause after job_watch_timeout_seconds
• feels like the intended behaviour is to wait until the job returns as completed, but it appears that the watch.stream is not yielding anything after a short period, so the for loop exits prematurely, resulting in this error log…
Any thoughts on why this might happen / whether it should be fixed?
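The failure mode described above can be sketched in pure Python, without a cluster. This is a minimal simulation of the pattern, not Prefect’s actual code: `watch_stream` stands in for the kubernetes client’s `watch.Watch().stream(..., timeout_seconds=...)`, which stops yielding once the timeout elapses, and `wait_for_job` mirrors the for-loop whose else-style fallthrough produces the "Job did not complete" log. All names here are hypothetical.

```python
import time

def watch_stream(events, timeout_seconds):
    """Simulated stand-in for watch.Watch().stream(...): yields events as
    they arrive, but stops yielding once timeout_seconds elapses, even if
    the underlying job is still running."""
    deadline = time.monotonic() + timeout_seconds
    for event in events:
        if time.monotonic() >= deadline:
            return  # watch expires; the caller's for-loop simply ends
        yield event

def wait_for_job(events, timeout_seconds):
    """Mirrors the wait-for-completion loop described above."""
    for event in watch_stream(events, timeout_seconds):
        if event.get("completed"):
            return True  # completion event arrived inside the watch window
    # We fall through here when the stream stopped yielding before any
    # completion event arrived — the job is declared incomplete even
    # though it may still be running.
    print("Job 'xxxxx': Job did not complete.")
    return False
```

With a long-running job that emits no events inside the watch window, `wait_for_job` returns False and logs the error, exactly as observed — which is why a short default timeout misfires for long jobs.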

Zanie

11/03/2022, 4:01 PM
Sounds like it should be fixed! I’m not sure of the cause though 🙂
Sounds like an event was not emitted in
job_watch_timeout_seconds
so it was declared incomplete
It makes sense that it would have a long period where no events are emitted if it’s a long-running job
I think that timeout should be changed to let it watch forever by default

David Elliott

11/03/2022, 4:05 PM
☝️ yep, exactly my conclusion after running various tests. In the meantime, I’ve silenced it by setting the timeout to 0 (which I think means forever), but yeah, I think that makes sense re: forever by default. Also wondering whether stream_output wants to be False by default as well..? Seems odd to have the agent streaming out the logs of its flows/pods..? In 1.0 I was quite used to the agent just being a creator / watcher of flow runs, rather than streaming the logs out.
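The workaround described here can be sketched roughly as follows. This assumes the Prefect 2.x KubernetesJob infrastructure block; the import path, the semantics of a 0 timeout, and the block name are assumptions to verify against the docs for your Prefect version:

```python
# Sketch of the workaround discussed above (Prefect 2.x, assumed API).
from prefect.infrastructure import KubernetesJob

k8s_job = KubernetesJob(
    stream_output=False,          # don't stream pod logs back to the agent
    job_watch_timeout_seconds=0,  # believed to mean "watch forever"
)
k8s_job.save("my-k8s-job", overwrite=True)  # hypothetical block name
```

A deployment pointed at this block would then get the quieter agent behaviour described in the thread.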

Zanie

11/03/2022, 4:08 PM
Hm. There’s not a real cost to having it default to true for Kubernetes. For some infrastructure types, like ECS, we need to register additional handlers, so it’s off by default there — but it’s generally nice for users getting started if it shows logs from the job.
I could be convinced to default them all to false and just suggest true in our tutorials.

David Elliott

11/03/2022, 4:10 PM
Yeah, I guess. Don’t have strong feelings on it tbh, other than that it took me quite a while to find the stream_output setting to switch off the behaviour 🙂 it is there in the docs though, so that’s on me!

Jarvis Stubblefield

11/08/2022, 6:46 PM
Seeing the logs was essential to my getting started. Having to find a setting to turn on just so I could see what was going on would have been cumbersome, and could have caused me to move on to another tech for my solution. Maybe the examples could show how turning it off benefits more advanced users, but I believe having it on is crucial for beginners, as it helps us understand what’s happening. Just my 2 cents.

Zanie

11/09/2022, 5:07 PM
Thanks for the feedback!