
David Elliott

11/03/2022, 2:17 PM
Hey folks, bit of a niche one - what’s the intention / purpose of job_watch_timeout_seconds in the KubernetesJob infra? I’m finding that the agent creates the job + pod just fine (and the flow + pod run through to completion), but after X seconds (per that timeout parameter) the agent logs Job 'xxxxx': Job did not complete. per this, even though the job is mid-way through running? ie it doesn’t seem to have any negative effect on the flow, it’s just telling me the job didn’t complete even when the job is very much still running..? Feels like something’s not quite right, just wanting to understand what the intention is…
Note I’ve got stream_output=False set because I was finding the pod (flow) logs were being intermittently streamed back to the agent, which felt dodgy, and stream_output=False appears to have stopped that — but now I have this error…
Think I’m better acquainted with this logic now:
• essentially, by setting stream_output = False (so as not to stream the logs back to the agent) it skips the part where it follows the pod logs (presumably until the flow is completed)
• in skipping that part, it jumps right to the wait-for-job-to-complete section
• however, it appears to be exiting per the else clause after job_watch_timeout_seconds
• feels like the intended behaviour is to wait until the job returns as completed, but it appears that the watch.stream is not yielding anything after a short period, so the for loop exits prematurely, resulting in this error log…
Any thoughts on why this might happen / whether it should be fixed?
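The failure mode described above can be sketched in pure Python, without a cluster. This is a minimal simulation of the pattern, not Prefect’s actual code: `watch_stream` stands in for the kubernetes client’s `watch.Watch().stream(..., timeout_seconds=...)`, which stops yielding once the timeout elapses, and `wait_for_job` mirrors the for-loop whose else-style fallthrough produces the "Job did not complete" log. All names here are hypothetical.

```python
import time

def watch_stream(events, timeout_seconds):
    """Simulated stand-in for watch.Watch().stream(...): yields events as
    they arrive, but stops yielding once timeout_seconds elapses, even if
    the underlying job is still running."""
    deadline = time.monotonic() + timeout_seconds
    for event in events:
        if time.monotonic() >= deadline:
            return  # watch expires; the caller's for-loop simply ends
        yield event

def wait_for_job(events, timeout_seconds):
    """Mirrors the wait-for-completion loop described above."""
    for event in watch_stream(events, timeout_seconds):
        if event.get("completed"):
            return True  # completion event arrived inside the watch window
    # We fall through here when the stream stopped yielding before any
    # completion event arrived — the job is declared incomplete even
    # though it may still be running.
    print("Job 'xxxxx': Job did not complete.")
    return False
```

With a long-running job that emits no events inside the watch window, `wait_for_job` returns False and logs the error, exactly as observed — which is why a short default timeout misfires for long jobs.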

Zanie

11/03/2022, 4:01 PM
Sounds like it should be fixed! I’m not sure of the cause though 🙂
Sounds like an event was not emitted in
job_watch_timeout_seconds
so it was declared incomplete
It makes sense that it would have a long period where no events are emitted if it’s a long-running job
I think that timeout should be changed to let it watch forever by default

David Elliott

11/03/2022, 4:05 PM
☝️ yep, exactly my conclusion after running various tests. In the meantime, I’ve silenced it by setting the timeout to 0 (which I think means forever), but yeah, I think that makes sense re: forever by default. Also wondering whether stream_output wants to be False by default as well..? Seems odd to have the agent streaming out the logs of its flows/pods..? In 1.0 I was quite used to the agent just being a creator / watcher of flow runs, rather than streaming the logs out.
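The workaround described here can be sketched roughly as follows. This assumes the Prefect 2.x KubernetesJob infrastructure block; the import path, the semantics of a 0 timeout, and the block name are assumptions to verify against the docs for your Prefect version:

```python
# Sketch of the workaround discussed above (Prefect 2.x, assumed API).
from prefect.infrastructure import KubernetesJob

k8s_job = KubernetesJob(
    stream_output=False,          # don't stream pod logs back to the agent
    job_watch_timeout_seconds=0,  # believed to mean "watch forever"
)
k8s_job.save("my-k8s-job", overwrite=True)  # hypothetical block name
```

A deployment pointed at this block would then get the quieter agent behaviour described in the thread.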

Zanie

11/03/2022, 4:08 PM
Hm. There’s not a real cost to having it default to true for Kubernetes. For some infrastructure types, like ECS, we need to register additional handlers, so it’s off by default there — but it’s generally nice for users getting started if it shows logs from the job.
I could be convinced to default them all to false and just suggest true in our tutorials.

David Elliott

11/03/2022, 4:10 PM
Yeah, I guess. Don’t have strong feelings on it tbh, other than that it took me quite a while to find the stream_output setting to switch off the behaviour 🙂 it is there in the docs though, so that’s on me!

Jarvis Stubblefield

11/08/2022, 6:46 PM
Seeing the logs was essential to my getting started. Having to find a setting to turn on just so I could see what was going on would have been cumbersome, and could have caused me to move on to another tech for my solution. Maybe the examples could show how turning it off benefits more advanced users, but I believe having it on is crucial for beginners, as it helps us understand what’s happening. Just my 2 cents.

Zanie

11/09/2022, 5:07 PM
Thanks for the feedback!