# prefect-kubernetes
r
I've got a Kubernetes scheduling question I'd like to check, to see if I've got this right. In my test I specified a fairly large limit, so on my cluster I can allocate 4 jobs to run at one time. If I submit more than four, say 16, then 4 start running and 12 get marked as 'crashed'. Kubernetes (at least in the Rancher view I see) marks the 12 jobs as 'pending' with a message like "0/9 nodes are available: 1 Insufficient memory,..." as you'd expect. Then, when the 4 running jobs complete, some of those 'crashed' jobs change state to running. Everything goes OK, it's just that the red 'crashed' messages look a little alarming in the Prefect UI. I'd sort of expected to see 'pending' rather than 'crashed'. I just wanted to check whether this is expected behaviour or not.
y
i have the same issue
k
What is your `Pod Watch Timeout Seconds` set to on your work pool? The default value is `60`, but if you have a decent idea of how long is too long for a run to be in a pending state before you think something has gone wrong, feel free to increase it to that number. I suspect that's the cause of what you're seeing.
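For anyone landing here later, a minimal sketch of one way to override this per deployment, assuming a Prefect 2.x+ Kubernetes work pool whose base job template exposes `pod_watch_timeout_seconds` (the variable behind the "Pod Watch Timeout Seconds" field); the flow, pool, and image names below are placeholders:

```python
from prefect import flow


@flow
def my_flow():
    ...


if __name__ == "__main__":
    # Deploy to a Kubernetes work pool, overriding the worker's
    # pod watch timeout (default 60s) for this deployment only.
    # "my-k8s-pool" and the image name are illustrative placeholders.
    my_flow.deploy(
        name="my-deployment",
        work_pool_name="my-k8s-pool",
        image="my-registry/my-image:latest",
        job_variables={"pod_watch_timeout_seconds": 120},
    )
```

The same value can also be changed for all deployments on the pool by editing the work pool's base job template in the UI instead of passing `job_variables` per deployment.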
y
i increased mine to 120 and i don't see the crashed-then-running issue anymore
r
Mine is not set, so presumably it's the default of 60. I'll try increasing it like @Ying Ting Loo did.