# prefect-kubernetes
r
I've got a Kubernetes scheduling question I'd like to check, to see if I've got this right. In my test I specified a fairly large limit, so on my cluster I can allocate 4 jobs to run at one time. If I submit more than four, say 16, then 4 start running and 12 get marked as 'crashed'. Kubernetes (at least in the Rancher view I see) marks the 12 jobs as 'pending' with a message like "0/9 nodes are available: 1 Insufficient memory,..." as you'd expect. Then, when the 4 running jobs complete, some of those 'crashed' jobs change state to running. Everything goes OK, it's just that the red 'crashed' messages look a little alarming in the Prefect UI. I'd sort of expected to see 'pending' rather than 'crashed'. I just wanted to check whether this is expected behaviour or not.
y
i have the same issue
k
What is your `Pod Watch Timeout Seconds` set to on your work pool? The default value is `60`, but if you have a decent idea of how long is too long for a run to be in a pending state before you think something has gone wrong, feel free to increase it to that number. I suspect that's the cause of what you're seeing.
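For anyone landing here later, a minimal sketch of one way to override this per deployment, assuming a Prefect 2.x+ Kubernetes work pool whose base job template exposes `pod_watch_timeout_seconds` (the variable behind the "Pod Watch Timeout Seconds" field); the flow, pool, and image names below are placeholders:

```python
from prefect import flow


@flow
def my_flow():
    ...


if __name__ == "__main__":
    # Deploy to a Kubernetes work pool, overriding the worker's
    # pod watch timeout (default 60s) for this deployment only.
    # "my-k8s-pool" and the image name are illustrative placeholders.
    my_flow.deploy(
        name="my-deployment",
        work_pool_name="my-k8s-pool",
        image="my-registry/my-image:latest",
        job_variables={"pod_watch_timeout_seconds": 120},
    )
```

The same value can also be changed for all deployments on the pool by editing the work pool's base job template in the UI instead of passing `job_variables` per deployment.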
y
i increased mine to 120 and i don't see the crashed-then-running issue anymore
r
Mine is not set, so presumably it's the default of 60. I'll try increasing it like @Ying Ting Loo did.