Is there a default limit for concurrent flows or something in the Cloud?
We currently have one flow that was run ad hoc and is still running, and another that was scheduled to run now, and it’s stuck in
and nothing seems to be happening with it
It seems to be related to a
we added to our Kubernetes namespace (without the Quota it works fine):
For some reason it refuses to run even though the limits far exceed what we “need”… any experience or ideas?
07/25/2022, 10:28 PM
Hey @Tom Klein, were you able to sort this out? Ultimately I don't think I have enough knowledge of Kubernetes to offer much guidance here, but as far as the Prefect Agent goes, once it submits the job, the runtime environment has more control over what's occurring than Prefect does, especially in this case with the resource quota. On the Prefect side, if you aren't already, you could try specifying the resource requests in the KubernetesRunConfig; otherwise, I think Kubernetes support may be able to offer more insight into what could be occurring.
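For reference, a minimal sketch of the container `resources` block a Kubernetes job carries when both requests and limits are set. In Prefect 1.x, the `KubernetesRun` run config (from `prefect.run_configs`) accepts `cpu_request`, `cpu_limit`, `memory_request` and `memory_limit` for this purpose. The helpers `container_spec` and `job_manifest` below are hypothetical and only build the equivalent plain-dict manifest, so the shape is visible without a cluster or Prefect installed. The image name and values are illustrative.

```python
from typing import Optional


def container_spec(
    name: str,
    image: str,
    cpu_request: str,
    memory_request: str,
    cpu_limit: Optional[str] = None,
    memory_limit: Optional[str] = None,
) -> dict:
    """Hypothetical helper: build a container spec dict with explicit resources.

    Mirrors the shape of spec.template.spec.containers[i] in a batch/v1 Job.
    """
    resources = {"requests": {"cpu": cpu_request, "memory": memory_request}}
    limits = {}
    if cpu_limit is not None:
        limits["cpu"] = cpu_limit
    if memory_limit is not None:
        limits["memory"] = memory_limit
    if limits:
        # Only emit a "limits" section when at least one limit was given.
        resources["limits"] = limits
    return {"name": name, "image": image, "resources": resources}


def job_manifest(job_name: str, container: dict) -> dict:
    """Wrap a single container in a minimal batch/v1 Job manifest."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": job_name},
        "spec": {
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [container],
                }
            }
        },
    }


if __name__ == "__main__":
    import json

    # Illustrative values: 1 CPU / 1Gi requested, with matching limits.
    c = container_spec(
        "flow", "my-flow-image:latest", "1", "1Gi", cpu_limit="1", memory_limit="1Gi"
    )
    print(json.dumps(job_manifest("prefect-job-example", c), indent=2))
```

With Prefect 1.x, the corresponding run config would be along the lines of `KubernetesRun(cpu_request=1, cpu_limit=1, memory_request="1Gi", memory_limit="1Gi")`; exactly how those values are merged into the job template depends on the agent version.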
07/26/2022, 7:49 AM
@Mason Menges Sure, I understand that it’s not necessarily an issue stemming directly from Prefect,
and yes, I am already specifying the resource requests in my run config.
It didn’t seem to matter (e.g. I requested 1 GB of memory and 1 CPU when the quota was around 50 GB of memory and 10 CPUs); (apparently) it just blocked the job from being submitted.
I think the issue I have with Prefect here is that (according to the K8s specs, at least) if you try to provision a job in k8s that doesn’t meet the quota, you’re supposed to get blocked (i.e. get an error).
There was no such error here, just spinning forever until it got picked up by Lazarus once, twice, and finally a third time.
I would expect to see some error message if the issue were the quota, but then again, it works perfectly without the quota in place…
DevOps is set to work on this again tomorrow, so I’ll report back here if we figure something out.
@Mason Menges Just to close out this thread: the issue was indeed that once quotas are set in k8s, you must set CPU and memory requirements explicitly in the job (apparently not just the requests but also the limits). Otherwise the flow run just stalls for a long time until it is eventually picked up by Lazarus (and then repeats 2 or 3 more times).
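The rule described above can be sketched as a pure-Python check. `missing_for_quota` is a hypothetical helper, not part of Prefect or the Kubernetes client, and it assumes a quota that tracks both `requests.*` and `limits.*` for CPU and memory, since the quota actually used in this thread isn't shown. The model is deliberately simplified: it ignores LimitRange defaults (which the Kubernetes docs suggest for supplying defaults to pods that specify none), ignores init containers, and does not compare amounts against the `hard` totals.

```python
# Compute-resource quota keys mapped to (section, resource) in a container spec.
# Per the Kubernetes ResourceQuota docs, bare "cpu"/"memory" mean requests.
QUOTA_KEYS = {
    "cpu": ("requests", "cpu"),
    "requests.cpu": ("requests", "cpu"),
    "memory": ("requests", "memory"),
    "requests.memory": ("requests", "memory"),
    "limits.cpu": ("limits", "cpu"),
    "limits.memory": ("limits", "memory"),
}


def missing_for_quota(quota_hard: dict, containers: list) -> list:
    """Return (container_name, quota_key) pairs a compute quota would reject.

    A missing request counts as satisfied when a limit is set, because
    Kubernetes copies the limit into the request in that case.
    """
    problems = []
    for c in containers:
        res = c.get("resources", {})
        requests = res.get("requests", {})
        limits = res.get("limits", {})
        for key in quota_hard:
            if key not in QUOTA_KEYS:
                continue  # not a compute key we model (e.g. "pods")
            section, resource = QUOTA_KEYS[key]
            if section == "limits":
                present = resource in limits
            else:
                present = resource in requests or resource in limits
            if not present:
                problems.append((c.get("name", "?"), key))
    return problems


if __name__ == "__main__":
    quota = {"requests.cpu": "10", "requests.memory": "50Gi",
             "limits.cpu": "10", "limits.memory": "50Gi"}
    requests_only = [{"name": "flow",
                      "resources": {"requests": {"cpu": "1", "memory": "1Gi"}}}]
    print(missing_for_quota(quota, requests_only))
```

With a requests-only container, this reports `limits.cpu` and `limits.memory` as missing, which matches what was observed here: 1 CPU and 1 GB fit comfortably under the quota, yet the pod is still rejected because no limits were declared.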
I’m still not sure I understand why Prefect stalls in this case instead of failing and reporting an error, but I also don’t know whether the root of the problem is in Prefect or k8s to begin with.
At least it’s solved.
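On the open question of why nothing failed visibly: one plausible explanation, not confirmed in this thread, is that a compute quota is enforced when pods are created, not when the Job object is created. The Job itself is accepted; the Job controller's attempts to create its pod are then rejected and recorded as `FailedCreate` warning events on the Job, so a client that only checks that the Job was submitted sees no error. These events also appear under "Events" in `kubectl describe job <name>`. The hypothetical helper below filters the JSON output of `kubectl get events -n <namespace> -o json` for such events; the job name is whatever the agent created.

```python
import json
import sys


def failed_creates(events_json: str, job_name: str) -> list:
    """Return messages of FailedCreate events recorded against a given Job.

    events_json is assumed to be the output of
    `kubectl get events -n <namespace> -o json` (a v1 List with "items").
    """
    events = json.loads(events_json).get("items", [])
    return [
        ev.get("message", "")
        for ev in events
        if ev.get("reason") == "FailedCreate"
        and ev.get("involvedObject", {}).get("kind") == "Job"
        and ev.get("involvedObject", {}).get("name") == job_name
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: kubectl get events -n <namespace> -o json | python this_script.py <job-name>
    for msg in failed_creates(sys.stdin.read(), sys.argv[1]):
        print(msg)
```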