Marko Jamedzija 07/20/2021, 3:33 PM
state even though the underlying k8s
task completed successfully. I’m using prefect
. This happens almost always for the longer-running tasks of this kind. Any suggestions on how to resolve this? Thanks!
, I’ve seen some people get around this by using processes instead of threads. Are you using processes already?
Marko Jamedzija 07/20/2021, 3:40 PM
. I tried using
and indeed this didn’t happen, but I’m still evaluating how much it’s going to affect the resource usage. What is your advice here for setting
more than the number of cores?
tasks, which should be pretty lightweight
is more than
with processes because the resources are already exhausted. If the
is lightweight, the worker should just pick up the next task once that is kicked off.
Marko Jamedzija 07/20/2021, 3:48 PM
. It did run 4
tasks in parallel successfully, but again got stuck running the longest one. Do you have any other suggestions on how to deal w/ this? Thanks!
Marko Jamedzija 07/20/2021, 4:15 PM
so that they are 1:1?
Marko Jamedzija 07/20/2021, 4:21 PM
Marko Jamedzija 07/20/2021, 4:36 PM
This works. I reduced to 2 workers and it’s working. However, I still think this is an issue that needs fixing. I’ll inspect the resource usage in the cluster more tomorrow to be sure, but from what I saw so far it shouldn’t have been the reason for this behaviour 🙂
Marko Jamedzija 07/21/2021, 2:42 PM
is just used to stop the job if it “outgrows” this resource requirement, and the
will use the pod’s available CPU count (which is independent of this value) to create the number of processes (unless it’s overridden w/
). From what I managed to conclude, it’s the number of cores of the underlying node. So for now I will just use nodes with a higher CPU count to achieve higher parallelism, but the problem of
> CPUs remains 🙂 Thanks for the help in any case!
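
For illustration only: a minimal standard-library sketch of the behaviour described above, where a process pool takes its size from the CPU count the process can see unless a worker count is passed explicitly. This is not Prefect's or Dask's actual code. The names `resolve_worker_count`, `run_squares` and `square` are hypothetical, and the use of `os.cpu_count()` as the default is an assumption made for the example.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def resolve_worker_count(requested=None):
    """Return the explicit worker count if given, else the visible CPU count.

    Hypothetical helper: a stand-in for "use the CPU count unless overridden".
    """
    if requested is not None:
        if requested < 1:
            raise ValueError("requested worker count must be at least 1")
        return requested
    # os.cpu_count() can return None, so fall back to a single worker.
    return os.cpu_count() or 1


def square(x):
    # Trivial stand-in for a lightweight task.
    return x * x


def run_squares(values, requested_workers=None):
    """Run the stand-in task over `values` in a pool of worker processes."""
    workers = resolve_worker_count(requested_workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, values))


if __name__ == "__main__":
    print("default workers:", resolve_worker_count())
    print("overridden workers:", resolve_worker_count(2))
    print(run_squares([1, 2, 3, 4], requested_workers=2))
```

In this sketch the default never looks at any resource request made for the pod, only at what the operating system reports, which matches the observation that more cores on the node means more processes by default.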