
Diego Alonso Roque Montoya

09/17/2021, 11:24 PM
Hello, we often find ourselves having to explicitly reschedule flows even when we have enough workers on our DaskKubernetes cluster. Is there a common reason this happens?
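For context, a minimal sketch of the kind of setup being described: a Prefect 0.x-era flow (the API current when this thread was written) that runs its tasks on an existing Dask cluster on Kubernetes. The scheduler address and flow/task names are placeholders, not taken from the thread.

```python
# Sketch, assuming Prefect 0.x: a flow pointed at an already-running
# Dask scheduler instead of spawning a local cluster.
from prefect import Flow, task
from prefect.executors import DaskExecutor

@task
def say_hello():
    print("hello from a Dask worker")

with Flow("dask-k8s-example") as flow:
    say_hello()

# Placeholder address for the Dask scheduler service on the cluster.
flow.run(executor=DaskExecutor(address="tcp://dask-scheduler:8786"))
```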

Kevin Kho

09/18/2021, 12:02 AM
Hey Diego, do you have an error message?

Diego Alonso Roque Montoya

09/23/2021, 12:53 AM
It’s not an error message. The flow just refuses to continue, with tasks stuck in Pending despite there being Dask workers available.

Kevin Kho

09/23/2021, 12:57 AM
Are you using mapped tasks, and how many elements do they have?
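For readers unfamiliar with the term: a mapped task in Prefect 0.x fans a single task out over an iterable, creating one child task run per element. A minimal illustration (names are illustrative only):

```python
from prefect import Flow, task

@task
def plus_one(x):
    return x + 1

with Flow("mapping-example") as flow:
    # Creates three child task runs, one per element of the list.
    results = plus_one.map([1, 2, 3])
```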

Diego Alonso Roque Montoya

09/23/2021, 2:07 AM
No mapped tasks. It’s a graph of around 200 nodes.

Kevin Kho

09/23/2021, 2:32 AM
This commonly happens with out-of-memory issues, then. Have you checked the pod?
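One way to check for an OOM kill programmatically, as a sketch using the official kubernetes Python client (the pod name and namespace below are placeholders; `kubectl describe pod <name>` surfaces the same information):

```python
# Inspect a pod's container statuses for an OOMKilled termination.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()
# Placeholder pod name and namespace; substitute your own deployment's.
pod = v1.read_namespaced_pod(name="dask-scheduler-pod", namespace="default")
for status in pod.status.container_statuses or []:
    last = status.last_state.terminated
    if last is not None and last.reason == "OOMKilled":
        print(f"container {status.name} was OOMKilled at {last.finished_at}")
```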

Diego Alonso Roque Montoya

09/23/2021, 4:43 PM
Which pod?

Kevin Kho

09/23/2021, 4:48 PM
Sorry, the Dask scheduler pod specifically (I assume it would die before the workers)

Diego Alonso Roque Montoya

09/24/2021, 4:35 AM
The Dask scheduler pod is up the whole time, and I can send jobs to it manually, so the problem seems to be on the Prefect side.
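A sketch of what sending jobs to the scheduler manually might look like: connecting a dask.distributed Client straight to the scheduler and submitting work, bypassing Prefect entirely. The address is a placeholder.

```python
# Sanity check: if this returns a result, the Dask scheduler and its
# workers are reachable and healthy independently of Prefect.
from dask.distributed import Client

client = Client("tcp://dask-scheduler:8786")  # placeholder address
future = client.submit(lambda x: x + 1, 41)
print(future.result())  # 42
```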

Kevin Kho

09/24/2021, 2:52 PM
I think we’re making changes to the Prefect code for the DaskExecutor because some things could be more efficient, but most of those changes are around mapping, where repeated work is being done.
I’ve only seen this behavior with mapped tasks, though. It would help us if you could put together a small reproducible example.
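A small reproduction, assuming the issue is graph size rather than mapping, might look like the following: a flow whose graph is built from a couple hundred explicit (non-mapped) task calls. The scheduler address and names are placeholders.

```python
# Hypothetical repro: a ~200-node graph with no mapped tasks, run on
# the Dask cluster (Prefect 0.x-era API).
from prefect import Flow, task
from prefect.executors import DaskExecutor

@task
def noop(i):
    return i

with Flow("many-nodes") as flow:
    # Each call inside the Flow context adds a separate task copy,
    # so this builds a static graph of 200 independent nodes.
    results = [noop(i) for i in range(200)]

flow.run(executor=DaskExecutor(address="tcp://dask-scheduler:8786"))
```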