
Diego Alonso Roque Montoya

09/17/2021, 11:24 PM
Hello, we often find ourselves having to explicitly reschedule flows even when we have enough workers on our DaskKubernetes cluster. Is there a common reason this happens?
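For context, a minimal sketch of the kind of setup being described: a Prefect 0.x-era flow (the API current when this thread was written) that runs its tasks on an existing Dask cluster on Kubernetes. The scheduler address and flow/task names are placeholders, not taken from the thread.

```python
# Sketch, assuming Prefect 0.x: a flow pointed at an already-running
# Dask scheduler instead of spawning a local cluster.
from prefect import Flow, task
from prefect.executors import DaskExecutor

@task
def say_hello():
    print("hello from a Dask worker")

with Flow("dask-k8s-example") as flow:
    say_hello()

# Placeholder address for the Dask scheduler service on the cluster.
flow.run(executor=DaskExecutor(address="tcp://dask-scheduler:8786"))
```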

Kevin Kho

09/18/2021, 12:02 AM
Hey Diego, do you have an error message?

Diego Alonso Roque Montoya

09/23/2021, 12:53 AM
It’s not an error message. The flow just refuses to continue, with tasks stuck in Pending despite there being Dask workers available.

Kevin Kho

09/23/2021, 12:57 AM
Are you using mapped tasks, and how many elements do they have?
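For readers unfamiliar with the term: a mapped task in Prefect 0.x fans a single task out over an iterable, creating one child task run per element. A minimal illustration (names are illustrative only):

```python
from prefect import Flow, task

@task
def plus_one(x):
    return x + 1

with Flow("mapping-example") as flow:
    # Creates three child task runs, one per element of the list.
    results = plus_one.map([1, 2, 3])
```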

Diego Alonso Roque Montoya

09/23/2021, 2:07 AM
No mapped tasks. It’s a graph of around 200 nodes.

Kevin Kho

09/23/2021, 2:32 AM
This commonly happens with out-of-memory issues, then. Have you checked the pod?
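One way to check for an OOM kill programmatically, as a sketch using the official kubernetes Python client (the pod name and namespace below are placeholders; `kubectl describe pod <name>` surfaces the same information):

```python
# Inspect a pod's container statuses for an OOMKilled termination.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()
# Placeholder pod name and namespace; substitute your own deployment's.
pod = v1.read_namespaced_pod(name="dask-scheduler-pod", namespace="default")
for status in pod.status.container_statuses or []:
    last = status.last_state.terminated
    if last is not None and last.reason == "OOMKilled":
        print(f"container {status.name} was OOMKilled at {last.finished_at}")
```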

Diego Alonso Roque Montoya

09/23/2021, 4:43 PM
Which pod?

Kevin Kho

09/23/2021, 4:48 PM
Sorry, the Dask scheduler pod specifically (I assume it would die before the workers)

Diego Alonso Roque Montoya

09/24/2021, 4:35 AM
The Dask scheduler pod is up the whole time, and I can send jobs to it manually, so the problem seems to be on the Prefect side.
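A sketch of what sending jobs to the scheduler manually might look like: connecting a dask.distributed Client straight to the scheduler and submitting work, bypassing Prefect entirely. The address is a placeholder.

```python
# Sanity check: if this returns a result, the Dask scheduler and its
# workers are reachable and healthy independently of Prefect.
from dask.distributed import Client

client = Client("tcp://dask-scheduler:8786")  # placeholder address
future = client.submit(lambda x: x + 1, 41)
print(future.result())  # 42
```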

Kevin Kho

09/24/2021, 2:52 PM
I think we’re making changes to the Prefect code for the DaskExecutor because some things could be more efficient, but most of those changes are around mapping, where repeated work is being done.
I’ve only seen this behavior with mapped tasks, though. It would help us if you could put together a small reproducible example.
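A small reproduction, assuming the issue is graph size rather than mapping, might look like the following: a flow whose graph is built from a couple hundred explicit (non-mapped) task calls. The scheduler address and names are placeholders.

```python
# Hypothetical repro: a ~200-node graph with no mapped tasks, run on
# the Dask cluster (Prefect 0.x-era API).
from prefect import Flow, task
from prefect.executors import DaskExecutor

@task
def noop(i):
    return i

with Flow("many-nodes") as flow:
    # Each call inside the Flow context adds a separate task copy,
    # so this builds a static graph of 200 independent nodes.
    results = [noop(i) for i in range(200)]

flow.run(executor=DaskExecutor(address="tcp://dask-scheduler:8786"))
```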