Nivi Mukka

09/30/2021, 10:37 PM
Hello Team, I have Dask Gateway setup on a GKE cluster, which is being used as the executor for a Prefect Server. Versions:
dask-gateway==0.9.0
dask==2020.12.0
distributed==2020.12.0
prefect==0.14.1
click==7.1.2
Constantly seeing this warning in the Dask worker logs after a data read from BigQuery. The read happens on 15 different workers, but the warning shows up on only 2-3 of them, and the Prefect Flow then takes about an hour to proceed from there. Any insight into how this can be resolved?
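For context, a minimal sketch of the setup being described (the gateway URL, cluster options, and worker count below are invented placeholders, not values from this thread) would look roughly like this with Prefect 0.14:

```python
# Hypothetical wiring of a Prefect 0.14 flow to a Dask Gateway cluster.
# The gateway URL and the scale of 15 workers are assumptions for illustration.
from dask_gateway import Gateway
from prefect.executors import DaskExecutor

def make_cluster():
    gateway = Gateway("http://dask-gateway.example.com")  # placeholder URL
    cluster = gateway.new_cluster()
    cluster.scale(15)  # one worker per mapped element
    return cluster

executor = DaskExecutor(cluster_class=make_cluster)
# flow.run(executor=executor)
```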

Kevin Kho

09/30/2021, 10:55 PM
I think this looks like you had consecutive mapped tasks, but they ran on different workers? Do you have a two-stage mapping?

Nivi Mukka

09/30/2021, 10:59 PM
Not sure what a two-stage mapping is. I have a Prefect task with 15 mapped elements (the input is a list with 15 items), one task per worker. During every Flow run, there are 2-3 workers (for 2-3 of the maps) that show this warning and take an hour or more to finish. The run finishes successfully, but very slowly.
I also see this garbage collector INFO warning on those same workers that slow down with the “could not find data” warning.
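To make that shape concrete, here is a plain-Python stand-in (no Prefect required; all names are invented) for "one mapped task per table over a list of 15 table names". In the real flow the fan-out would be a single `read_table.map(tables)` call executed across the Dask workers:

```python
from concurrent.futures import ThreadPoolExecutor

def read_table(table_name):
    # Stand-in for the actual BigQuery read; returns a dummy row count.
    return (table_name, 1000)

# 15 hypothetical table names, mirroring the mapped input described above.
tables = [f"dataset.table_{i:02d}" for i in range(15)]

# Fan out one "task" per element, like a mapped task with one worker each.
with ThreadPoolExecutor(max_workers=15) as pool:
    results = list(pool.map(read_table, tables))

print(len(results))  # one result per mapped element
```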

Kevin Kho

09/30/2021, 11:11 PM
Is that task with a map of 15 elements preceded or followed by another mapped task?

Nivi Mukka

09/30/2021, 11:12 PM
Not immediately before and after but otherwise, yes.

Kevin Kho

09/30/2021, 11:15 PM
This goes a bit too deep into the Dask scheduler, so I honestly don't have any immediate ideas. Are you sure Dask is correctly assigning one task per worker?

Nivi Mukka

09/30/2021, 11:17 PM
I appreciate you responding anyway. I have been struggling to get help with Dask-related things; I did not get a response on the Dask Slack channel. Yes, one task per worker. I verified that by opening each worker's logs on GKE.

Kevin Kho

09/30/2021, 11:21 PM
Would you know if it’s always the same mapped elements that are slow?

Nivi Mukka

09/30/2021, 11:22 PM
There is only one mapped input: a list of 15 BigQuery table names. Each table is the same size and shape, so I think all the mapped tasks are equivalent.

Kevin Kho

09/30/2021, 11:33 PM
I think what you can do is create a mapped task right after this one that persists something, just so you can identify which elements complete and which ones hang, and then check whether it's consistently the same ones that hang.
I suspect it will be, and if it is, I would play with the memory settings for those three, because it seems those futures are not resolving (execution is failing). If those three succeed, I'd wonder more about things like whether the Dask machine is restarting mid-run.
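A framework-agnostic sketch of that diagnostic (all names here are invented helpers, not Prefect API): each mapped element writes a small marker file when it completes, so after a slow run you can list exactly which elements finished and check whether the same ones hang every time:

```python
import json
import tempfile
import time
from pathlib import Path

def checkpoint(marker_dir, element_id, payload=None):
    """Persist a tiny per-element marker so completed map elements can be
    identified after a run. Hypothetical helper, not part of Prefect."""
    marker_dir = Path(marker_dir)
    marker_dir.mkdir(parents=True, exist_ok=True)
    marker = marker_dir / f"{element_id}.done"
    marker.write_text(json.dumps({"element": element_id,
                                  "finished_at": time.time(),
                                  "payload": payload}))
    return marker

def completed_elements(marker_dir):
    """Return the element ids that wrote a marker (i.e., the ones that finished)."""
    return sorted(p.stem for p in Path(marker_dir).glob("*.done"))

# Example: pretend 15 elements were mapped but only some finished in time.
with tempfile.TemporaryDirectory() as d:
    for i in [0, 1, 2, 5, 9]:  # the elements that completed
        checkpoint(d, f"table_{i}")
    done = completed_elements(d)
    # done -> ['table_0', 'table_1', 'table_2', 'table_5', 'table_9']
```

In the flow itself, this helper would be another mapped task placed immediately downstream of the slow read, mapped over the same element ids.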

Nivi Mukka

09/30/2021, 11:36 PM
Thanks for those tips, will try it out!