Hi, I'm running a mapped task over ~400 elements on Kubernetes using DaskExecutor + KubeCluster, but I quickly run out of memory. The data I'm using is <5GB and the nodes I'm using have ~60GB of RAM. The job pod (running the Dask scheduler) reaches >40GB memory usage just before the mapped task starts and the node runs out of memory before any of the mapped tasks start. I was wondering if anyone knows what the issue is. Thank you
Matthew Alhonte
02/17/2021, 7:29 PM
Sometimes Dask can eat a bunch of memory when scheduling the tasks if you don't give it an explicit limit. Try passing an arg like memory_limit='5GB' to the executor when you create it.
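For example, a minimal sketch of what I mean (assuming a 0.14-era `DaskExecutor`; the exact kwargs depend on your Prefect/Dask versions, and with `KubeCluster` the memory cap usually belongs in the worker pod spec instead):

```python
# Sketch only, not verified against your setup: with Dask's default
# LocalCluster, a per-worker memory cap can be passed through cluster_kwargs.
from prefect.executors import DaskExecutor

executor = DaskExecutor(
    cluster_kwargs={
        "n_workers": 4,          # illustrative worker count
        "memory_limit": "5GB",   # per-worker cap: workers spill/pause/restart
                                 # instead of growing unbounded
    },
)
```

Without a limit, Dask workers report `memory_limit=0` (unbounded) and will happily grow until the node's OOM killer steps in.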
Matthew Alhonte
02/17/2021, 7:50 PM

@Nikul
Nikul
02/17/2021, 7:56 PM
I've given an explicit limit for the Dask workers. Is there an option to specify limits/requests for individual tasks? Also, I'm not sure there's an option to set a limit on the Prefect job pod (and in any case that limit would be breached). I'm more concerned about why so much memory is being used in the first place.
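For context, the way I'm currently setting the worker limits is roughly this (a sketch assuming classic `dask_kubernetes` with its `make_pod_spec` helper; image and numbers are illustrative):

```python
# Sketch, assuming the classic dask-kubernetes API. make_pod_spec builds a
# worker pod template with Kubernetes resource requests/limits attached.
from dask_kubernetes import KubeCluster, make_pod_spec

pod_spec = make_pod_spec(
    image="daskdev/dask:latest",  # illustrative worker image
    memory_limit="8G",            # Kubernetes limit per worker pod
    memory_request="8G",          # Kubernetes request (scheduling reservation)
    cpu_limit=2,
    cpu_request=2,
)
cluster = KubeCluster(pod_spec)
```

These caps apply per worker pod, though, not per mapped task, and they don't touch the Prefect job pod where the Dask scheduler runs.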
@Matthew Alhonte