Eddie Atkinson

04/12/2022, 2:35 AM
This is a really silly question, but I can't quite figure out the answer from Dask's and Prefect's documentation. My aim is to use Prefect to orchestrate flows with varying memory requirements on a Dask cluster. As an MVP I've set up a flow with 30GB of RAM. For large jobs this flow OOMs and gets killed by the Prefect scheduler. My question is this: if I set up a Dask cluster to run these jobs, would it gracefully handle memory issues? That is to say, if I had 30GB of RAM in the cluster and a job that required 50GB, would Dask OOM or would it simply run slower? Do I need to modify my code to use Dask DataFrames, or is there some smarts here I'm not quite across?

Kevin Kho

04/12/2022, 2:38 AM
It would OOM. Dask does have memory spillover, but I think the default is that only around 30% can be shuffled to disk, and your memory requirements would need a lot more than that. It's also not performant once you hit spilling, so you really need to bump up resources. What you can do, though, is parameterize the size of the Dask cluster. See this
Not a silly question btw 🙂
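For reference, the spill-to-disk behaviour mentioned above is governed by the worker memory fractions in Dask distributed's configuration. The defaults look roughly like this (a config fragment, shown here for context; it is not something posted in the thread):

```yaml
# Fractions of a worker's memory limit at which each behaviour kicks in.
# Set in ~/.config/dask/distributed.yaml or via DASK_* environment variables.
distributed:
  worker:
    memory:
      target: 0.60     # start spilling least-recently-used data to disk
      spill: 0.70      # spill everything spillable
      pause: 0.80      # pause accepting new tasks on this worker
      terminate: 0.95  # the nanny restarts the worker
```

Once a worker crosses the `terminate` threshold the nanny kills it, which is why a 50GB job on a 30GB cluster fails rather than just running slowly.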

Eddie Atkinson

04/12/2022, 3:34 AM
The parameterisation is really cool
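A minimal sketch of what that parameterisation could look like. The helper name, the 1.25× headroom factor, and the per-worker memory default are all assumptions for illustration, not anything Prefect or Dask provides; only the commented `DaskExecutor` usage at the bottom reflects a real Prefect 1.x API.

```python
import math

def cluster_kwargs_for(job_memory_gb: float,
                       worker_memory_gb: float = 8.0,
                       headroom: float = 1.25) -> dict:
    """Pick a worker count so total cluster memory covers the job,
    with headroom so workers stay under Dask's spill thresholds.

    Illustrative heuristic only -- names and the 1.25x factor are
    assumptions made for this sketch.
    """
    n_workers = math.ceil(job_memory_gb * headroom / worker_memory_gb)
    return {
        "n_workers": n_workers,
        "memory_limit": f"{worker_memory_gb}GB",  # per-worker limit
    }

# These kwargs could then feed a Dask cluster class, e.g. (Prefect 1.x):
#   from prefect.executors import DaskExecutor
#   flow.executor = DaskExecutor(
#       cluster_class="dask.distributed.LocalCluster",
#       cluster_kwargs=cluster_kwargs_for(50),
#   )
```

With this shape, the job's memory requirement can be a Prefect Parameter, so each flow run spins up a cluster sized for that run instead of a fixed 30GB box.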