yair friedman

05/18/2022, 12:44 PM

Hi, first, thanks for the wonderful package! I have created a basic scikit-learn pipeline flow, and I am using the DaskExecutor as the executor. I wonder whether the scikit-learn algorithms really use Dask to run in a parallelised, distributed manner, or whether only the wrapper tasks are distributed while the actual ML work inside them runs locally. Is it the same as running dask-ml algorithms?

Anna Geller

05/18/2022, 12:47 PM

Hi, welcome to Prefect! I can only say for sure that your function task gets submitted to a Dask cluster for execution. If there is some specific Dask integration within your package, you may need to use Dask with a resource manager, e.g. https://discourse.prefect.io/t/scale-your-prefect-dask-workflows-to-the-cloud-by-richard-pelgrim/375

Anna Geller

05/18/2022, 12:47 PM

or this https://discourse.prefect.io/t/how-to-use-dask-without-mapping-in-prefect-1-0-using-das[…]ker-client-to-call-client-submit-inside-a-prefect-task/470

Anna Geller

05/18/2022, 12:48 PM

maybe if you could share your flow code it might be easier to help?

yair friedman

05/18/2022, 12:51 PM

I have some variation on https://github.com/kvnkho/demos/blob/main/blogs/prefect-ml/Prefect-ML.ipynb that uses the DaskExecutor. I wonder if that is enough to distribute scikit-learn's internal computation.

Kevin Kho

05/18/2022, 2:09 PM

That will distribute over Dask, but specifically it is like the “compute-bound” portion of dask-ml, not the “memory-bound” portion, where you train a model on a Dask DataFrame that is too big for one machine.
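
A rough sketch of that “compute-bound” pattern, for illustration only. It assumes a small dataset that fits in memory and uses the standard library's `ThreadPoolExecutor` as a stand-in for a Dask cluster; it is not Prefect or dask-ml code, and `fit_candidate` and `search` are hypothetical names. Each candidate fit runs whole on one worker, and only the fan-out across candidates is parallel.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

# Small dataset that fits comfortably in memory (the "compute-bound" case).
DATA = [(x, 3.0 * x + 1.0) for x in range(20)]


def fit_candidate(slope):
    """Hypothetical stand-in for one model fit with a fixed hyperparameter.

    The whole fit runs inside a single worker; nothing in here is split up
    or distributed, and the full dataset is available locally.
    """
    # For a fixed slope, the least-squares intercept is the mean residual.
    intercept = sum(y - slope * x for x, y in DATA) / len(DATA)
    error = sum((y - (slope * x + intercept)) ** 2 for x, y in DATA)
    return {
        "slope": slope,
        "intercept": intercept,
        "error": error,
        # Record which worker ran this fit, to show the task-level fan-out.
        "worker": threading.current_thread().name,
    }


def search(candidates, max_workers=4):
    """Fan the candidate fits out across workers and keep the best one."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fit_candidate, candidates))
    best = min(results, key=lambda r: r["error"])
    return best, results


best, results = search([1.0, 2.0, 3.0, 4.0])
print(best["slope"], best["intercept"])
```

In the “memory-bound” case this pattern does not help, because no single worker could hold `DATA`; there the data itself has to be partitioned across machines.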

yair friedman

05/18/2022, 2:50 PM

Got it, thanks!
