# prefect-community
y
Hi, first, thanks for the wonderful package! I have created a basic scikit-learn pipeline flow, and I am using the DaskExecutor as the executor. I wonder whether the scikit-learn algorithms are really using Dask to run in a parallelized, distributed manner, or whether only the wrapper tasks are distributed while the actual ML work inside them runs locally. Is it the same as running Dask-ML algorithms?
a
Hi, welcome to Prefect! I can only say for sure that your function task gets submitted to a Dask cluster for execution. If there is some specific Dask integration within your package, you may need to use Dask with a resource manager, e.g. https://discourse.prefect.io/t/scale-your-prefect-dask-workflows-to-the-cloud-by-richard-pelgrim/375
If you could share your flow code, it might be easier to help.
y
I have a variation on https://github.com/kvnkho/demos/blob/main/blogs/prefect-ml/Prefect-ML.ipynb that uses the DaskExecutor. I wonder if that is enough to distribute scikit-learn's internal computation.
k
That will distribute over Dask, but specifically it covers the “compute-bound” portion of dask-ml, not the “memory-bound” portion, where you train a model on a Dask DataFrame that is too big for one machine.
y
Got it, thanks!
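For anyone reading later, here is a rough sketch of what "compute-bound" parallelism looks like in this thread's sense: the data is small enough to sit in memory, and many independent model fits are farmed out to workers. It uses only the Python standard library, with `ThreadPoolExecutor` standing in for a Dask cluster. The dataset, `fit_and_score`, and `grid_search` are hypothetical names made up for illustration; none of this is Prefect or Dask API.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Tiny in-memory dataset generated from y = 3x + 1 (illustrative only).
X = [0.0, 1.0, 2.0, 3.0, 4.0]
Y = [1.0, 4.0, 7.0, 10.0, 13.0]


def fit_and_score(slope):
    """Hypothetical 'model fit': for a fixed slope, choose the best
    intercept and return (slope, mean squared error)."""
    intercept = mean(y - slope * x for x, y in zip(X, Y))
    mse = mean((y - (slope * x + intercept)) ** 2 for x, y in zip(X, Y))
    return slope, mse


def grid_search(slopes, max_workers=4):
    """Run one independent fit per candidate slope on a pool of workers.
    The pool is a stand-in for a cluster: each fit sees the full dataset,
    and only the fits themselves are spread out."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fit_and_score, slopes))
    # Keep the candidate with the lowest error.
    return min(results, key=lambda result: result[1])


best_slope, best_mse = grid_search([1.0, 2.0, 3.0, 4.0])
print(best_slope, best_mse)
```

The "memory-bound" case is the opposite situation: the dataset itself is split across machines because it does not fit on one, which this sketch does not attempt to show.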