Hi,
First, thanks for the wonderful package!
I have created a basic scikit-learn pipeline flow, and I am using the DaskExecutor as the executor.
I wonder whether the scikit-learn algorithms are really using Dask to run in a parallelised, distributed manner, or whether only the wrapper tasks are distributed while the actual ML work inside them runs locally.
Is it the same as running dask-ml algorithms?
That will distribute over Dask, but specifically it covers the “compute-bound” portion of dask-ml, not the “memory-bound” case where you train a model on a Dask DataFrame that is too big for one machine.
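To make the "compute-bound" case concrete, here is a minimal sketch and not necessarily the setup discussed in this thread. It assumes scikit-learn, joblib and dask.distributed are installed, and uses joblib's Dask backend so that the individual fits of a grid search are sent to Dask workers. The toy dataset, the parameter grid, the in-process cluster and the function name `fit_grid_search_on_dask` are all made up for illustration. Note that the data still has to fit in memory on one machine; only the CPU work is spread out.

```python
import joblib
from dask.distributed import Client
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV


def fit_grid_search_on_dask():
    """Run a small grid search whose individual fits are scheduled on Dask."""
    # Toy in-memory dataset (hypothetical; small enough for a single machine).
    X, y = make_classification(n_samples=200, n_features=10, random_state=0)

    search = GridSearchCV(
        LogisticRegression(max_iter=200),
        param_grid={"C": [0.1, 1.0]},  # illustrative grid only
        cv=2,
        n_jobs=-1,
    )

    # Assumption: a local, in-process cluster stands in for a real one.
    with Client(processes=False, dashboard_address=None):
        # Inside this block, joblib hands scikit-learn's parallel work to Dask.
        with joblib.parallel_backend("dask"):
            search.fit(X, y)
    return search


if __name__ == "__main__":
    result = fit_grid_search_on_dask()
    print(result.best_params_)
```

Without the `joblib.parallel_backend("dask")` block, the same `fit` call would run entirely inside whichever single process calls it.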