    Chris Hart

    2 years ago
    Looking to use Prefect with dask-ml on the default DaskExecutor. How might I get the sklearn bits to use Dask as the joblib backend, as referenced here: https://ml.dask.org/joblib.html ?
    Or are there any docs about generally hooking into the underlying Dask cluster from Prefect tasks?
    (Right now I'm just flying blind, and as a first step I would attempt to hook them both up to the same cluster, as shown by https://docs.prefect.io/core/advanced_tutorials/dask-cluster.html) 🤞
    Jim Crist-Harif

    2 years ago
    Hi Chris, you should be able to access the underlying dask cluster in a task by using dask.distributed.worker_client. See the dask docs here: https://distributed.dask.org/en/latest/task-launch.html#connection-with-context-manager
    However, if you're just trying to use the joblib backend and not dask directly, you should be able to follow the joblib example you linked directly - the underlying joblib backend sets up a client on the worker for you.
    e.g. the following should work in a prefect task without needing to create a client yourself:
    import joblib

    # search and digits are defined as in the linked dask-ml joblib example
    with joblib.parallel_backend('dask'):
        search.fit(digits.data, digits.target)
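For reference, a minimal self-contained sketch of the joblib pattern above. It is an illustration under assumptions, not code from the thread: the names fit_search and run_demo, the small SVC grid, and the local in-process Client are all made up for the demo. Inside a Prefect task running on a DaskExecutor you would presumably skip creating the Client, since the joblib backend is said above to pick up the worker's client for you.

```python
import joblib
from distributed import Client
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def fit_search(search, X, y):
    """Fit a scikit-learn search while joblib sends its parallel work to Dask."""
    # Everything scikit-learn parallelises through joblib inside this block
    # is submitted to the Dask client that is currently available.
    with joblib.parallel_backend("dask"):
        return search.fit(X, y)


def run_demo():
    """Hypothetical demo: a tiny grid search on a local, in-process cluster."""
    digits = load_digits()
    # Use a subset of the digits data to keep the demo quick.
    X, y = digits.data[:300], digits.target[:300]
    search = GridSearchCV(
        SVC(gamma="scale"),
        {"C": [0.1, 1.0, 10.0]},
        cv=3,
        n_jobs=-1,
    )
    # Assumption for local testing only: start a throwaway threaded cluster.
    with Client(processes=False):
        fitted = fit_search(search, X, y)
    return fitted.best_params_


if __name__ == "__main__":
    print(run_demo())
```

In a flow, the body of fit_search is the part that would go inside the task; the Client in run_demo only stands in for the cluster the executor already provides.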
    Chris Hart

    2 years ago
    woohoo thanks for the pointers!
    Jim Crist-Harif

    2 years ago
    No problem, happy to help
    Mark Koob

    2 years ago
    For anyone showing up here from the future - using distributed.worker_client() does give the correct behavior when launching a dask-ml search from inside a task when running on a dask cluster!
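    from distributed import worker_client
    from prefect import task

    @task
    def search_model(searchcv, x, y):
        # Secede from the worker's thread pool and get a client for the current cluster
        with worker_client() as client:
            return searchcv.fit(x, y)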
    Thanks Jim!
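To make the worker_client() behaviour discussed in this thread concrete, here is a small sketch using plain dask.distributed, without Prefect or dask-ml. The function names square and sum_of_squares and the local in-process Client are assumptions for illustration only. The point it tries to show is that a function already running as a Dask task can open worker_client() and submit further work to the same cluster.

```python
from distributed import Client, worker_client


def square(x):
    return x * x


def sum_of_squares(n):
    """Runs as a Dask task and fans out sub-tasks to the same cluster."""
    # worker_client() secedes from the worker's thread pool while the block
    # is active and yields a client connected to the current scheduler.
    with worker_client() as client:
        futures = client.map(square, range(n))
        return sum(client.gather(futures))


if __name__ == "__main__":
    # Assumption for local testing only: a throwaway threaded cluster.
    with Client(processes=False) as client:
        print(client.submit(sum_of_squares, 4).result())
```

A Prefect task body on a DaskExecutor is in the same position as sum_of_squares here: it is already executing on a worker, which is why the context manager can find the cluster.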