Giovanni Giacco
05/05/2021, 9:55 AM
Is there a way to get a handle on the Dask cluster used by the DaskExecutor, inside a task, in order to execute any workload on that Dask cluster? In our tasks we have to deal with huge pandas dataframes and we'd like to use the same Dask cluster to parallelize our computation. Any tips?

Avi A
05/05/2021, 10:30 AM
You can use distributed.worker_client to fetch a client for the cluster your task is running on. Here's a redacted piece of code from my task. The important part is passing the client to dask_df.compute:
import dask.dataframe as dd

with worker_client() as client:
    # worker_client secedes this task from the worker's thread pool,
    # so submitting work back to the same cluster doesn't deadlock it
    dask_df = dd.from_pandas(df, chunksize=DEFAULT_CHUNKSIZE)
    logger.debug('Universe size: %d', len(df))
    dask_df['new_col'] = dask_df.apply(some_func, axis=1)
    # pass the worker's client so the computation runs on the same cluster
    return dask_df.compute(scheduler=client)

Giovanni Giacco
05/05/2021, 10:38 AM
Where does worker_client() come from?

Avi A
05/05/2021, 10:46 AM
worker_client is a Dask function:

from distributed import worker_client

Avi A
05/05/2021, 10:47 AM

Giovanni Giacco
05/05/2021, 10:48 AM

Avi A
05/05/2021, 1:08 PM

Kevin Kho
You could also create a subflow that uses the DaskExecutor and have that subflow run. You might also find this Doc helpful.
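A minimal sketch of that subflow pattern, assuming Prefect 0.x-era APIs (StartFlowRun from prefect.tasks.prefect and prefect.executors.DaskExecutor, which match this thread's timeframe); the flow names, project name, and scheduler address are placeholders:

from prefect import Flow, task
from prefect.executors import DaskExecutor
from prefect.tasks.prefect import StartFlowRun

@task
def heavy_dataframe_work():
    # the pandas/Dask computation from the snippet above would live here
    ...

# Subflow whose tasks run on the Dask cluster
with Flow(
    "dask-subflow",
    executor=DaskExecutor(address="tcp://dask-scheduler:8786"),
) as subflow:
    heavy_dataframe_work()

# Parent flow triggers the registered subflow and waits for it to finish
start_subflow = StartFlowRun(
    flow_name="dask-subflow", project_name="my-project", wait=True
)

with Flow("parent-flow") as parent:
    start_subflow()

Both flows would need to be registered with the Prefect backend for StartFlowRun to locate the subflow by name.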
Giovanni Giacco
05/10/2021, 11:29 PM