
Hugo Shi

07/13/2020, 5:32 PM
Hi all, I've got kind of a dumb question. If I already have a bunch of dask.delayed functions, and I'm going to pull them into a Prefect flow, what's the best way to do that? Should I call compute on them within the Prefect Task?
👀 1

Kyle Moon-Wright

07/13/2020, 5:41 PM
Hello @Hugo Shi, Depending on your use case, it may be prudent to write these functions into prefect tasks, construct a Prefect flow context, and execute your flow using a parallelized task executor like the DaskExecutor. This may not account for all scenarios using dask.delayed, but it should be enough to get your parallelism running initially.

Hugo Shi

07/13/2020, 5:44 PM
I can definitely do that. I'm more asking what the best practices are for folks that already have dask delayed (for example, if I'm already doing EDA with a bunch of dask delayed functions): is the suggestion to re-cast them as Prefect tasks? I can definitely keep my raw funcs separate, and turn them into delayed or Prefect tasks, depending on my use case.
and thanks!
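The "keep my raw funcs separate" idea could look roughly like the sketch below. The function names (`load`, `clean`, `summarize`) and the toy data are hypothetical and only there for illustration; only `dask.delayed` is actually exercised, and the Prefect wrapping is left as a comment because it depends on the Prefect version in use.

```python
from dask import delayed


# Plain, framework-free functions (hypothetical examples).
# They can be unit tested directly and wrapped later as needed.
def load(n):
    return list(range(n))


def clean(rows):
    return [r for r in rows if r % 2 == 0]


def summarize(rows):
    return sum(rows)


# EDA use: wrap the raw functions with dask.delayed at call time,
# build the lazy graph, and compute once at the end.
lazy = delayed(summarize)(delayed(clean)(delayed(load)(10)))
eda_result = lazy.compute(scheduler="synchronous")

# Pipeline use (not executed here, assumption): the same raw functions
# could instead be wrapped as Prefect tasks, e.g. prefect.task(load),
# without touching their bodies.
```

Because the wrapping happens at the call site, the same function bodies serve both the exploratory and the pipeline case.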

Kyle Moon-Wright

07/13/2020, 5:48 PM
I'm not a heavy dask user, but I'm definitely interested in what others have been able to accomplish here. 👂

Hugo Shi

07/13/2020, 5:53 PM
My thoughts are that Prefect Tasks should be higher-level synchronous things, and so if I'm using dask delayed, I should probably do a bunch of them in dask, and then call compute on the results at the end. Someone telling me that I'm spot on or way off would be appreciated =)

Kyle Moon-Wright

07/13/2020, 6:21 PM
Thoughts @Jim Crist-Harif?

Jim Crist-Harif

07/13/2020, 6:24 PM
We're still working out patterns for users to make use of dask from inside of prefect. Right now I'd say that you're probably fine to use any dask stuff inside a task, provided that you don't return dask objects from the task. Prefect handles return objects in special ways (caching the object somewhere, serializing it between workers, etc...), which can do weird things with dask collections.
:thank-you: 1
You'll want to wrap your `compute` calls in `with worker_client()` before computing if you want to make use of the same cluster that prefect is running on. See https://distributed.dask.org/en/latest/task-launch.html#submit-tasks-from-worker for more info.
:thank-you: 1
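A minimal sketch of that advice, not an official Prefect pattern: build the dask graph inside the function, compute it there, and return a plain Python value rather than a dask object. `square`, `running_on_dask_worker`, and `sum_of_squares` are hypothetical names; `sum_of_squares` stands in for the body of a Prefect task, with the Prefect decorator omitted so the snippet runs with dask alone. The fallback to the local threaded scheduler when no dask worker is present is an assumption added so the example also runs outside a cluster.

```python
from dask import delayed


def square(x):
    return x * x


def running_on_dask_worker():
    """Return True only when called from inside a dask.distributed worker."""
    try:
        from dask.distributed import get_worker
        get_worker()  # raises ValueError when not running on a worker
        return True
    except (ImportError, ValueError):
        return False


def sum_of_squares(values):
    """Hypothetical task body: compute inside, return a concrete int."""
    lazy_total = delayed(sum)([delayed(square)(v) for v in values])

    if running_on_dask_worker():
        # Reuse the cluster this function is already running on.
        from dask.distributed import worker_client
        with worker_client() as client:
            return client.compute(lazy_total).result()

    # Assumption for illustration: no cluster available, compute locally.
    return lazy_total.compute(scheduler="threads")


result = sum_of_squares([1, 2, 3])
```

The value handed back is an ordinary `int`, so whatever caching or serialization happens to task results never sees a dask collection.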

Hugo Shi

07/13/2020, 6:56 PM
@Jim Crist-Harif thanks!

Sebastian

07/13/2020, 7:03 PM
Thanks for the question, this is also something I was thinking/wondering about today!
👍 1

Chris White

07/23/2020, 3:30 PM
@Marvin archive “Should I return Dask delayed objects from Prefect tasks?”