#RubberDucking So, we're using Papermill for a lot of stuff. I think the code that runs in Papermill isn't actually being distributed with the Dask scheduler, and is basically just being treated as one function call. I guess the way to have the Papermill code get Dask-ified would be:
Prefect flow starts in a lightweight container
Prefect flow spins up a Dask cluster
Modify the Papermill notebooks so that they take the address of a Dask cluster as an argument, and then submit the code that runs in them to that cluster
That sound right?
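A rough sketch of that third step, assuming Papermill injects a `dask_address` value into the notebook's parameters cell (the name is just a placeholder, not anything from the thread):

```python
# Sketch of a notebook cell that connects to an existing Dask cluster
# and submits its work there instead of running it inline.
from dask.distributed import Client


def run_on_cluster(dask_address=None):
    # Connect to the running cluster if an address was injected by
    # Papermill; otherwise fall back to a throwaway in-process cluster
    # so the notebook still runs standalone.
    client = Client(dask_address) if dask_address else Client(processes=False)
    try:
        futures = client.map(lambda x: x * 2, range(4))
        return client.gather(futures)
    finally:
        client.close()
```

With this shape, the Prefect flow only has to pass the scheduler address through as a notebook parameter; the heavy lifting actually lands on the Dask workers.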
11 months ago
I think so, but you may run into problems if the code in the notebooks uses some kind of parallelization itself.
Not sure if that's how it should work for flows with actual Params though, but we'll see!
Using a regular Dask `Client` got the object connected, but the Papermill task itself blocked execution of the tasks I submitted from within it. There's probably a way to do it, buuut in the meantime, I found defining a little Prefect flow within the notebook and telling it to execute on the cluster wound up working!
So like having a cell in the notebook with: