Jerry Thomas
09/20/2019, 5:06 AM

Zachary Hughes
09/20/2019, 6:53 PM
We don't have a guide for dask-kubernetes yet, but we'll get that documented ASAP. In the meantime, this page might be useful if you haven't already found it: https://docs.prefect.io/core/tutorials/dask-cluster.html
In this case, the workers just need access to your code's dependencies, not the actual code itself. So you'll need to install Prefect at a minimum, but likely more depending on your code's dependencies.
As for how the code gets executed, Prefect takes care of submitting your serialized code to the Dask scheduler. As long as your code's dependencies are there, it should be able to be deserialized and you're off to the races.
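A rough sketch of what that setup can look like, using dask-kubernetes to launch worker pods that install the needed packages at startup. The image name, the EXTRA_PIP_PACKAGES mechanism of the daskdev/dask image, and the package list are illustrative assumptions, not something confirmed in this thread:

from dask_kubernetes import KubeCluster, make_pod_spec

# Worker pod template: the daskdev/dask image can install extra pip packages
# listed in EXTRA_PIP_PACKAGES when the container starts, so Prefect and the
# flow's other dependencies end up on every worker without the flow code itself.
pod_spec = make_pod_spec(
    image="daskdev/dask:latest",                    # assumed worker image
    env={"EXTRA_PIP_PACKAGES": "prefect pandas"},   # whatever your tasks import
)

cluster = KubeCluster(pod_spec)   # launches Dask worker pods in Kubernetes
cluster.scale(3)                  # three workers, each with the packages above
print(cluster.scheduler_address)  # address to hand to the DaskExecutor below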
Joe Schmid
09/20/2019, 8:16 PM
from prefect.engine.executors import DaskExecutor

executor = DaskExecutor(address="tcp://dask-scheduler:8786")
flow.run(executor=executor)

Where the dask-scheduler hostname will resolve in DNS on your Dask workers. (With dask-kubernetes that should be the default dask-scheduler hostname.)

Jerry Thomas
09/23/2019, 5:17 AM
So apart from making sure the dependencies are available in each dask-worker pod, it is the same as running the flow using a local dask cluster.
Am I correct in understanding that if I wanted to run a streaming pipeline with a flow, I should start a dask-kubernetes cluster and add a separate pod with my custom application that initiates the flows? These flows would then be pushed to the dask-workers via the dask-scheduler in the cluster.
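For the "separate pod that initiates the flows" part, here is a minimal sketch of what that driver application could look like, assuming the in-cluster scheduler is reachable at dask-scheduler:8786 and using a hypothetical poll_source() as a stand-in for whatever produces new records (a queue, topic, object store, etc.):

import time

from prefect import Flow, Parameter, task
from prefect.engine.executors import DaskExecutor

@task
def process(record):
    # Task code runs on the dask-workers, so its imports must be installed there.
    return record

with Flow("streaming-batch") as flow:
    records = Parameter("records")
    process.map(records)  # one mapped task run per incoming record

executor = DaskExecutor(address="tcp://dask-scheduler:8786")

def poll_source():
    # Placeholder for the custom application's event source.
    return []

# Long-running driver loop: each batch of new records triggers one flow run,
# which the DaskExecutor submits to the scheduler and on to the dask-workers.
while True:
    batch = poll_source()
    if batch:
        flow.run(parameters={"records": batch}, executor=executor)
    time.sleep(5)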