Hey all, has anybody successfully started a dask cluster + prefect on top of an existing spark cluster (databricks)? I have a databricks cluster but want to use distributed prefect to schedule
Anna Geller
05/06/2022, 6:23 PM
To work with Databricks, the easiest option is to use the Databricks tasks: https://docs.prefect.io/api/latest/tasks/databricks.html
Running Dask on Databricks is 💣 😄 Even if it works, it could be dangerous. It's probably easier to stick with Spark if you rely on Databricks.
I saw a blog post on Medium showing how someone could in theory run Dask on Databricks, but I'm not sure it's worth your time.
Can you perhaps explain your use case more? What type of work are you trying to parallelize?
Evan Curtin
05/06/2022, 6:24 PM
It’s large-scale ML experiments I want to run, 1 simulation per worker. Databricks is just the easiest way for me to get a cluster. I’m currently using Spark to do the orchestration, but it’s really not meant for what I’m doing.
Kevin Kho
05/06/2022, 6:32 PM
Yes dude! I was trying this out with my collaborator for my other open source project. This is the init script:
We’re using the Docker runtime, so installing dependencies is trivial. If we just need a way to start the workers and scheduler and set up communication between them, it feels doable.
Evan Curtin
05/06/2022, 8:09 PM
it doesn’t need to happen at cluster init either
Evan Curtin
05/06/2022, 8:10 PM
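# Databricks init script: the driver node runs the Dask scheduler, all other nodes run workers.
if [[ "$DB_IS_DRIVER" = "TRUE" ]]; then
  # The scheduler listens on the default port 8786.
  dask-scheduler &>/dev/null &
else
  # Start 4 single-threaded worker processes that connect to the scheduler on the driver.
  dask-worker "tcp://$DB_DRIVER_IP:8786" --nworkers 4 --nthreads 1 &>/dev/null &
fi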
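For anyone following along: once the scheduler and workers from the script above are running, the client side could look roughly like the sketch below. It is not from the thread. The helper names `scheduler_address` and `run_simulations` are made up for illustration, and it assumes `dask.distributed` is installed and that `DB_DRIVER_IP` is visible to the client process (it may only be set inside init scripts, in which case pass the driver IP in yourself).

```python
import os


def scheduler_address(env=None, port=8786):
    """Build the scheduler URL the workers in the init script connect to.

    `env` defaults to os.environ; DB_DRIVER_IP is assumed to be set there.
    """
    env = os.environ if env is None else env
    ip = env.get("DB_DRIVER_IP")
    if not ip:
        raise RuntimeError("DB_DRIVER_IP is not set; pass the driver IP explicitly")
    return f"tcp://{ip}:{port}"


def run_simulations(simulate, params, address):
    """Submit one task per parameter set and collect the results.

    The import is local so the address helper works without Dask installed.
    """
    from dask.distributed import Client

    with Client(address) as client:
        futures = client.map(simulate, params)  # one future per simulation
        return client.gather(futures)
```

Usage would be something like `run_simulations(my_sim, param_list, scheduler_address())`. With `--nthreads 1` each worker process runs one task at a time, which matches the "1 simulation per worker" goal.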