Jie Lou

07/30/2019, 4:19 PM
Hello. My flow contains a dask-ml prediction model such as logistic regression or xgboost. Is it possible to use the Dask executor to run the flow as well? It failed for both algorithms. Or should I just use a normal sklearn model inside the flow? Without specifying executor=dask_executor, the flow works fine.

Chris White

07/30/2019, 6:10 PM
Hi @Jie Lou interesting question -> would you mind providing some sort of traceback for how the task failed?

Jie Lou

07/30/2019, 6:54 PM
Hi Chris. Thanks for replying. It did not show much info. It just said that some tasks failed.
It seems that calling client = Client() inside the task function makes it fail.

Chris White

07/30/2019, 8:04 PM
Ah gotcha; tasks run on dask workers so you should use a worker client instead
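A minimal sketch of the worker-client pattern mentioned above, using only dask.distributed. The function name score_partitions and the doubling workload are hypothetical stand-ins for the body of a Prefect task; the small in-process cluster is an assumption made so the example runs on its own and is not how the flow in this thread was deployed.

```python
from dask.distributed import Client, worker_client


def score_partitions(values):
    """Hypothetical stand-in for the body of a task running on a Dask worker."""
    # worker_client() connects to the cluster this function is already
    # running on, rather than starting a new one as Client() would.
    with worker_client() as client:
        futures = client.map(lambda v: v * 2, values)
        return sum(client.gather(futures))


if __name__ == "__main__":
    # Small in-process cluster, for illustration only.
    with Client(processes=False, dashboard_address=None) as client:
        future = client.submit(score_partitions, [1, 2, 3])
        print(future.result())
```

The point of the pattern is that code already executing on a worker borrows the existing scheduler connection instead of creating a second cluster from inside the task.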

Jie Lou

07/30/2019, 8:11 PM
Thanks! I'll try.

Chris White

07/30/2019, 8:12 PM
Let me know how it goes! I definitely want prefect to support ML pipelines with ease

Jie Lou

07/30/2019, 8:14 PM
Chris, the issue is that I am using an edited version of dask-xgboost which requires calling Client() to set the host address etc. I'm not sure whether it can use the workers' client or not.

Chris White

07/30/2019, 8:16 PM
Oh interesting, I’ll have to look into that; in the meantime you can try using a LocalExecutor

Jie Lou

07/30/2019, 8:17 PM
Cool. Thanks for your time 😊

Chris White

07/30/2019, 8:17 PM
Anytime!

Jie Lou

07/30/2019, 9:56 PM
Some updates: other algorithms under dask-ml, like logistic regression and random forest, work well with Dask executors. Run time is shorter, thankfully. I think the problem only exists when I have to use that special version of dask-xgboost (to solve the address issue), which has to call Client(). That is not likely to be a typical problem for Prefect users; it is more on the dask-xgboost side.

Chris White

07/31/2019, 4:36 AM
Ah that’s good news! Thank you very much for the update @Jie Lou
@Marvin archive “Issue running dask-ml models in Prefect”

Taleb Zeghmi

10/25/2019, 10:17 PM
I'm looking into using Prefect as a single language/DSL to describe both an Airflow-like ML pipeline and an sklearn-like featurization pipeline. Has there been any work done (or is there an example) on describing a Dask-ML pipeline in a Flow? Thank you!

Chris White

10/25/2019, 10:47 PM
Hi @Taleb Zeghmi - I know for sure that people have used Prefect for similar use cases, but unfortunately no examples have been made public yet; if you happen to get something up and running it’d be awesome if you shared it!

Taleb Zeghmi

10/28/2019, 6:59 PM
I worry that it takes extra work to make them work together. Could you provide an example, given that this is a main use case and pattern for ML? Thanks!

Chris White

10/28/2019, 7:00 PM
I personally won’t have time to put together such an example anytime soon unfortunately
Hi @Taleb Zeghmi - this is probably not exactly what you’re looking for, but here’s an example of an executable ETL workflow for “image processing” that runs on Dask + binder: https://examples.dask.org/applications/prefect-etl.html
There’s a binder link at the top of the page if you want to run it live