# ask-community
a
Hey guys! I'm new to Prefect as I'm trying new tools. How does Prefect compare to Dagster? Also, would MLFlow complement the use of Prefect?
k
Hey Jean, I’ll speak to the MLFlow part of this. I used MLFlow to track experiment results on Databricks. The relationship would be wrapping that MLFlow logging in a Task and monitoring whether the write out to MLFlow succeeds. MLFlow is also good for storing artifacts from model runs (plots, metrics).
k
I’ll add this article to your discovery gathering - it goes through the tools you mentioned in a pretty holistic way (since I may be biased toward Prefect 🦆).
a
@Kevin Kho thanks! And does it work alongside Prefect?
@Kyle Moon-Wright I'm gonna take a look at that article again! I'm still on the fence between Prefect and Dagster. One thing I'm sure of is that I would like to use MLFlow for experiment monitoring and hyperparam opt
k
I haven’t tried it, but they should work together without a problem because Prefect works with Python code and MLFlow has a Python API
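For example, something like this rough, untested sketch (the function and metric names are made up): the MLFlow call just lives inside a normal Prefect task.
```python
import mlflow
from prefect import task

# Hypothetical task: any Python code, including MLFlow's tracking API, can run inside it
@task
def log_accuracy(accuracy: float):
    with mlflow.start_run():
        mlflow.log_metric("accuracy", accuracy)
```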
d
Dagster and Prefect are actually quite different - Dagster’s focus appears to be on tracking data assets and providing things like type systems, etc.  They are very interested in what your tasks are doing.  This means that writing a Dagster pipeline requires thinking through config, and restructuring your code to fit their models. Prefect, on the other hand, is focused on the orchestration of your tasks and pipelines — things like scheduling, recovery from failure, running workflows in heterogeneous environments.  This means that writing Prefect pipelines requires only a light touch to your existing code.
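To make the "light touch" point concrete, here is a minimal, untested sketch (Prefect 1.x style, made-up function names): your existing functions only need the @task decorator, and the flow is just the calls you would already make.
```python
from prefect import task, Flow

# Existing functions only need the decorator; no config files or restructuring
@task
def extract():
    return [1, 2, 3]

@task
def transform(data):
    return [x * 10 for x in data]

with Flow("etl-example") as flow:
    transform(extract())

flow.run()  # run locally; register with a backend for scheduling and recovery
```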
I will say that it sounds like we can definitely handle your use case 😄
a
Hey @Dylan thank you for the prompt response. Prefect does seem more straightforward. I'm def leaning more towards it, just gotta see how my pipeline is gonna work with Prefect + MLFlow
d
Let us know how we can help! 😄
a
What will I be missing from Prefect by not taking advantage of Prefect Cloud, other than the cloud part itself? Are there any features not in the open-source part? I'm a bit confused on that
k
Hey @Amber Papillon, take a look at the differences between Prefect Server and Prefect Cloud here. There are a few more differences between the two backend APIs that I can link as well (they seem to have moved…).
a
@Kyle Moon-Wright Thank you! Yes, I would appreciate it if you could. I didn't really understand the "Scale & Performance" section; it's not very clear on the reason for that difference.
k
Hmm, it looks like the page I was referring to was removed for the time being as we are transitioning to success-based pricing, so I’ll need to check internally with the team for its whereabouts tomorrow. It was really just a feature list and doesn’t address your question ^. Did you have a question on the “Scale & Performance” section that I can clarify? Prefect Server was not intended for production purposes, but rather as a backend API for single developers/hobbyists to get started orchestrating with Prefect. Going the Server route, you are in charge of maintaining the networking between services and the health of your containers, while Cloud was built with a highly available and responsive API meant for production-level flows, taking care of the backend entirely so you can focus on your business-critical work.
If Server meets your needs, that’s great and you should go for it! However, when used at scale, it will not be representative of your experience using Cloud.
a
I do believe the Server will meet my needs! Thank you for the info!
MLFlow Tracking has some nice features that seem to be lacking in Prefect (a record of the current git commit, ML metrics, etc.). It also integrates nicely with Databricks, which allows for easy cluster deployment. However, it really lacks when it comes to building complicated pipelines, which is where Prefect excels. Is there a way to get "the best of all worlds"? That is, integrate Prefect with MLFlow?
k
Do you plan to use Spark and Databricks clusters?
a
I plan to use Spark/Vaex and occasionally Dask but that's unlikely. Databricks will be much later on.
k
Ok so MLFlow has a few features (Tracking, Deployment, Model Registry). I personally only have experience with Tracking. Tracking works by having a snippet of code where you log model hyperparameters, metrics, and artifacts (plots or CSVs). You would insert it in your code after model training. This is agnostic to Spark, Vaex, or Pandas.
With MLFlow, you need to configure where these things are stored (AWS S3, Azure Data Lake, or even local storage). I think it’s standard to use Databricks File System (DBFS) linked to one of those cloud storages, which is why I asked about Databricks.
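Roughly, the snippet looks like this (an untested sketch; the tracking URI, experiment name, and values are all placeholders):
```python
import mlflow

# Point tracking at a store: a local folder, a remote tracking server, or DBFS (placeholder URI)
mlflow.set_tracking_uri("file:///tmp/mlruns")
mlflow.set_experiment("my-experiment")

# Insert this after model training; it is agnostic to Spark/Vaex/Pandas
with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)   # hyperparameters
    mlflow.log_metric("rmse", 0.23)           # metrics

    # artifacts are files on disk (plots, CSVs); write one out and log it
    with open("predictions.csv", "w") as f:
        f.write("id,prediction\n1,0.9\n")
    mlflow.log_artifact("predictions.csv")
```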
But how it works is that Prefect wraps around Python code and makes it a Task. Writing out the model Tracking info will probably be its own Task in a broader Flow. Your end-to-end Flow will be something like:
Get Data -> Transform -> Train Model (and maybe save somewhere) -> Log Metrics
Prefect orchestrates the whole pipeline. MLFlow is responsible for tracking experiments inside that `Log Metrics` portion. Hope that gives some insight?
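Putting it together, one possible shape of that Flow would be something like this (an untested sketch in Prefect 1.x style; all the function names and the toy "training" step are just placeholders):
```python
import mlflow
from prefect import task, Flow

@task
def get_data():
    return [1.0, 2.0, 3.0]

@task
def transform(raw):
    return [x / max(raw) for x in raw]

@task
def train_model(features):
    # stand-in for real training; return whatever metrics you want tracked
    return {"mean_feature": sum(features) / len(features)}

@task
def log_metrics(metrics):
    # MLFlow is only responsible for this step; Prefect orchestrates the whole Flow
    with mlflow.start_run():
        for name, value in metrics.items():
            mlflow.log_metric(name, value)

with Flow("train-and-track") as flow:
    raw = get_data()
    features = transform(raw)
    metrics = train_model(features)
    log_metrics(metrics)

flow.run()  # run locally; register with a backend for scheduling
```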
a
Yes, that does the trick. Thank you very much @Kevin Kho
Your answers clarify a lot. Thank you