# ask-community
a
Hey guys! I'm new to Prefect as I'm trying new tools. How does Prefect compare to Dagster? Also, would MLFlow complement the use of Prefect?
k
Hey Jean, I’ll speak to the MLFlow part of this. I used MLFlow to track experiment results on Databricks. The relationship would be wrapping that MLFlow logging in a Task and monitoring whether the write out to MLFlow succeeds. MLFlow is also good for storing artifacts from model runs (plots, metrics).
k
I’ll add this article to your discovery gathering - it goes through the tools you mentioned in a pretty holistic way (since I may be biased toward Prefect 🦆).
a
@Kevin Kho thanks! And does it work alongside Prefect?
@Kyle Moon-Wright I'm gonna take a look at that article again! I'm still on the fence between Prefect and Dagster. One thing I'm sure of is that I would like to use MLFlow for experiment monitoring and hyperparam opt
k
I haven’t tried it, but they should work together without a problem because Prefect works with Python code and MLFlow has a Python API
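For example, something like this rough, untested sketch (the function and metric names are made up): the MLFlow call just lives inside a normal Prefect task.
```python
import mlflow
from prefect import task

# Hypothetical task: any Python code, including MLFlow's tracking API, can run inside it
@task
def log_accuracy(accuracy: float):
    with mlflow.start_run():
        mlflow.log_metric("accuracy", accuracy)
```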
d
Dagster and Prefect are actually quite different - Dagster’s focus appears to be on tracking data assets and providing things like type systems, etc.  They are very interested in what your tasks are doing.  This means that writing a Dagster pipeline requires thinking through config, and restructuring your code to fit their models. Prefect, on the other hand, is focused on the orchestration of your tasks and pipelines — things like scheduling, recovery from failure, running workflows in heterogeneous environments.  This means that writing Prefect pipelines requires only a light touch to your existing code.
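To make the "light touch" point concrete, here is a minimal, untested sketch (Prefect 1.x style, made-up function names): your existing functions only need the @task decorator, and the flow is just the calls you would already make.
```python
from prefect import task, Flow

# Existing functions only need the decorator; no config files or restructuring
@task
def extract():
    return [1, 2, 3]

@task
def transform(data):
    return [x * 10 for x in data]

with Flow("etl-example") as flow:
    transform(extract())

flow.run()  # run locally; register with a backend for scheduling and recovery
```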
I will say that it sounds like we can definitely handle your use case 😄
a
Hey @Dylan thank you for the prompt response. Prefect does seem more straightforward. I'm def leaning more towards it, just gotta see how my pipeline is gonna work with Prefect + MLFlow
d
Let us know how we can help! 😄
a
What will I be missing from Prefect by not taking advantage of Prefect Cloud, other than the cloud part itself? Are there any features not in the open-source part? I'm a bit confused on that
k
Hey @Amber Papillon, take a look at the differences between Prefect Server and Prefect Cloud here. There are a few more differences between the two backend APIs that I can link as well (they seem to have moved…).
a
@Kyle Moon-Wright Thank you! Yes, I would appreciate it if you could. I didn't really understand the "Scale & Performance" section; it's not very clear on the reason for that difference.
k
Hmm, it looks like the page I was referring to was removed for the time being as we are transitioning to success-based pricing, so I’ll need to check internally with the team for its whereabouts tomorrow. It was really just a feature list and doesn’t address your question ^. Did you have a question on the “Scale & Performance” section that I can clarify? Prefect Server was not intended for production purposes, but rather as a backend API for single developers/hobbyists to get started orchestrating with Prefect. Going the Server route, you are in charge of maintaining the networking between services and the health of your containers, while Cloud was built with a highly available and responsive API meant for production-level flows, taking care of the backend entirely so you can focus on your business-critical work.
If Server meets your needs, that’s great and you should go for it! However, when used at scale, it will not be representative of your experience using Cloud.
a
I do believe the Server will meet my needs! Thank you for the info!
MLFlow Tracking has some nice features that seem to be lacking in Prefect (a record of the current git commit, ML metrics, etc.). It also integrates nicely with Databricks, which allows for easy cluster deployment. However, it really lacks when it comes to building complicated pipelines, which is where Prefect excels. Is there a way to get "the best of all worlds"? That is, integrate Prefect with MLFlow?
k
Do you plan to use Spark and Databricks clusters?
a
I plan to use Spark/Vaex and occasionally Dask but that's unlikely. Databricks will be much later on.
k
Ok so MLFlow has a few features (Tracking, Deployment, Model Registry). I personally only have experience with Tracking. Tracking works by having a snippet of code where you log model hyperparameters, metrics, and artifacts (plots or CSVs). You would insert it in your code after model training. This is agnostic to Spark, Vaex, or Pandas.
With MLFlow, you need to configure where these things are stored (AWS S3, Azure Data Lake, or even local storage). I think it’s standard to use Databricks File System (DBFS) linked to one of those cloud storages, which is why I asked about Databricks.
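Roughly, the snippet looks like this (an untested sketch; the tracking URI, experiment name, and values are all placeholders):
```python
import mlflow

# Point tracking at a store: a local folder, a remote tracking server, or DBFS (placeholder URI)
mlflow.set_tracking_uri("file:///tmp/mlruns")
mlflow.set_experiment("my-experiment")

# Insert this after model training; it is agnostic to Spark/Vaex/Pandas
with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)   # hyperparameters
    mlflow.log_metric("rmse", 0.23)           # metrics

    # artifacts are files on disk (plots, CSVs); write one out and log it
    with open("predictions.csv", "w") as f:
        f.write("id,prediction\n1,0.9\n")
    mlflow.log_artifact("predictions.csv")
```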
But how it works is that Prefect wraps around Python code and makes it a Task. Writing out the model Tracking info will probably be its own Task in a broader Flow. Your end-to-end Flow will be something like:
Get Data -> Transform -> Train Model (and maybe save somewhere) -> Log Metrics
Prefect orchestrates the whole pipeline. MLFlow is responsible for tracking experiments inside that `Log Metrics` portion. Hope that gives some insight?
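Putting it together, one possible shape of that Flow would be something like this (an untested sketch in Prefect 1.x style; all the function names and the toy "training" step are just placeholders):
```python
import mlflow
from prefect import task, Flow

@task
def get_data():
    return [1.0, 2.0, 3.0]

@task
def transform(raw):
    return [x / max(raw) for x in raw]

@task
def train_model(features):
    # stand-in for real training; return whatever metrics you want tracked
    return {"mean_feature": sum(features) / len(features)}

@task
def log_metrics(metrics):
    # MLFlow is only responsible for this step; Prefect orchestrates the whole Flow
    with mlflow.start_run():
        for name, value in metrics.items():
            mlflow.log_metric(name, value)

with Flow("train-and-track") as flow:
    raw = get_data()
    features = transform(raw)
    metrics = train_model(features)
    log_metrics(metrics)

flow.run()  # run locally; register with a backend for scheduling
```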
a
Yes, that does the trick. Thank you very much @Kevin Kho
Your answers clarify a lot. Thank you