Hey Prefect people! Just saw that Netflix released Metaflow, their internal workflow engine (ish). From what I can tell, it focuses on abstracting away the infrastructure required to run code rather than solving something like the negative engineering problem. I was wondering if you were planning a blog post or a compare/contrast with that product!
From the 1,000-foot view, its documentation reads a lot like Prefect's docs for running with Prefect Core. I haven't dug into the details, but would love to hear your take, since the docs at least read similarly!
12/04/2019, 2:45 AM
This is something we'll be looking at! Thanks
12/04/2019, 7:14 AM
Hey @Alex Cano! While I haven’t had the chance to dig incredibly deep, I do have a few initial observations:
• While Metaflow does have some similarities with Core, it has none with Cloud (no API / no management layer / low visibility / etc.)
• Metaflow appears to be heavily focused on versioning / checkpointing machine learning model builds
• Related to the above point, steps in Metaflow appear to be more of an organizational tool, whereas Prefect Tasks are first-class objects. So for example, you can't have a task in Metaflow that only runs when an upstream dependency fails, because a task failure just stops the flow
• No scheduling (so for example you could schedule your metaflow instances via Prefect)
• Metaflow has more restrictions on the types of data that can be exchanged between steps, and any data exchange is not tracked as a dependency
• Appears to only support AWS deployments? (Also no Dask support in Metaflow as far as I can tell)
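To make the point about failure-triggered tasks concrete, here is a rough sketch in plain Python. It is not Prefect's or Metaflow's actual API: the `Task` class and `run_flow` runner are hypothetical, and the trigger and state names are only borrowed from Prefect Core for illustration. The idea is that each task carries a trigger that inspects upstream states, so a failure becomes an input to downstream decisions instead of ending the run.

```python
from typing import Callable, Dict, List, Sequence


# Hypothetical triggers: each one looks at upstream states and decides
# whether the task is allowed to run.
def all_successful(upstream_states: List[str]) -> bool:
    return all(state == "Success" for state in upstream_states)


def any_failed(upstream_states: List[str]) -> bool:
    return any(state == "Failed" for state in upstream_states)


class Task:
    """Hypothetical first-class task: a function plus dependencies and a trigger."""

    def __init__(
        self,
        name: str,
        fn: Callable[[], None],
        upstream: Sequence["Task"] = (),
        trigger: Callable[[List[str]], bool] = all_successful,
    ) -> None:
        self.name = name
        self.fn = fn
        self.upstream = list(upstream)
        self.trigger = trigger


def run_flow(tasks: List[Task]) -> Dict[str, str]:
    """Run tasks in the given order (assumed to be topologically sorted)."""
    states: Dict[str, str] = {}
    for task in tasks:
        upstream_states = [states[up.name] for up in task.upstream]
        if not task.trigger(upstream_states):
            # The trigger rejected the upstream states, so the task is skipped.
            states[task.name] = "TriggerFailed"
            continue
        try:
            task.fn()
            states[task.name] = "Success"
        except Exception:
            # A failure is recorded as a state; the run keeps going.
            states[task.name] = "Failed"
    return states


def extract() -> None:
    raise ValueError("source unavailable")


def load() -> None:
    pass


def alert() -> None:
    pass


extract_task = Task("extract", extract)
load_task = Task("load", load, upstream=[extract_task])
alert_task = Task("alert", alert, upstream=[extract_task], trigger=any_failed)

states = run_flow([extract_task, load_task, alert_task])
```

Here `extract` fails, `load` is skipped because its default trigger wants every upstream task to succeed, and `alert` runs precisely because something upstream failed. In a model where a step failure simply stops the flow, `alert` would never get the chance to run.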
I’m sure there are other similarities / differences but those were my initial takeaways from playing around a bit. I would love it if others chimed in with any other observations they find!
After reading more through the docs and examples, both what Chris said and the Reddit comment seem pretty spot on! It seems really focused on making the transition from running on a laptop to the cloud easier, and on keeping track of versions for a specific machine learning model. Also, like Chris mentioned, it is tightly and exclusively integrated with AWS.
I feel like the docs partially read the same because Netflix kind of wanted them to, in the sense of engine UI (basic dependency between tasks, graph visibility, mapping, naming of objects). However, it looks like Metaflow is extremely opinionated about how it wants the world to work, much like Airflow is. Maybe Metaflow's opinions are better than Airflow's, but I personally prefer something a lot less opinionated!
12/05/2019, 12:11 PM
Does Prefect have any support for managing code dependencies like Metaflow?
12/05/2019, 1:01 PM
Hey Oliver - my understanding of Metaflow's dependency management hooks is that they allow you to specify Conda packages on remote AWS workers. There are many such hooks within e.g. Dask for specifying code dependencies on the workers (including pip and custom files), but Prefect's preferred method of true code dependency management is Docker. Configuration of Docker storage for Flows can be found in the Docker storage class within Prefect Core
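As a rough illustration of what "dependency management via Docker" boils down to, here is a small plain-Python sketch that renders a Dockerfile with pinned pip packages baked in next to the flow file. This is not Prefect's Docker storage class or its parameters. The `render_dockerfile` helper, the base image, the package pins, and the `/opt/flow` path are all assumptions made up for the example.

```python
from typing import Sequence


def render_dockerfile(
    base_image: str, pip_packages: Sequence[str], flow_file: str
) -> str:
    """Hypothetical helper: build Dockerfile text that bakes dependencies into an image."""
    lines = [f"FROM {base_image}"]
    if pip_packages:
        # Sorting keeps the generated file stable across runs.
        packages = " ".join(sorted(pip_packages))
        lines.append(f"RUN pip install --no-cache-dir {packages}")
    # Ship the flow code inside the image alongside its dependencies.
    lines.append(f"COPY {flow_file} /opt/flow/{flow_file}")
    return "\n".join(lines) + "\n"


# Illustrative values only; substitute your own image, pins, and file name.
dockerfile = render_dockerfile(
    base_image="python:3.7-slim",
    pip_packages=["prefect", "pandas==0.25.3"],
    flow_file="my_flow.py",
)
print(dockerfile)
```

The design point is that the image, rather than the worker, carries the environment: every worker that pulls the image gets the same packages and the same flow code, which is the guarantee that per-worker hooks have a harder time giving you.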