    Nat Busa

    2 years ago
    Hi, I hope this is the right place to post it ... I understand your opencore philosophy, but I am having a hard time choosing between prefect and two other projects: dagster and metaflow ... Also just using dask distributed does not sound bad ... Any thoughts on this?
    Chris White

    2 years ago
    Hi @Nat Busa! This is a great place to post this sort of question. Ultimately, I believe this might be more of an apples-to-oranges comparison than it might seem at first glance, but here's my own high-level guide:
    - If you are looking to easily distribute some large computations, or just want to take advantage of distributed computation on an ad-hoc basis, you should use dask distributed. If you ever want workflow semantics / retries / scheduling / trigger logic / etc., you should consider hooking dask distributed up to a workflow tool (currently Prefect is the only one that actually supports the power of dask).
    - If you are looking to version your machine learning model artifacts on AWS, you should probably use Metaflow. While Metaflow shares some of Prefect's API and design, it is ultimately not a workflow tool but rather a tool for data scientists to self-serve maintaining their ML model builds, with a specific design for AWS deployments.
    - Dagster appears to be a workflow tool, but I'm not as familiar with it. Ultimately I'd love for a comparison to come from our community, but in the meantime my biggest takeaway from their documentation is that we have fundamentally different philosophies. Prefect strives to be lightweight, and focuses on surfacing errors and on the ability to specify arbitrary dependencies between jobs / tasks with "workflow semantics" (retries, triggers, etc.). Prefect also has a design philosophy of minimal but sane defaults while simultaneously being deeply configurable. Dagster, on the other hand, seems more interested in what your tasks are doing, is consequently much less lightweight in design, and appears to require some intricate configuration. Moreover, last I checked, Dagster didn't support retries, which makes me think they aren't as interested in the "workflow" piece but rather in the configuration aspect.
    I'm not 100% certain what their target use case is, so I don't have a good rule of thumb for when you might choose Prefect over Dagster (or vice versa), but hopefully a community member will one day enlighten us all!
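    The "workflow semantics" mentioned above (retries, trigger logic) can be sketched in a few lines of plain Python. This is an illustrative toy only, not Prefect's actual API — a real workflow tool layers this kind of logic over scheduling and distributed execution:

    ```python
    import time

    def with_retries(max_retries=3, delay=0.0):
        """Toy retry decorator illustrating one piece of 'workflow semantics'."""
        def decorator(fn):
            def wrapper(*args, **kwargs):
                last_exc = None
                for attempt in range(1, max_retries + 1):
                    try:
                        return fn(*args, **kwargs)
                    except Exception as exc:
                        last_exc = exc
                        time.sleep(delay)  # back off before the next attempt
                raise last_exc
            return wrapper
        return decorator

    calls = []

    @with_retries(max_retries=3)
    def flaky_task():
        # Fails twice, then succeeds -- the decorator absorbs the failures.
        calls.append(1)
        if len(calls) < 3:
            raise RuntimeError("transient failure")
        return "ok"

    result = flaky_task()  # succeeds on the third attempt
    ```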

    Nat Busa

    2 years ago
    Thank you for the insight! Very helpful 😃 will try to digest all that ... Currently, each tool has its own merits, although I think that prefect-dask is probably the most compact design out there, and I am a huge fan of minimal/no-configuration API design. Dagster does come with a nice UI though ...
    Also lately, in terms of philosophy, I see some projects going in the direction of meta-workflows (TFX, Dagster), with rendering/deployment materialized for multiple target workflow tools and schedulers, while other tools tend to offer a more "pure" dependency chain. From a runner perspective I also see some different positions, with k8s and docker containers vs processes. Definitely many choices to pick from, and I agree with you that defining the right philosophy is very important ...
    Dylan

    2 years ago
    @Nat Busa Regarding the UI, we consistently found in our research that users don’t actually want to maintain the persistence layer and infrastructure necessary for a useful UI (including things like Auth, Teams, Projects, Secrets configuration, etc.) for the orchestration and running of distributed workflows. That’s one of the primary reasons we built Prefect Scheduler (which you can sign up for here https://www.prefect.io/products/cloud-scheduler while it’s in Beta) the way that we did. Prefect Scheduler includes a full UI, GraphQL API, and Scheduler. Using our Hybrid Execution model (we’re working on the name), Scheduler users orchestrate workflows using our infrastructure without ever sending us their code or their data. If you’re interested in learning more, shoot me a DM!
    Mark Koob

    2 years ago
    Hi Nat - my original intent was to use dask's Delayed API to build our workflows and then run them with distributed, but the prototypes wound up being really unwieldy. I've switched to using Prefect since, and even though we aren't quite to the point of scaling up, it has been a pleasant experience. We're running flows on on-prem hardware accessed through a dask executor.
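    The Delayed-style graph building Mark describes can be mimicked in plain Python; this is a toy illustration of the core idea (lazily record a call graph, then evaluate it), not dask's actual implementation:

    ```python
    class Delayed:
        """Toy lazy wrapper: records a function call instead of running it."""
        def __init__(self, fn, args):
            self.fn = fn
            self.args = args

        def compute(self):
            # Recursively evaluate dependencies, then run this node.
            resolved = [a.compute() if isinstance(a, Delayed) else a
                        for a in self.args]
            return self.fn(*resolved)

    def delayed(fn):
        # Defer execution: calling the wrapped function builds a graph node.
        return lambda *args: Delayed(fn, args)

    @delayed
    def inc(x):
        return x + 1

    @delayed
    def add(x, y):
        return x + y

    # Nothing has executed yet; this just builds a small task graph.
    total = add(inc(1), inc(2))
    result = total.compute()  # evaluates the graph: (1+1) + (2+1) = 5
    ```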