show-us-what-you-got
  • a

    Akshay Verma

    07/17/2019, 9:16 AM
    I came across Dagster (https://github.com/dagster-io/dagster) recently. I would like to know how it compares to Prefect's open-source offering.
  • a

    Adam Roderick

    07/17/2019, 5:14 PM
    Looks like Dagster includes a UI, which is not in Prefect Core but is in the Cloud offering. Prefect Core includes super-simplified scheduling, which I did not see in a quick review of Dagster.
  • a

    Adam Roderick

    07/17/2019, 5:15 PM
    I don't have time right now for a full review. @Akshay Verma will you let me know what you find out?
  • j

    Jeremiah

    07/17/2019, 6:29 PM
    Hi @Akshay Verma, our recommendation for any comparison is to try them both and use the one that you think is best! However, comparison questions are common so I’ll try to give a complete answer here:

    Prefect and Dagster are both positioned as tools for building data applications. Prefect was designed to solve the negative engineering problem (https://medium.com/the-prefect-blog/positive-and-negative-data-engineering-a02cb497583d), and was inspired by hundreds of real-world lessons learned building Apache Airflow. It was explicitly designed to solve those issues. Dagster has so far failed to articulate a specific problem it solves other than claiming “data is broken.” It provides a DSL for defining DAGs over other tools, like Airflow. Frankly, we are as curious as you to see whom that appeals to.

    Prefect has always had a philosophy that “Python is the API” and any Python code can be transparently transformed into a Prefect workflow. Dagster started with a very cumbersome, explicit DSL, but has more recently adopted some of Prefect’s ideas, like a functional API, or running on Dask. We’re happy to drive innovation, but bolt-on features are often poor substitutes for well-designed ones.

    From an integration standpoint, Prefect was designed for every single piece to be completely pluggable via a simple API. This means, for example, that our Dask integration can take advantage of Dask’s data serialization features and resource affinities. In contrast, Dagster appears to have taken a “least common denominator” approach to integrations. This means that while Dagster DAGs can be executed on other engines like Airflow or Dask, they don’t take advantage of any of the unique features of those systems.

    Lastly, our Prefect Cloud platform forms the backbone of the Prefect ecosystem. It has free and paid tiers to support all use cases. We do not see anything comparable in Dagster; perhaps they will introduce something once we make Cloud available publicly.
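    For readers unfamiliar with the “Python is the API” / functional style Jeremiah describes, a minimal sketch using the Prefect Core API of that era looks roughly like this (task and flow names are illustrative):

    from prefect import task, Flow

    # Ordinary Python functions become tasks via the decorator
    @task
    def extract():
        return [1, 2, 3]

    @task
    def transform(data):
        return [x * 10 for x in data]

    @task
    def load(data):
        print(f"Loaded: {data}")

    # Calling the tasks inside a Flow context builds the DAG
    with Flow("etl") as flow:
        load(transform(extract()))

    flow.run()  # runs locally; a Dask executor can be swapped in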
  • j

    Joel Pinheiro

    08/05/2019, 11:47 AM
    Hi guys, I am very interested in Prefect as a replacement for Airflow. My company has everything in AWS. Is there any recommendation you can make for a Prefect AWS deployment, e.g. a ready-made AMI?
  • c

    Chris

    08/08/2019, 10:09 AM
    Hey, I’ve noticed that unused allocated memory isn’t freed after a task/flow is run, so if I run a scheduled flow which contains a memory-intensive task, that memory is constantly allocated from the first time the flow runs. I’ve found some workarounds (manually deleting variables at the end of a task’s run function), but this doesn’t always work (e.g. if the output of one task is passed as input to another). Is there a better workaround?

    I also have a similar issue when running large-scale flows with the Dask executor. It seems like memory is not freed between tasks - I found this relevant issue https://github.com/dask/dask/issues/3247 which suggests that the pool used by a Dask worker to complete a task is not closed after each task. This causes issues with large-scale flows as even if I split the data into small chunks and use these as mapped args to a task, the leaked memory accumulates with each task run and ends up causing workers to die. Has anyone experienced anything similar?
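    For reference, the manual-cleanup workaround described above might look roughly like the sketch below; it only trims what a task holds onto and does not address the underlying Dask worker issue (load_chunk and summarize are stand-ins for real work):

    import gc

    from prefect import task

    def load_chunk(path):
        return [0.0] * 1_000_000  # stand-in for reading a large chunk of data

    def summarize(frames):
        return sum(len(f) for f in frames)  # stand-in for a small aggregate result

    @task
    def memory_heavy_task(paths):
        frames = [load_chunk(p) for p in paths]  # large intermediate objects
        result = summarize(frames)               # keep only the small summary
        del frames                               # drop references to the large data...
        gc.collect()                             # ...and ask Python to reclaim it
        return result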
  • a

    Akshay Verma

    08/13/2019, 6:14 AM
    Is it possible to define a dataclass as a parameter for the flow?
  • a

    Akash

    08/22/2019, 10:28 AM
    Hi, at my workplace we're long-time Airflow users. After skimming through the Prefect docs and blogs, I can see that it tackles many of Airflow's pain points that we've had to hack our way around. But one advantage Airflow provides is the ability to run tasks written in non-Python programming languages, such as R, via operators (BashOperator, DockerOperator). Is it possible to do so in Prefect? If yes, how? If not, is it on the roadmap? Congrats and thanks for open-sourcing Prefect Core!
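    For context, Prefect's task library at the time did include a shell task for exactly this kind of non-Python work. A minimal sketch, assuming the 0.x-era import path and that the Rscript command and a script like analysis.R exist on the machine running the flow:

    from prefect import Flow
    from prefect.tasks.shell import ShellTask

    run_script = ShellTask()  # runs an arbitrary shell command as a Prefect task

    with Flow("non-python-task") as flow:
        run_script(command="Rscript analysis.R")  # analysis.R is a placeholder script

    flow.run()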
  • g

    Gopal

    08/23/2019, 1:37 AM
    Hi, I recently came across Prefect while reading about Dask. Currently I use AWS Elastic Beanstalk for the deployment of our web application. How can I deploy a Dask cluster with Prefect in a Beanstalk worker environment? Appreciate some help. Thanks
  • m

    Markus Binsteiner

    08/23/2019, 4:03 PM
    Have you guys used Apache Arrow/Apache Arrow Flight?
  • m

    Markus Binsteiner

    08/23/2019, 4:03 PM
    https://www.dremio.com/understanding-apache-arrow-flight/
    👍 1
  • m

    Markus Binsteiner

    08/23/2019, 4:04 PM
    IMHO, might be a good option as a return value for tasks...
  • m

    Markus Binsteiner

    08/23/2019, 4:04 PM
    Dask would probably be the place to implement it, but still...
  • a

    An Hoang

    08/30/2019, 12:46 AM
    Just a little comment on the documentation. As an ESL person, when reading the https://docs.prefect.io/guide/core_concepts/states.html#state-types section I got a bit confused between states and state types. I'd propose changing it to:
    "There are three main state types: Pending, Running, and Finished. Flows and tasks typically progress through these three state types. At each stage of the execution pipeline, the current state type determines what actions are taken."
    Sorry if this seems like nitpicking! I don't mean to 🙂
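    As a quick illustration of the three state types, a flow run in Prefect Core returns a State object that can be inspected (a small sketch; the exact string representation may vary by version):

    from prefect import task, Flow

    @task
    def say_hello():
        return "hello"

    with Flow("states-demo") as flow:
        say_hello()

    state = flow.run()            # a Success state, which is a Finished-type state
    print(state.is_finished())    # True
    print(state.is_successful())  # True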
  • j

    Jeremiah

    08/30/2019, 12:47 AM
    Thank you! This is so valuable to hear — we want to be sure that our docs are easy to follow, and we appreciate the help!
    🤗 1
  • j

    Jeremiah

    08/30/2019, 12:59 AM
    https://github.com/PrefectHQ/prefect/pull/1427
    ❤️ 3
  • c

    Carlos Gimenez

    09/16/2019, 10:03 PM
    Hi everyone, I came across Kedro (https://github.com/quantumblacklabs/kedro/). Does anyone have insight into how it compares to Prefect Core?
  • g

    Gary Liao

    09/19/2019, 3:36 AM
    Is it possible to make a GUI tool like SAS Enterprise Guide, which can create data workflows interactively and export the final code?
  • k

    KJ

    09/19/2019, 6:19 PM
    Is there an example of a caching Task Subclass?
  • k

    KJ

    09/19/2019, 6:20 PM
    I see one with a task decorator.
  • c

    Chris White

    09/19/2019, 6:24 PM
    All of the keyword arguments that you can pass to the task decorator are the same as the initialization keyword arguments for a Task class, so:
    @task(cache_for=...)
    is the same thing as
    my_task = Task(cache_for=...)
    (where Task here is whatever actual task class you’re creating)
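    A small end-to-end sketch of the equivalence Chris describes, using cache_for with a timedelta (expensive_query is a stand-in for real work):

    from datetime import timedelta

    from prefect import task, Task, Flow

    def expensive_query():
        return list(range(1000))  # stand-in for a slow query

    # Decorator form: cache this task's result for one hour
    @task(cache_for=timedelta(hours=1))
    def fetch_data():
        return expensive_query()

    # Equivalent class-based form: pass the same kwarg to the Task constructor
    class FetchData(Task):
        def run(self):
            return expensive_query()

    fetch_data_instance = FetchData(cache_for=timedelta(hours=1))

    with Flow("cached-flow") as flow:
        fetch_data()
        fetch_data_instance()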
  • k

    KJ

    09/19/2019, 6:25 PM
    That’s what I thought, maybe I am calling cache_for in the wrong spot
  • c

    Chris White

    09/19/2019, 6:26 PM
    can you share a snippet of your code?
  • k

    KJ

    09/19/2019, 6:34 PM
    Unfortunately I cannot, as the power just went out. But in one of my tests, moving cache_for to the Task constructor no longer raised a TypeError. I have a couple of tests to run to verify it is caching once the power comes back.
    👍 1
  • k

    KJ

    09/19/2019, 6:38 PM
    Early lunch for me. Power crew down the street didn’t plan appropriately. 🤷‍♂️
  • s

    stefano

    10/31/2019, 10:34 AM
    This will probably sound like a weird question, but how is Prefect different from Celery? I think the differences from Airflow are well described in the docs and quite clear, but I'm not sure why I should prefer it over Celery.
    👀 1
  • e

    emre

    11/12/2019, 1:47 PM
    Looks like someone’s running long flows :watching: https://github.com/cicdw/bash-music
    😂 3
  • j

    Jeremiah

    11/12/2019, 1:48 PM
    Haha nice spot — full story https://twitter.com/jlowin/status/1194088941399478272?s=21
    😂 4
  • n

    Nick Maludy

    11/24/2019, 1:08 PM
    hey, new Prefect user here... is there an HA deployment architecture with Prefect and maybe a REST API? or is that something I'll need to craft myself?
  • a

    Alex Cano

    12/04/2019, 2:27 AM
    Hey prefect people! Just saw that Netflix released Metaflow, their internal workflow engine (ish). From what I can tell, it focuses simply on abstracting away the infrastructure required to run code instead of solving something like the negative engineering problem. I was wondering if you guys were gonna do a blog post or compare/contrast with that product! Just from the 1,000-foot POV, it looks a lot like how the documentation for Prefect reads for running with Prefect’s core. Haven’t dug too much into the details, but would love to hear something because the docs at least read similarly!
c

CJ Wright

12/04/2019, 2:45 AM
This is something we'll be looking at! Thanks
👍 1
💯 1
c

Chris White

12/04/2019, 7:14 AM
Hey @Alex Cano! While I haven’t had the chance to dig incredibly deep, I do have a few initial observations:
• While Metaflow does have some similarities with Core, it has none with Cloud (no API / no management layer / low visibility / etc.)
• Metaflow appears to be heavily focused on versioning / checkpointing machine learning model builds
• Related to the above point, steps in Metaflow appear to be more of an organizational tool, whereas Prefect Tasks are first-class objects. So for example, you can’t have a task which only runs when an upstream dependency fails in Metaflow, as a task failure just stops the flow in Metaflow
• No scheduling (so for example you could schedule your Metaflow instances via Prefect)
• Metaflow has more restrictions on the types of data that can be exchanged between steps, and any data exchange is not tracked as a dependency
• Appears to only support AWS deployments? (also no Dask support in Metaflow as far as I can tell)
I’m sure there are other similarities / differences but those were my initial takeaways from playing around a bit. I would love it if others chimed in with any other observations they find!
💯 3
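For readers curious about the failure-handling pattern Chris mentions, here is a minimal sketch using Prefect triggers from the 0.x-era API (task names are illustrative):

from prefect import task, Flow
from prefect.triggers import any_failed

@task
def risky_step():
    raise ValueError("boom")

# Runs only if an upstream task failed -- the kind of dependency
# that a step failure in Metaflow would simply prevent.
@task(trigger=any_failed)
def notify_on_failure():
    print("an upstream task failed")

with Flow("failure-handling") as flow:
    upstream = risky_step()
    notify_on_failure(upstream_tasks=[upstream])

flow.run()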
j

josh

12/04/2019, 12:55 PM
This comment from someone on reddit also has some nice points about the subject! https://www.reddit.com/r/datascience/comments/e5qx5d/metaflow_netflix_has_opensourced_their_python/f9m829v
:upvote: 2
a

Alex Cano

12/04/2019, 3:06 PM
After reading more through the docs and examples, both what Chris said and the reddit comment seem pretty spot on! It seems really focused on making the transition from running on a laptop to the cloud easier, and on keeping track of versions for a specific machine learning model. Also, like Chris mentioned, it's tightly and exclusively integrated with AWS. I feel like the docs partially read the same because Netflix kind of wanted them to, in the sense of engine UI (basic dependencies between tasks, graph visibility, mapping, naming of objects). However, it looks like Metaflow is extremely opinionated about how it wants the world to work, much like Airflow is. Maybe Metaflow's opinions are better than Airflow's, but I personally prefer something a lot less opinionated!
💯 1
o

Oliver Mannion

12/05/2019, 12:11 PM
Does Prefect have any support for managing code dependencies like Metaflow?
c

Chris White

12/05/2019, 1:01 PM
Hey Oliver - my understanding of Metaflow’s dependency management hooks is that they allow you to specify Conda packages on remote AWS workers. There are many such hooks within e.g. Dask for specifying code dependencies on the workers (including pip packages and custom files), but Prefect’s preferred method of true code dependency management is Docker. Configuration of Docker storage for Flows can be found in the Docker storage class within Prefect Core.
👍 1
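A rough sketch of configuring Docker storage for a flow, assuming the 0.x-era import path (the class moved in later releases) and placeholder registry and file paths:

from prefect import Flow
from prefect.environments.storage import Docker

with Flow("dockerized-flow") as flow:
    ...  # tasks go here

# Bake the flow, its pip dependencies, and any extra files into a Docker image
flow.storage = Docker(
    registry_url="registry.example.com/team",        # placeholder registry
    python_dependencies=["pandas", "scikit-learn"],   # pip packages installed in the image
    files={"/local/path/utils.py": "/utils.py"},      # local file -> path inside the image
)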