•

    Akshay Verma

    3 years ago
    I came across Dagster (https://github.com/dagster-io/dagster) recently. I would like to know how it compares to Prefect's open-source offering.
  •

    Adam Roderick

    3 years ago
    It looks like Dagster includes a UI, which is not in Prefect Core but is in the Cloud offering. Prefect Core includes super-simplified scheduling, which I did not see after a quick review of Dagster.
  •

    Adam Roderick

    3 years ago
    I don't have time right now for a full review. @Akshay Verma will you let me know what you find out?
  •

    Jeremiah

    3 years ago
    Hi @Akshay Verma, our recommendation for any comparison is to try them both and use the one that you think is best! However, comparison questions are common, so I’ll try to give a complete answer here.

    Prefect and Dagster are both positioned as tools for building data applications. Prefect was designed to solve the negative engineering problem (https://medium.com/the-prefect-blog/positive-and-negative-data-engineering-a02cb497583d), and was inspired by hundreds of real-world lessons learned building Apache Airflow. It was explicitly designed to solve those issues. Dagster has so far failed to articulate a specific problem it solves other than claiming “data is broken.” It provides a DSL for defining DAGs over other tools, like Airflow. Frankly, we are as curious as you to see whom that appeals to.

    Prefect has always had a philosophy that “Python is the API” and any Python code can be transparently transformed into a Prefect workflow. Dagster started with a very cumbersome, explicit DSL, but has more recently adopted some of Prefect’s ideas, like a functional API, or running on Dask. We’re happy to drive innovation, but bolt-on features are often poor substitutes for well-designed ones.

    From an integration standpoint, Prefect was designed for every single piece to be completely pluggable via a simple API. This means, for example, that our Dask integration can take advantage of Dask’s data serialization features and resource affinities. In contrast, Dagster appears to have taken a “least common denominator” approach to integrations. This means that while Dagster DAGs can be executed on other engines like Airflow or Dask, they don’t take advantage of any of the unique features of those systems.

    Lastly, our Prefect Cloud platform forms the backbone of the Prefect ecosystem. It has free and paid tiers to support all use cases. We do not see anything comparable in Dagster; perhaps they will introduce something once we make Cloud available publicly.
  •

    Joel Pinheiro

    3 years ago
    Hi guys, I am very interested in using Prefect to replace Airflow. My company has everything in AWS. Is there any recommendation you can make for deploying Prefect on AWS, e.g. a ready-made AMI?
    3 replies, including Jeremiah
  •

    Chris

    3 years ago
    Hey, I’ve noticed that unused allocated memory isn’t freed after a task/flow is run, so if I run a scheduled flow which contains a memory-intensive task, that memory stays allocated from the first time the flow runs. I’ve found some workarounds (manually deleting variables at the end of a task’s run function), but this doesn’t always work (e.g. if the output of one task is passed as input to another). Is there a better workaround?

    I also have a similar issue when running large-scale flows with the Dask executor. It seems like memory is not freed between tasks. I found this relevant issue, https://github.com/dask/dask/issues/3247, which suggests that the pool used by a Dask worker to complete a task is not closed after each task. This causes problems for large-scale flows: even if I split the data into small chunks and use them as mapped arguments to a task, the leaked memory accumulates with each task run and eventually causes workers to die. Has anyone experienced anything similar?
    4 replies, including Chris White
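The workaround described in the question, and why it stops helping once a result is handed to another task, can be seen in plain CPython using `weakref` probes. This is a minimal stdlib sketch, not Prefect code: `LargeIntermediate`, `heavy_task`, `producer` and `consumer` are hypothetical stand-ins, and the sketch only tracks whether Python objects are still alive. Freeing an object does not guarantee that the process's resident memory shrinks, because the allocator may keep freed memory around for reuse.

```python
import gc
import weakref


class LargeIntermediate:
    """Hypothetical stand-in for a big object built inside a task."""

    def __init__(self, n_bytes: int):
        self.buf = bytearray(n_bytes)


probes: list = []  # weak references let us check liveness without keeping objects alive


def heavy_task(n_bytes: int) -> int:
    tmp = LargeIntermediate(n_bytes)
    probes.append(weakref.ref(tmp))
    summary = len(tmp.buf)
    del tmp       # drop the only strong reference before doing more work
    gc.collect()  # also reclaim objects kept alive only by reference cycles
    return summary  # only the small summary leaves the task


def producer(n_bytes: int) -> LargeIntermediate:
    return LargeIntermediate(n_bytes)  # the big object itself is the output


def consumer(obj: LargeIntermediate) -> int:
    return len(obj.buf)


print(heavy_task(10_000_000), probes[-1]() is None)  # 10000000 True

# Once a result is passed downstream, deleting locals inside the producer
# cannot free it: whoever holds the result (here, `out`) keeps it alive.
out = producer(10_000_000)
ref = weakref.ref(out)
print(consumer(out), ref() is None)  # 10000000 False
del out
gc.collect()
print(ref() is None)  # True
```

In an orchestrator, the role of `out` would be played by whatever stores task results between steps; which component does that in Prefect is an assumption here and not stated in this thread.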
  •

    Akshay Verma

    3 years ago
    Is it possible to define a dataclass as a parameter for a flow?
    5 replies, including Chris White
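If flow parameters have to be JSON-compatible values (an assumption here, not confirmed in this thread), one common pattern is to pass the dataclass as a plain dict and rebuild it inside the task that needs it. A minimal stdlib sketch; `RunConfig`, its fields, `to_parameter` and `from_parameter` are all hypothetical names.

```python
import json
from dataclasses import asdict, dataclass, fields


# Hypothetical configuration object; the fields are illustrative only.
@dataclass(frozen=True)
class RunConfig:
    source_table: str
    batch_size: int = 1000
    dry_run: bool = False


def to_parameter(cfg: RunConfig) -> dict:
    """Turn the dataclass into a JSON-compatible dict to pass as a parameter."""
    return asdict(cfg)


def from_parameter(value: dict) -> RunConfig:
    """Rebuild the dataclass inside a task, rejecting unknown keys."""
    known = {f.name for f in fields(RunConfig)}
    unknown = set(value) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return RunConfig(**value)


# Round trip through JSON, as a scheduler or API call might do.
cfg = RunConfig(source_table="events", batch_size=500)
payload = json.dumps(to_parameter(cfg))
restored = from_parameter(json.loads(payload))
print(restored == cfg)  # True
```

Validating keys on the way back in keeps a typo in a scheduled run's parameters from being silently ignored.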
  •

    Akash

    3 years ago
    Hi, at my workplace we're long-time Airflow users. After skimming through the Prefect docs and blogs, I can see that it tackles many of Airflow's pain points that we've had to hack our way around. But one advantage Airflow provides is the ability to run tasks written in non-Python languages, such as R, via operators (BashOperator, DockerOperator). Is it possible to do so in Prefect? If yes, how? If not, is it on the roadmap? Congrats and thanks for open-sourcing Prefect Core!
    2 replies
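The BashOperator-style approach boils down to a Python task that shells out to another program. A minimal stdlib sketch of that idea (not Prefect's task library; `run_command` is a hypothetical helper name) wraps `subprocess.run` so that a non-zero exit code raises an exception, which an orchestrator that treats exceptions as task failures would report as a failed task.

```python
import subprocess
import sys


def run_command(args: list[str], timeout: float = 60.0) -> str:
    """Run an external program and return its stdout; raise on non-zero exit."""
    completed = subprocess.run(
        args,
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as str
        timeout=timeout,      # raises subprocess.TimeoutExpired if exceeded
        check=True,           # raises CalledProcessError on non-zero exit
    )
    return completed.stdout.strip()


# With R installed, the call might look like:
#   run_command(["Rscript", "-e", "cat(sum(1:10))"])
# Here the Python interpreter stands in for any non-Python program.
print(run_command([sys.executable, "-c", "print(sum(range(11)))"]))  # 55
```

The same helper could invoke `docker run <image> ...` for the DockerOperator-style case, assuming Docker is installed on the machine running the task.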