# prefect-community
a
Hi, my team is currently looking at alternatives to Airflow for our workflow management needs. A couple of high-level pain points for us are Airflow's scheduler and the low flexibility of components such as the metastore DB. Apart from Prefect, Pachyderm comes to mind as an alternative. Can someone experienced with both Prefect and Pachyderm weigh the two against each other? Or maybe point to a resource that would help?
m
I've also been comparing the two. Pachyderm is focused on versioning data and uses an S3 backend to store 'commits' of objects between transformations/pipelines written in JSON. Each JSON transformation is like a Prefect Task, and transformations are linked by committing to the repo that the next transformation consumes. In Prefect, Tasks are linked into Flows by a functional variable-passing syntax.
Pachyderm also runs a worker per transformation in K8s that watches for changes in its input repos; Prefect uses Dask workers instead.
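To make the "functional variable passing" idea concrete, here is a rough sketch in plain Python. It is a toy stand-in and not the Prefect API: `Node`, `task`, and the three example functions are hypothetical names. It only shows how passing one task's result into another can record a dependency instead of running code right away.

```python
# Toy stand-in, NOT the Prefect API: passing one task's output to another
# records a dependency edge; nothing executes until run() is called.
class Node:
    def __init__(self, fn, upstream):
        self.fn = fn
        self.upstream = upstream  # nodes whose results feed this one

    def run(self):
        # Run upstream nodes first, then pass their results in as arguments.
        return self.fn(*(node.run() for node in self.upstream))


def task(fn):
    # Calling the decorated function builds a Node rather than executing fn.
    def build(*upstream):
        return Node(fn, upstream)
    return build


@task
def extract():
    return [1, 2, 3]


@task
def transform(rows):
    return [r * 10 for r in rows]


@task
def total(rows):
    return sum(rows)


# The "flow" is just the graph implied by how values are passed along.
pipeline = total(transform(extract()))
result = pipeline.run()
```

In Pachyderm the equivalent link would live in the pipeline's JSON spec as an input repo rather than in a function call.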
j
Hi @Michael Adkins - I don’t have much experience with Pachyderm, so I can’t speak to the details. My understanding is that Pachyderm is exclusively concerned with containerized data pipelines, data lineage, and data versioning. Prefect is primarily concerned with adding customizable workflow semantics (like retries, trigger logic, pausing, scheduling, caching, mapping, etc.) to arbitrary code (with a focus on data science and engineering), then allowing you to execute it however you prefer.
a
@Jeremiah Pachyderm offers feed-forward functionality, i.e. a pipeline can receive input from another and act accordingly. This is a use case that Airflow doesn't handle natively. Is this possible with Prefect, or is it on the roadmap?
j
Prefect has Parameters that allow flows to receive any external input. This means you could have a flow run another flow by passing parameters appropriately.
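As a rough illustration of that feed-forward pattern, here is a plain-Python sketch. It deliberately does not use the Prefect API: `MiniFlow`, `report`, and `ingest` are hypothetical names, and the validation rule is an assumption made for the example. The idea is only that one flow finishes its work and then starts the next flow with its output as a parameter.

```python
# Hypothetical stand-in, NOT the Prefect API: a "flow" is a function plus
# declared parameter names, and run() rejects inputs that do not match them.
class MiniFlow:
    def __init__(self, name, param_names, body):
        self.name = name
        self.param_names = set(param_names)
        self.body = body

    def run(self, parameters):
        unknown = set(parameters) - self.param_names
        missing = self.param_names - set(parameters)
        if unknown or missing:
            raise ValueError(
                f"{self.name}: unknown={sorted(unknown)}, missing={sorted(missing)}"
            )
        return self.body(**parameters)


# Downstream flow: receives a row count as external input.
report = MiniFlow("report", ["row_count"], lambda row_count: f"loaded {row_count} rows")


def ingest_body(records):
    rows = [r for r in records.split(",") if r]  # pretend these are records
    # Feed-forward: hand this flow's output to the next flow as a parameter.
    return report.run(parameters={"row_count": len(rows)})


ingest = MiniFlow("ingest", ["records"], ingest_body)
message = ingest.run(parameters={"records": "a,b,c"})
```

With real Prefect flows, the downstream run would be started through whatever run or scheduling mechanism your deployment uses; the sketch only covers the parameter hand-off.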