@Pierre Monico I was really trying to avoid a detailed feature comparison 🙂 and those were only examples. I’m sure Databricks can do a lot, but to persuade your client I’m not sure that purely comparing features is the most helpful strategy. I would focus more on the problem that each tool solves and the developer experience.
For example, say you have 20 tasks that depend on each other in some way (some need to run sequentially, some in parallel), and your pipelines need various (possibly conflicting) package dependencies to interact with third-party systems:
• How do you define dependencies between your tasks - do you want to click through a drag-and-drop tool to manually set this up 20 times? What if you have 100 tasks? In Prefect, you just define it in Python.
• How do you identify the issue when your job fails? In Prefect, your tasks can be very small and you have visibility into each of them. Databricks encourages larger tasks, which makes it more difficult to identify the root cause of a problem. In Prefect, you can get notified about the exact task that failed and see it clearly marked red in the UI. That visibility is most helpful exactly when things go wrong.
• What if you want to define more complex dependencies, e.g. a task that can only run if a specific condition is met and should be skipped otherwise? Or all downstream tasks should fail if any of the upstream tasks failed? Such complex dependencies are best defined as code; they are very hard to express in a drag-and-drop tool or YAML.
• What if your pipeline A and pipeline B need different (conflicting) Python dependencies? I have no idea how this is done in Databricks. In Prefect, again, you can have it all defined programmatically. With Docker storage, Prefect can even build a Docker image for you and push it to a registry of your choice.
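To make the "just define it in Python" point a bit more concrete, here is a minimal sketch in plain Python. To be clear, this is not Prefect's API: it only uses the standard library's `graphlib`, and every name in it (`build_dependencies`, `run_pipeline`, the task names and the state strings) is made up for illustration. It shows the idea of generating a task graph in a loop, skipping a task when a condition is not met, and propagating failures downstream:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_dependencies(n_sources):
    """Map each (hypothetical) task name to the set of tasks it depends on."""
    extracts = [f"extract_{i}" for i in range(n_sources)]
    deps = {name: set() for name in extracts}  # independent of each other
    deps["combine"] = set(extracts)            # waits for every extract
    deps["load"] = {"combine"}                 # strictly after combine
    return deps


def run_pipeline(deps, actions, conditions=None):
    """Run tasks in dependency order and return {task name: state}."""
    conditions = conditions or {}
    states = {}
    for name in TopologicalSorter(deps).static_order():
        upstream = {states[u] for u in deps[name]}
        if upstream & {"failed", "upstream_failed"}:
            states[name] = "upstream_failed"   # failure propagates downstream
        elif "skipped" in upstream or not conditions.get(name, lambda: True)():
            states[name] = "skipped"           # condition not met, or upstream skipped
        else:
            try:
                actions[name]()
                states[name] = "success"
            except Exception:
                states[name] = "failed"
    return states


def fail():
    raise RuntimeError("source unavailable")


# 20 extract tasks plus combine and load, defined without clicking anything.
deps = build_dependencies(20)
actions = {name: (lambda: None) for name in deps}
actions["extract_3"] = fail  # simulate one broken source
states = run_pipeline(deps, actions)
print({name: state for name, state in states.items() if state != "success"})
```

Going from 20 to 100 tasks is a one-character change here, and the printed states point at `extract_3` as the task that actually failed, while `combine` and `load` are only marked as `upstream_failed`. A real orchestrator adds scheduling, retries, parallel execution and a UI on top of this idea.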
I would really focus more on the problem, the user experience and the audience this tool is for, rather than purely comparing features. Engineers I've worked with usually hate drag-and-drop; they want reproducible workflows as code and infrastructure as code. In the end, it depends on your team’s preference.