Hello everyone, I'm a newbie data engineer. I'm using Python to do ETL on my data files, and I'm having trouble managing the steps in the process. I'm looking for a way to manage the pipeline for my ETL. Currently, I'm considering tools like Airflow, Prefect, and dbt. As a beginner, which tool should I learn and use to get better control over my ETL process? Thank you.
Chris Reuter
07/17/2023, 7:58 PM
If you already have various ETL processes (especially if they're already in Python), it's pretty easy to get started with scheduling and observing them, and being alerted when they fail, using Prefect.
Try following along with the Prefect tutorial and you'll see that all you have to do is install `prefect`, add the `@flow` decorator to your existing code, and deploy a worker and your flow. Then you have a repeatable data pipeline.
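To make the "add `@flow` to your existing code" step concrete, here is a minimal sketch. The `extract`, `transform`, and `load` functions, the CSV layout, and the flow name `example-etl` are all hypothetical stand-ins for whatever your ETL already does; only the `flow` decorator comes from Prefect. The `try`/`except` fallback is an assumption added for this sketch so that it still runs where Prefect is not installed; in a real project you would just `pip install prefect` and import it directly.

```python
import csv
import io

try:
    from prefect import flow
except ImportError:
    # Assumption for this sketch only: a pass-through stand-in so the file
    # still runs where Prefect is not installed. It adds no orchestration.
    def flow(fn=None, **_options):
        return fn if fn is not None else (lambda f: f)


# --- Existing ETL code (hypothetical), left unchanged ---

def extract(raw_csv: str) -> list[dict]:
    """Parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows: list[dict]) -> list[dict]:
    """Drop rows with a missing amount and cast the remaining fields."""
    return [
        {"id": int(r["id"]), "amount": float(r["amount"])}
        for r in rows
        if r["amount"]
    ]


def load(rows: list[dict]) -> int:
    """Stand-in for a real write; reports how many rows would be loaded."""
    return len(rows)


# --- The only Prefect-specific addition: wrap the entry point in @flow ---

@flow(name="example-etl")
def etl_pipeline(raw_csv: str) -> int:
    rows = extract(raw_csv)
    clean = transform(rows)
    return load(clean)
```

With Prefect installed, calling `etl_pipeline("id,amount\n1,10.5\n")` from a script should run the flow locally and record a flow run. Scheduling it still needs the deployment and worker steps mentioned above, which the Prefect tutorial walks through.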