I manage a lot of data pipelines that ingest from many different places. On top of that, I had monitors on this data at multiple steps of my pipelines. It eventually became very hard to know the status and health of these pipelines at any given moment.
As a result, I created Panda Patrol, an open-source tool that is best described as Sentry for your data. With just one function call, you can monitor and profile all the data in your pipelines. It integrates right into your Prefect pipeline with no additional setup. I tried to keep the package as simple and easy to use as possible. Hope some of you guys find it useful! Here's a demo video of how it works.