I’m interested to hear how you’ve handled scaling up Prefect and dealing with a large code-base.
We have ~40 data pipelines to manage, with ~15 of them on Prefect. We're building new pipelines all the time, and old ones are being migrated over too.
Currently each of these pipelines runs the ELT process end to end, and we have helper libraries that are reused across pipelines.
We're considering how best to architect these pipelines to encourage code reuse and maintain them effectively. Three teams will be maintaining them, so code ownership is a consideration too.
One idea is to split up the pipelines so they don't run the full process end to end. We could use an event-driven architecture, where smaller flows are triggered by an external event handler. This gives us more options for team ownership and could make it easier to replace or add steps in the ELT process.
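To make the idea concrete, here's a rough sketch of what I mean (everything here is hypothetical: the flow names, event types, and payloads are made up, and in practice the handler would be an event bus or automation triggering Prefect deployments rather than direct function calls):

```python
# Hypothetical sketch: each small "flow" does one ELT stage, and a
# handler maps incoming events to the flow that should run next.
# In a real setup these would be Prefect flows triggered via
# deployments/automations, not plain functions called directly.

def extract_flow(dataset: str) -> dict:
    """Small flow: pull raw data for one dataset."""
    return {"dataset": dataset, "stage": "extracted"}

def load_flow(payload: dict) -> dict:
    """Small flow: land the extracted data in the warehouse."""
    return {**payload, "stage": "loaded"}

def transform_flow(payload: dict) -> dict:
    """Small flow: run transformations on the loaded data."""
    return {**payload, "stage": "transformed"}

# The handler owns the wiring between stages, so each team can own
# a flow, and a stage can be swapped out without touching the others.
HANDLERS = {
    "source.updated": extract_flow,
    "data.extracted": load_flow,
    "data.loaded": transform_flow,
}

def handle_event(event_type: str, payload):
    """Dispatch an external event to the small flow that handles it."""
    return HANDLERS[event_type](payload)
```

The appeal is that the ordering lives in the event wiring rather than inside one monolithic flow, which is what makes individual steps replaceable.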
The alternative is to keep doing what we’re doing and make the most of shared libraries to encourage code reuse.
In either case, we'll make more use of configuration so that similar datasets are processed by the same data pipeline.
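Roughly, I'm picturing something like the sketch below: one generic pipeline with per-dataset behaviour captured in config rather than in code (the dataset names, source paths, and config keys are all invented for illustration):

```python
# Hypothetical sketch of config-driven pipelines: similar datasets
# share one pipeline, and only their configuration differs.

DATASET_CONFIGS = {
    "orders": {"source": "s3://raw/orders", "dedupe_key": "order_id"},
    "customers": {"source": "s3://raw/customers", "dedupe_key": "customer_id"},
}

def run_pipeline(dataset: str) -> dict:
    """Run the same ELT steps for any dataset, driven by its config."""
    config = DATASET_CONFIGS[dataset]
    # ...extract from config["source"], load, then transform,
    # deduplicating on config["dedupe_key"]...
    return {"dataset": dataset, "deduped_on": config["dedupe_key"]}
```

Adding a new similar dataset then becomes a config change rather than a new pipeline.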
I’d be keen to hear your approach to handling large code-bases.