Prefect Community #ask-community

Andrew Moist

05/13/2022, 10:19 AM

Hi everyone, I’m interested to hear how you’ve handled scaling up Prefect and dealing with a large codebase. We have around 40 data pipelines to manage, about 15 of them using Prefect. We’re building new pipelines all the time and old pipelines are being migrated too. Currently these pipelines run the ELT process end to end, and we have helper libraries that are reused across pipelines. We’re considering how best to architect these pipelines to encourage code reuse and maintain them effectively. We’ll have 3 teams maintaining them, so code ownership is a consideration too.

One idea is to split up the pipelines so they don’t run the full process end to end. We could use an event-driven architecture, where smaller flows are triggered by an external event handler. That gives us more choices for team ownership and could make it a little easier to replace or add steps in the ELT process. The alternative is to keep doing what we’re doing and make the most of shared libraries to encourage code reuse.

In either case, we’ll make more use of configuration so that similar datasets are processed by the same data pipeline. I’d be keen to hear your approach to handling large codebases.
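To make the "use configuration so similar datasets share one pipeline" idea concrete, here is a minimal sketch in plain Python. It is not Prefect-specific and not taken from the thread: `DatasetConfig`, `TRANSFORMS`, `run_pipeline`, and the transform names are all hypothetical, and the rows are in-memory dicts standing in for real extract/load steps.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Rows = List[dict]


@dataclass(frozen=True)
class DatasetConfig:
    """Hypothetical per-dataset settings; one pipeline serves many of these."""
    name: str
    source: str
    destination: str
    transforms: Tuple[str, ...] = ()


# Hypothetical shared-library transforms, looked up by name from the config.
TRANSFORMS: Dict[str, Callable[[Rows], Rows]] = {
    "lowercase_keys": lambda rows: [
        {key.lower(): value for key, value in row.items()} for row in rows
    ],
    "drop_nulls": lambda rows: [
        row for row in rows if all(value is not None for value in row.values())
    ],
}


def run_pipeline(config: DatasetConfig, rows: Rows) -> Rows:
    """Apply the transforms named in the config, in order."""
    for transform_name in config.transforms:
        rows = TRANSFORMS[transform_name](rows)
    return rows


# Two similar datasets reuse the same pipeline code with different settings.
orders = DatasetConfig(
    name="orders",
    source="s3://example-bucket/orders",  # placeholder location
    destination="warehouse.orders",       # placeholder table
    transforms=("lowercase_keys", "drop_nulls"),
)
customers = DatasetConfig(
    name="customers",
    source="s3://example-bucket/customers",
    destination="warehouse.customers",
    transforms=("lowercase_keys",),
)
```

With this shape, adding a dataset means adding a config entry rather than a new pipeline, and each team can own its configs while the transform registry lives in a shared library.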

Anna Geller

05/13/2022, 10:50 AM

About scale: if you leverage Prefect Cloud, you don't need to worry about scaling up the entire orchestration API. You only need to ensure your execution layer scales, and this is relatively straightforward if you leverage e.g. a horizontally scaled Kubernetes cluster or even a serverless/autopilot Kubernetes cluster. For Prefect Server, check this topic and the related topic linked there.

Anna Geller

05/13/2022, 10:54 AM

For repository structure, I understand that it's an important and not easy decision, but you need to consider (based on your use case/team needs):
• whether a monorepo or one repo per project makes more sense
• what the code dependencies of specific flows/projects are - those might be easier to manage in a single repo, e.g. to reuse some shared utility modules and shared Docker images

For some repository examples and packaging dependencies, check this one. Also, this discussion may help.

Anna Geller

05/13/2022, 10:55 AM

and re event-driven workflows, that's totally supported, this page dives deeper into it
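As a rough illustration of the event-driven split discussed in this thread, the sketch below shows an external event handler dispatching to small, separately owned flows. It is plain Python under stated assumptions: `on_event`, `dispatch`, the event names, and the two flow functions are hypothetical, and each function body stands in for whatever call actually starts a flow run in your Prefect version.

```python
from typing import Callable, Dict, List

# Registry mapping an event type to the flows that should react to it.
HANDLERS: Dict[str, List[Callable[[dict], str]]] = {}


def on_event(event_type: str):
    """Decorator that registers a function as a handler for one event type."""
    def register(flow_fn: Callable[[dict], str]) -> Callable[[dict], str]:
        HANDLERS.setdefault(event_type, []).append(flow_fn)
        return flow_fn
    return register


@on_event("file_landed")
def load_flow(payload: dict) -> str:
    # Hypothetical "load" step; a real handler would trigger a flow run here.
    return f"load:{payload['path']}"


@on_event("load_finished")
def transform_flow(payload: dict) -> str:
    # Hypothetical "transform" step, which a different team could own.
    return f"transform:{payload['table']}"


def dispatch(event_type: str, payload: dict) -> List[str]:
    """Run every flow registered for the event; unknown events do nothing."""
    return [flow_fn(payload) for flow_fn in HANDLERS.get(event_type, [])]
```

Because each step only knows about the event it listens for, a step can be replaced or a new one added by registering another handler, without touching the other flows.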

Andrew Moist

05/13/2022, 12:53 PM

Thanks very much @Anna Geller. Will check out those links.
