Prefect Community

Hi team! I was wondering if anyone has a system for productionising flows that they are happy with. We’re currently looking at a setup within GitLab CI/CD that, on push to dev/main, clones a repository of flows and registers each of them (uploading them into an S3 bucket). I believe we could also use the GitHub/GitLab storage option (question: does this support branching?), but we’d still need to clone the repository so we can register each of the flow files. Has anyone got a nice setup that allows for rapid iteration and automatic flow registration that they’d be happy to talk about?

<@U0141PE1A84> and I have worked on this a bit.  We wrote a FlowBuilder class which handles a lot of the boilerplate involved in building a Flow.  The dev’s job is to implement the method that contains the actual flow logic, define the flow parameters, and specify the environment config values.  When the code is eventually merged into main, the GitLab CI pipeline calls `infima flows register all` (`infima` is our library’s CLI tool).  `infima flows register` then goes through all available flows in the library, builds each Flow object by calling the method that’s been implemented, builds out the Storage/Environment from the config values, and registers the flows with Prefect Cloud.
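To make the pattern concrete, here is a minimal sketch of how a builder base class and a "register all" loop could fit together. It is not the actual `infima` code: the class layout, the `define`/`build` method names, the `register_all` helper and the `DailyReport` example are all assumptions for illustration, and it deliberately avoids importing Prefect so it runs on its own. In a real setup `define` would return a Prefect `Flow` and the `register` callable would wrap the actual registration call.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Type


class FlowBuilder(ABC):
    """Hypothetical base class: subclasses supply logic, parameters and config."""

    # Every named subclass is recorded here so a CLI command can find it.
    registry: List[Type["FlowBuilder"]] = []

    name: str = ""
    parameters: Dict[str, Any] = {}
    config: Dict[str, str] = {}  # e.g. storage bucket, image, labels

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        if cls.name:  # skip intermediate helper classes that have no name
            FlowBuilder.registry.append(cls)

    @abstractmethod
    def define(self) -> Any:
        """Return the flow object (a Prefect Flow in a real implementation)."""

    def build(self) -> Dict[str, Any]:
        # Bundle everything the registration step needs in one place.
        return {
            "name": self.name,
            "flow": self.define(),
            "parameters": dict(self.parameters),
            "config": dict(self.config),
        }


def register_all(register: Callable[[Dict[str, Any]], None]) -> List[str]:
    """Build every known flow and hand it to the supplied register callable."""
    registered = []
    for builder_cls in FlowBuilder.registry:
        built = builder_cls().build()
        register(built)
        registered.append(built["name"])
    return registered


class DailyReport(FlowBuilder):
    """Example flow definition; the names and values are made up."""

    name = "daily-report"
    parameters = {"lookback_days": 1}
    config = {"bucket": "example-flows"}

    def define(self) -> Any:
        return ["extract", "transform", "load"]  # stand-in for real tasks
```

A CI job would then call something like `register_all(my_register_function)` once after merge, so adding a new flow only means adding a new subclass.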

We initially had a separate repo for flows but moved all the flows into our main library because we couldn’t keep the versions in sync.  Instead, we version everything through our library’s version (which is handled through semantic release) and register every flow on each bump so they all always point to the most recent image.  This basically means we completely ignore the version shown in the Cloud UI.  We occasionally run into race conditions where registering a flow conflicts with its scheduled run, but we try to make the construction of the flows robust.
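As a rough illustration of "everything follows the library version", the sketch below pins every flow to one image tag derived from a single version string. The function names, the repository string and the flow names are hypothetical; in practice the version would come from the installed package (for example via `importlib.metadata.version`) rather than being passed in by hand.

```python
from typing import Dict, Iterable


def image_reference(repository: str, library_version: str) -> str:
    """Build an image reference whose tag is the library version."""
    if not library_version or any(ch.isspace() for ch in library_version):
        raise ValueError("library_version must be a non-empty tag without spaces")
    return f"{repository}:{library_version}"


def pin_flows(
    flow_names: Iterable[str], repository: str, library_version: str
) -> Dict[str, str]:
    """Map every flow to the same image, so a version bump moves them together."""
    image = image_reference(repository, library_version)
    return {name: image for name in flow_names}
```

Because every flow is re-registered against the same tag on each bump, the library version is the only version anyone needs to track.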

We have not migrated to the new RunConfig yet but intend to, and we are using S3 storage simply because it was easy to set up.

We are planning to migrate to a staging/prod split where the flows will still be registered automatically, but the auto-register to prod will only happen when we create a prod release through a tag.  This means the flows that need it will be duplicated.  We have also sketched out ideas for requiring a test mode in the FlowBuilder so that we can run the flows in the CI test job within a K8s cluster, and thus test the flows closer to the real thing instead of relying on a lot of mocking.
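One way the staging/prod decision could be made inside the registration command is to look at GitLab's predefined CI variables: `CI_COMMIT_TAG` is set in tag pipelines and `CI_COMMIT_BRANCH` in branch pipelines. The sketch below assumes that convention; the function name and the `"prod"`/`"staging"` target names are placeholders, not part of the setup described above.

```python
from typing import Mapping, Optional


def registration_target(
    env: Mapping[str, str], main_branch: str = "main"
) -> Optional[str]:
    """Pick where flows should be registered for the current pipeline.

    Returns "prod" for tag pipelines, "staging" for pushes to the main
    branch, and None when nothing should be registered.
    """
    if env.get("CI_COMMIT_TAG"):
        return "prod"
    if env.get("CI_COMMIT_BRANCH") == main_branch:
        return "staging"
    return None
```

In a CI job this would be called as `registration_target(os.environ)`, and the registration step skipped when it returns `None`.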

We’ve thought about open sourcing it as a bit of a conversation starter for how to make these things operate in a more full-fledged system.

Sounds like a good setup, has given me a few ideas too - I like the FlowBuilder idea to standardise things and make it easier to set up the flow!