Zach Schuster

10/06/2022, 2:11 PM
Hi folks! I'd like to pick your brains regarding the CI/CD side of Prefect 2. We are currently running flows via ECS with S3 storage. We have multiple flows that don't share the same dependencies, and building one large image for all of them to run in seems inefficient and error-prone. So my questions are:
• Is anyone using a monorepo for Prefect 2 with multiple flows, each having its own image and requirements file, deployed via GitHub Actions or another automated deployment tool?
• If so, any insight into how you build images with different requirements files in an automated fashion would be really helpful. Thank you!
@Joe Krawiec @Emily Knight

Khuyen Tran

10/06/2022, 2:13 PM
You can create different deployments for different flows that use different images

Zach Schuster

10/06/2022, 2:17 PM
@Khuyen Tran We have set that up manually right now, but we're looking to see whether anyone has automated this process, so that when a PR is merged into main, the image rebuilds with the correct requirements.txt file for that specific image.
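One way to decide which images a merged PR should rebuild is to map the changed file paths to flow directories. Below is a minimal sketch assuming a hypothetical monorepo layout in which each flow lives in flows/<flow_name>/ with its own requirements.txt; flows_to_rebuild is an illustrative helper, not part of Prefect or GitHub Actions.

```python
from pathlib import PurePosixPath

# Hypothetical layout (an assumption): flows/<flow_name>/{flow.py, requirements.txt, ...}
FLOWS_ROOT = "flows"


def flows_to_rebuild(changed_files: list[str]) -> set[str]:
    """Return the names of flows whose directory contains at least one changed file."""
    flows: set[str] = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Only paths shaped like flows/<name>/<file> belong to a specific flow.
        if len(parts) >= 3 and parts[0] == FLOWS_ROOT:
            flows.add(parts[1])
    return flows


if __name__ == "__main__":
    # In CI this list could come from `git diff --name-only <base>...HEAD`.
    changed = [
        "flows/dbt_models/requirements.txt",
        "flows/churn_model/flow.py",
        "README.md",
    ]
    print(sorted(flows_to_rebuild(changed)))  # ['churn_model', 'dbt_models']
```

Only the images for the returned flows would then need to be rebuilt and pushed.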

Khuyen Tran

10/06/2022, 2:19 PM
So the requirements of the images of your flows change frequently?

Zach Schuster

10/06/2022, 2:32 PM
The requirements are different for each flow. So if we have a few flows that deal with dbt and a few that deal with machine learning, their requirements.txt files will be very different. It'd be great to automate the image build process for each flow, each with its own requirements.
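Building one image per flow from its own requirements file can be driven by a single shared Dockerfile plus a build argument. This is a sketch under assumptions: the flows/<name>/requirements.txt layout, the docker/flow.Dockerfile path, the FLOW_DIR argument name, the registry.example.com repository and the plan_flow_builds helper are all hypothetical. The helper only prepares `docker build` argument lists; it does not run Docker.

```python
from pathlib import PurePosixPath

# Hypothetical values: a shared Dockerfile template and an image repository name.
DOCKERFILE = "docker/flow.Dockerfile"
IMAGE_REPO = "registry.example.com/prefect-flows"


def plan_flow_builds(repo_files: list[str], git_sha: str) -> dict[str, list[str]]:
    """Return a `docker build` argv for every flows/<name>/requirements.txt found.

    `repo_files` is a list of repo-relative paths, e.g. the output of `git ls-files`.
    """
    builds: dict[str, list[str]] = {}
    for path in sorted(repo_files):
        p = PurePosixPath(path)
        if len(p.parts) == 3 and p.parts[0] == "flows" and p.name == "requirements.txt":
            flow = p.parts[1]
            builds[flow] = [
                "docker", "build",
                "--file", DOCKERFILE,
                # The shared Dockerfile is assumed to declare `ARG FLOW_DIR` and to
                # install "${FLOW_DIR}/requirements.txt".
                "--build-arg", f"FLOW_DIR=flows/{flow}",
                "--tag", f"{IMAGE_REPO}:{flow}-{git_sha[:7]}",
                ".",  # build context: repository root
            ]
    return builds


if __name__ == "__main__":
    files = ["flows/dbt_models/requirements.txt", "flows/churn_model/requirements.txt"]
    for flow, argv in plan_flow_builds(files, "3f2a9c1d").items():
        print(flow, " ".join(argv))
```

Tagging each image with the flow name plus a short commit SHA keeps per-flow images distinguishable in one repository; separate repositories per flow would work equally well.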

Khuyen Tran

10/06/2022, 3:02 PM
So I guess right now you are building the Docker image for each flow?

Zach Schuster

10/06/2022, 3:22 PM
Correct, doing it manually.

Chris L.

10/06/2022, 3:42 PM
Hey Zach! My team are using the same multi-deployment multi-image setup that you have outlined above. We are using GitHub Actions, a workflow that builds the parent image (with common dependencies e.g. pandas, numpy, prefect) and then triggers another workflow that builds our child images for each execution environment (e.g. classification, regression, etl).
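The parent/child setup described above implies a build order: the parent image must exist before any child image that starts FROM it. The sketch below only encodes that ordering as a list of `docker build` argument lists. The image repository, the docker/<name>.Dockerfile paths, the BASE_IMAGE argument name and the build_plan helper are hypothetical; the environment names are the examples from this thread. In GitHub Actions each step would typically be a docker/build-push-action run rather than a shell command.

```python
# Hypothetical image repository; the env names are the examples from this thread.
IMAGE_REPO = "registry.example.com/prefect"
EXECUTION_ENVS = ("classification", "regression", "etl")


def build_plan(git_sha: str, envs: tuple[str, ...] = EXECUTION_ENVS) -> list[list[str]]:
    """Return `docker build` argvs in dependency order: parent first, then children."""
    parent_tag = f"{IMAGE_REPO}/parent:{git_sha}"
    # Parent image: common dependencies such as pandas, numpy and prefect.
    plan = [["docker", "build", "--file", "docker/parent.Dockerfile",
             "--tag", parent_tag, "."]]
    for env in envs:
        # Each child Dockerfile is assumed to begin with:
        #   ARG BASE_IMAGE
        #   FROM ${BASE_IMAGE}
        # so env-specific dependencies are layered on top of the parent image.
        plan.append(["docker", "build", "--file", f"docker/{env}.Dockerfile",
                     "--build-arg", f"BASE_IMAGE={parent_tag}",
                     "--tag", f"{IMAGE_REPO}/{env}:{git_sha}", "."])
    return plan


if __name__ == "__main__":
    for argv in build_plan("3f2a9c1"):
        print(" ".join(argv))
```

Passing the parent tag as a build argument, rather than hard-coding it in each child Dockerfile, lets every child build pin the exact parent built in the same run.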
A few tips we found extremely helpful:
1. Use https://github.com/docker/build-push-action to simplify the build step in your action and to enable caching.
2. Bundle your flow scripts into the image using COPY, but don't forget to add an ARG immediately before the COPY step whose value changes with the git SHA of the triggering workflow run. This ensures that your flow scripts are always copied over, while all previous layers are pulled from cache!
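Tip 2 can be illustrated with a small Python helper that renders a child Dockerfile and the matching --build-arg flags. This is a sketch under assumptions: the BASE_IMAGE and GIT_SHA argument names, the /opt/prefect/flows and /opt/build_sha paths, and the render_child_dockerfile/build_args helpers are hypothetical. Docker records a cache miss for a changed ARG at its first use, not at its declaration, so the sketch references GIT_SHA in a RUN step. That step and every layer after it, including the COPY of the flow scripts, are rebuilt on each commit, while the dependency layers above it still come from cache.

```python
def render_child_dockerfile(flows_dir: str = "flows") -> str:
    """Render a hypothetical child Dockerfile whose flow-script layer changes per commit."""
    lines = [
        "ARG BASE_IMAGE",
        "FROM ${BASE_IMAGE}",
        # Env-specific dependencies: these layers stay cached between commits
        # as long as requirements.txt does not change.
        "COPY requirements.txt /tmp/requirements.txt",
        "RUN pip install --no-cache-dir -r /tmp/requirements.txt",
        # Cache-busting build arg, set to the git SHA of the triggering run.
        "ARG GIT_SHA",
        # Using the ARG is what causes the cache miss; every later layer is rebuilt.
        'RUN echo "${GIT_SHA}" > /opt/build_sha',
        f"COPY {flows_dir}/ /opt/prefect/flows/",
    ]
    return "\n".join(lines) + "\n"


def build_args(git_sha: str, base_image: str) -> list[str]:
    """`docker build` flags supplying the parent image and the per-commit SHA."""
    return ["--build-arg", f"BASE_IMAGE={base_image}",
            "--build-arg", f"GIT_SHA={git_sha}"]


if __name__ == "__main__":
    print(render_child_dockerfile())
    print(build_args("3f2a9c1", "registry.example.com/prefect/parent:3f2a9c1"))
```

Placing the ARG as late as possible matters: any layer after its first use is rebuilt on every commit, so dependency installation should stay above it.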
Once it's up and running, the experience is a charm. Every developer can see their flows "deployed" to Prefect Cloud and running in our k8s cluster in less than 2 minutes after merging to main or opening a PR.
It's 💯 worth the time to setup if you are thinking about it.

Nate

10/06/2022, 6:07 PM
In case any of these recipes are helpful

Zach Schuster

10/06/2022, 6:33 PM
Thanks so much for the help, everybody! @Chris L. I appreciate the tips, and I'll be sure to use the action mentioned above.
@Chris L. Do you have a test suite that runs as a check prior to allowing PRs into main? If so, how are you handling the workflow for different dependencies for different tests? We were thinking of having a separate test suite for each 'execution env' that you mentioned above
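A separate test suite per execution environment maps naturally onto a CI job matrix with one entry per environment. Below is a sketch assuming a hypothetical layout of envs/<name>/environment.yml alongside tests/<name>/; build_test_matrix is an illustrative helper. Its JSON output could be fed to a GitHub Actions strategy matrix via the fromJSON expression, though that wiring is an assumption and is not shown here.

```python
import json
from pathlib import PurePosixPath


def build_test_matrix(repo_files: list[str]) -> dict[str, list[dict[str, str]]]:
    """One CI job per execution env: envs/<name>/environment.yml + tests/<name>/."""
    include = []
    for path in sorted(repo_files):
        p = PurePosixPath(path)
        if len(p.parts) == 3 and p.parts[0] == "envs" and p.name == "environment.yml":
            env = p.parts[1]
            include.append({"env": env, "environment_file": path, "tests": f"tests/{env}"})
    return {"include": include}


if __name__ == "__main__":
    files = ["envs/etl/environment.yml", "envs/regression/environment.yml", "README.md"]
    # Printed as JSON so a CI step could expose it as a job output.
    print(json.dumps(build_test_matrix(files)))
```

Each matrix job then creates only the environment it needs and runs only that environment's tests, so a dbt change does not wait on an ML environment build.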

Chris L.

10/07/2022, 3:17 AM
We use environment.yml files with the micromamba GitHub Action https://github.com/marketplace/actions/provision-with-micromamba and its caching. Dependency resolution and installs are blazing fast. In our experience, the biggest blocker to iteration speed is managing and waiting for different environments to build; sometimes building the environment takes longer than running the tests. It's really a speed game. With Prefect 2.0 (or any Python microservice setup, really), the three key ingredients for a fast CI/CD workflow, imo, are:
1. Bundle flow scripts into your image.
2. Build images from cache (but don't forget to always invalidate the COPY step for your flow scripts).
3. Micromamba.
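Environment caching in CI generally hinges on a cache key that changes only when the environment definition changes, so an unchanged environment.yml reuses the cached environment instead of resolving it again. The helper below is a hedged sketch of that idea; env_cache_key and its key format are hypothetical and are not the micromamba action's own cache-key scheme. In a workflow file the same idea is usually expressed with GitHub Actions' hashFiles() expression.

```python
import hashlib


def env_cache_key(env_name: str, environment_yml: str, os_name: str = "Linux") -> str:
    """Cache key that changes only when the environment file's contents change."""
    # Hash the file contents; any edit to dependencies yields a new key (cache miss).
    digest = hashlib.sha256(environment_yml.encode("utf-8")).hexdigest()[:16]
    return f"micromamba-{os_name}-{env_name}-{digest}"


if __name__ == "__main__":
    spec = "name: etl\ndependencies:\n  - python=3.10\n  - pandas\n"
    print(env_cache_key("etl", spec))
```

Including the OS and environment name in the key keeps caches for different runners and execution environments from overwriting each other.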

Brad Clark

11/01/2022, 3:38 PM
@Chris L. Are you building a new container for each flow in this workflow? We are also running on Kubernetes (just getting started) and can run flows on the agent, but I'm interested in not having to install all the Python libraries for every flow in the agent container.