
Peter Roelants

01/11/2021, 7:20 PM
Hi Prefect Community, I'm new to Prefect and am trying to learn to work with the tool. However, I'm struggling to understand how to properly decouple building Storage artifacts from registering/running those artifacts at a later moment in time. It seems that with
flow.register
the creation and registration need to happen in the same call. Is there an example somewhere of how to decouple these steps? For example, how to create and store a Docker build artifact that encapsulates a flow, and then register/run the flow stored in that Docker artifact at a later time without access to the original flow file.

Billy McMonagle

01/11/2021, 7:28 PM
I would love to see this as well! I'm not sure whether my solution will work for you, but at the bottom of each flow file:
if __name__ == "__main__":
    flow.register(project_name=PROJECT_NAME, idempotency_key=flow.serialized_hash())
I've added a
register_flows
script to my build process that does this:
#!/usr/bin/env bash

for flow in $(find flows -name "*.py"); do
  echo "registering $flow"
  python3 "$flow"
done
This builds the Docker image and pushes it to the container registry. I am not sure how well this will scale, of course. I find myself wanting some kind of central "app" object that could handle all of the registration calls.

Spencer

01/11/2021, 7:53 PM
My team uses an internal library that reads all the files and extracts all
prefect.Flow
module variables using
importlib
(inspired by Airflow's
DagBag
mechanism). It annotates all the flows with the shared environment (configured in CI; decoupled from flow definition), storage and state handlers. Then the storage is built and each flow is registered with
flow.register(..., build=False)
.
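For anyone wanting to try the discovery part, here is a minimal sketch using only the standard library. It is not Spencer's internal library; the function name `collect_instances` and its arguments are made up for illustration. Passing `prefect.Flow` as `cls` would collect the flows.

```python
import importlib.util
from pathlib import Path


def collect_instances(root, cls):
    """Import every *.py file under `root` and return the module-level
    variables that are instances of `cls` (for example prefect.Flow).

    Hypothetical helper, sketched for illustration only.
    """
    found = []
    for path in sorted(Path(root).rglob("*.py")):
        # Build a module object straight from the file path.
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the file's top-level code
        found.extend(v for v in vars(module).values() if isinstance(v, cls))
    return found
```

Because each file is loaded under its own module name rather than `__main__`, a registration block guarded by `if __name__ == "__main__":` at the bottom of a flow file does not run during discovery.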

Billy McMonagle

01/11/2021, 8:11 PM
@Spencer That sounds like a great idea... at what stage does the build actually happen?

Spencer

01/11/2021, 8:48 PM
At a high level it's:
* instantiate storage
* get all flows from all the files
* for flow in flows: storage.add_flow(flow)
* set flows.storage attribute (and any others like environment)
* storage.build()
* for flow in flows: flow.register(..., build=False)      # there are other fields here just omitted
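The steps above could be sketched roughly as follows. This is an assumption-laden outline, not the actual internal code: the function name `build_then_register` is hypothetical, and the objects are duck-typed, relying only on the `add_flow`, `build`, `storage` and `register(..., build=False)` names mentioned in this thread.

```python
def build_then_register(storage, flows, project_name, **register_kwargs):
    """Build one shared storage for all flows, then register each flow
    without triggering another build. Hypothetical helper."""
    for flow in flows:
        storage.add_flow(flow)   # record the flow inside the storage
        flow.storage = storage   # point the flow at the shared storage
    storage.build()              # single build covering every flow
    for flow in flows:
        # build=False: registration should reuse what was built above
        flow.register(project_name=project_name, build=False, **register_kwargs)
```

With Prefect itself, `storage` would be something like a Docker storage instance and `flows` the result of the discovery step; other attributes such as the environment would be set in the first loop as well.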

Peter Roelants

01/12/2021, 7:30 AM
@Billy McMonagle If I understand correctly, you are using the
idempotency_key
to prevent registering the flow when building the artefact? @Spencer You are using the
Storage.build()
combined with
add_flow()
to build the flow without registering? I'll look into that. Can you then register the flow by only having a reference to the build artefact (and no reference to the original flow defined in Python)? In general it sounds like Prefect is currently not designed to cleanly decouple storage and registration.

Chris Ottinger

01/12/2021, 12:19 PM
We also separate out the packaging of flows into images (build) from registration (deploy). That way we can use a single immutable flow image and deploy to different Prefect projects for different environments. A skeleton example of the pattern we use can be found here: https://github.com/datwiz/prefect-patterns/tree/main/cicd-deployment
build_flow.sh
builds image with the flow.
deploy_flow.sh
registers the flow with Prefect Cloud/Server. In the CI/CD pipeline, we have a step in between the build and deploy steps to push to our image repos. The approach is slightly different from the one that @Spencer has described in that we package a single flow (or small number of flows) in a single repo that maps to a single flow image. Each repo has a unique set of build and deploy scripts with flow names hard-coded into the build and deploy scripts.

Billy McMonagle

01/12/2021, 2:31 PM
@Peter Roelants My intention with the idempotency key is so that the flow version increments only when a change to the flow has been made. (I'd love to see this as a default)
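To make the idea concrete, here is a toy model of that behaviour. It is not Prefect's implementation; `ToyRegistry` and `content_hash` are invented names, and the only assumption carried over is that registering again with the same key as the latest version does not create a new version.

```python
import hashlib
import json


class ToyRegistry:
    """Toy stand-in for a flow registry; not Prefect's implementation."""

    def __init__(self):
        self._latest = {}  # flow name -> (version, idempotency_key)

    def register(self, name, idempotency_key=None):
        version, last_key = self._latest.get(name, (0, None))
        if idempotency_key is not None and idempotency_key == last_key:
            return version  # same key as the latest version: no bump
        self._latest[name] = (version + 1, idempotency_key)
        return version + 1


def content_hash(definition):
    """Stable hash of a JSON-serializable flow definition."""
    payload = json.dumps(definition, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()
```

Registering an unchanged definition twice with `content_hash(definition)` as the key returns the same version both times, while registering without a key bumps the version on every call.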