# prefect-community
Hi All, I am evaluating Prefect Core and I love the look of it at the moment, and I have some questions around deployments, flow registration, and versioning. We currently have a build/deploy process using Azure Pipelines, and I anticipate that, given a repo, it would bundle up the relevant flows and Python files. For deployment I would like to deploy to AWS S3 buckets with a version number. I do see that there is a storage option, but that would require the deploy agent to have Prefect/Python installed in order to call `flow.build` with the S3 storage option set. In terms of hosting, I was hoping to self-host the Prefect server/UI and relevant agents in ECS.

1. Is there an option where I can deploy versioned files myself to S3 using either Terraform or the AWS CLI? If so, how do I register the relevant flows?
2. I don't want my deployment pipeline to have a VPN into the relevant AWS VPC/environment, so can `flow.register()` be completely separated from `flow.build()` etc.?
3. For flows deployed to S3, how does the server ensure the latest version is being used/registered?
4. My AWS environments are in completely separate accounts/VPCs, and I noticed Prefect environments/agents/flows have labels. Is there a particular pattern that ensures flows are reusable between environments?
Hi @Karen and welcome! I’ll try my best to address your questions:

1. & 2. You can technically separate build from registration, although it might require a few extra steps on your end. In particular, you can call the storage build step (`flow.storage.build()`) to produce the flow artifact in the S3 bucket, and then save the output of `flow.serialize()` to store the JSON metadata representation of the Flow that is necessary to make the registration call with the API later.
3. This is handled by using unique filenames within your S3 bucket; if you use the default Prefect settings for S3 storage, we typically append a timestamp so that different versions can be distinguished. This S3 file location is stored in the backend to distinguish between versions. Consequently, if you want to update one of your tasks without re-registering, you will need to ensure you store the updated Flow in the exact same location that the API is aware of.
4. I’m actually not 100% sure what you mean here; labels will ensure that only certain agents run your Flow, but whether an agent is truly compatible with your Flow is ultimately the user’s responsibility.

There are two additional things worth mentioning based on your questions:
- You will have to do some heavy lifting to deploy the open source API in the way you describe; specifically, you’ll need to make sure all agents / CI jobs can communicate with your self-hosted API. One of the major benefits of the Cloud API is that it is accessible from anywhere (with an auth token).
- The Flow artifact that is ultimately stored in the S3 bucket does not contain all possible dependencies of your Flow. You’ll need to make sure that any imports the Flow relies on are available at runtime; the most common way to achieve this is to run your flow within a Docker image that has all files / packages on the import path.
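The build/register split above could be sketched roughly as follows. This is an illustration only, assuming Prefect 1.x with S3 storage; the bucket name, key, and project name are placeholders, and it needs AWS credentials (for the build step) and a reachable Prefect API (for the register step) to actually run:

```python
# Sketch only: assumes Prefect 1.x ("prefect>=1.0,<2") plus boto3/AWS
# credentials; bucket, key, and project names below are placeholders.
import json

from prefect import Flow, task
from prefect.storage import S3


@task
def say_hello():
    print("hello")


with Flow("example-flow") as flow:
    say_hello()

# --- build step (e.g. in CI: needs S3 access, but NOT the Prefect API) ---
flow.storage = S3(
    bucket="my-flow-bucket",          # placeholder bucket
    key="flows/example-flow/v1.2.3",  # pin the key yourself to control versioning
)
flow.storage.add_flow(flow)
flow.storage.build()  # uploads the flow artifact to the S3 bucket

# Serialize WITHOUT rebuilding, and save the JSON metadata for later registration.
with open("example-flow.json", "w") as f:
    json.dump(flow.serialize(build=False), f, default=str)

# --- register step (runs separately, wherever the Prefect API is reachable) ---
# flow.register(project_name="my-project", build=False)
```

Pinning a fixed `key` (instead of letting Prefect default to a timestamped filename) is what lets you overwrite the artifact in place without re-registering, per point 3 above.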
ok that makes sense thanks @Chris White
👍 1