Heya, community 🖖. I've been poking at Prefect for a couple of weeks now, trying to understand how it can be used in a production environment. I like the almost cloud-native support via Docker, but it has its quirks. The most difficult part of setting up a production CI process with Prefect is flow registration.
I just don't get it. It works nicely in a local environment, where you run Prefect Server and agents from the same Python venv and your code is all in one place. But:
1. To register a flow, you have to actually execute the Python file. This means your flow registration environment must match your flow's production execution environment, which leaves you no choice but to use Docker for production. With CI systems that don't support docker-in-docker, this makes everything harder.
2. If you have many flows, you have to register them one by one: either write a script that registers every flow in a folder, or maintain a single script where all flows are registered. Either way, I need to write and maintain a considerable amount of code as soon as there is more than one flow.
3. The local agent is just not enough for production. If you use LocalAgent, it must run inside the flow's production environment. If you update that environment (say, add a new dependency), you need to restart the local agent, but you can't, because it may be executing tasks.
4. The Docker agent, my favourite, has its own quirks. For example, I was extremely surprised to find that it overrides some settings in the task execution container with its own settings (like the logging level). The other issue is, again, multi-flow registration: either every flow object gets its own Docker storage, which means 100 flows and 100 Docker images built, or you use one Docker storage for all flows, which again means a central flow registration script that creates the storage, assigns the flows to it, builds it, assigns the storage back to the flows, and then registers them. And you need to write this script yourself.
5. Every time you register, you bump a new flow version in the UI. If you don't want that, you need to come up with some checks or hash comparisons to understand whether the flow has changed and needs re-registering (see the sketch right after this list). Again, you have to do it yourself.
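On point 5: if I read the source correctly, `flow.register()` accepts an `idempotency_key`, and `flow.serialized_hash()` can supply one, so the version should only bump when the serialized flow actually changes. A minimal sketch (Prefect 1.x, placeholder project name), please correct me if I'm misreading it:

```python
from prefect import Flow, task

@task
def say_hi():
    print("hi")

with Flow("my-flow") as flow:
    say_hi()

# serialized_hash() only changes when the flow's serialized form changes,
# so re-registering an unchanged flow should not create a new version.
flow.register(
    project_name="etl",                      # placeholder project
    idempotency_key=flow.serialized_hash(),
)
```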
I was able to solve all of these problems with the following workflow:
1. Build the flow production environment Docker image
2. `docker run` this image and call the flow registration script (written by myself)
3. In this script, iterate over all Python scripts in the flows folder; import them (a plain import rather than the exec approach used in extract_flow_from_file); and collect each module's flow object into a list (see the sketch after this list)
4. Create a Docker storage with the desired settings (it uses the same production environment Dockerfile as step #1) and add all flows to this storage
5. Build this storage
6. Assign the built storage object to all flows
7. Register all flows. I am lucky that all flows live in the same project and share the same registration settings (for now); it would be painful to come up with per-flow registration customization in such a generic script
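For reference, the registration script (steps 3-7) boils down to something like this. It's a simplified sketch: the paths, registry URL and project name are placeholders, and it assumes the Prefect 1.x API (`prefect.storage.Docker`):

```python
import importlib.util
from pathlib import Path

from prefect import Flow
from prefect.storage import Docker

FLOWS_DIR = Path("/app/flows")   # placeholder: folder with one flow per .py file
PROJECT_NAME = "my-project"      # placeholder: all flows live in one project


def load_flows(flows_dir: Path):
    """Import every .py file in the folder and collect its Flow objects."""
    flows = []
    for path in sorted(flows_dir.glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # a regular import, not exec() of source text
        flows.extend(obj for obj in vars(module).values() if isinstance(obj, Flow))
    return flows


def main():
    flows = load_flows(FLOWS_DIR)

    # One shared Docker storage built from the same production Dockerfile
    # as in step #1, so a single image serves all flows.
    storage = Docker(
        registry_url="registry.example.com/team",  # placeholder
        dockerfile="Dockerfile",
        image_name="flows",
    )
    for flow in flows:
        storage.add_flow(flow)

    storage = storage.build()  # build and push the image once

    for flow in flows:
        flow.storage = storage
        flow.register(
            project_name=PROJECT_NAME,
            build=False,                             # the image is already built above
            idempotency_key=flow.serialized_hash(),  # skip needless version bumps
        )


if __name__ == "__main__":
    main()
```

The main point is that the image is built exactly once and every flow is then registered with `build=False` against it.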
All of this required significant experimentation and reading of the Prefect source code (which is magnificent, no jokes, it was a real pleasure to read). I wish the Prefect docs included best practices for flow registration and production CI setup.
I'm curious: what are the community's best practices for production flow registration? What's your preferred way of running tasks? How do you deliver flow source code to prod?