Alexander

    1 year ago
    Heya, community 🖖. I've been poking at Prefect for a couple of weeks now, trying to understand how it can be used in a production environment. I like the almost cloud-native support via Docker, but it has its quirks. The most difficult part of setting up a production CI process with Prefect is flow registration. I just don't get it. It works nicely in a local environment, when you run Prefect server and the agents locally from the same Python venv and your code is in one place.

    1. To register a flow, you have to actually execute the Python file. This means your flow registration environment must be the same as your flow production execution environment, which leaves you no choice but to use Docker for production. With CI systems that do not support docker-in-docker, this makes everything harder.
    2. If you have many flows, you have to register them one by one. You either write a script that registers all flows in a folder, or maintain a single script where all flows are registered, which is one more thing to maintain. I need to write a considerable amount of code to manage more than one flow.
    3. The local agent is just not enough for production. If you use LocalAgent, it must run in the flow production environment. If you update that environment (say, add a new dependency), you need to restart the local agent. But you can't, because it may be executing tasks.
    4. Docker agent: this is my favourite, but it has its own quirks. For example, I was extremely surprised to find that it overrides some settings in the task execution container with its own (like the logging level). The other issue is, again, multi-flow registration. You either have a distinct Docker storage in every flow object, which means 100 flows and 100 Docker images built, or you have one Docker storage for all flows, which again means a central flow registration script that creates the storage, adds the flows to it, builds it, assigns the built storage to the flows, and then registers them. And you need to write this script yourself.
    5. Every time you register, you bump the flow version in the UI. If you don't want that, you need to come up with some checks or hash comparisons to understand whether the flow has changed and needs registering. Again, you need to do it yourself.

    I was able to solve all of these problems with this workflow:

    1. Build the flow production environment Docker image.
    2. `docker run` this image and call the flow registration script (which I wrote myself).
    3. In this script, I iterate over all Python scripts in the flows folder, import each one (instead of the exec approach used in extract_flow_from_file or whatever), and put its flow object into a list.
    4. Create a Docker storage with the desired settings (it uses the same production environment Dockerfile as step 1) and add all flows to it.
    5. Build this storage.
    6. Assign the built storage object to all flows.
    7. Register all flows.

    I am lucky that all flows are in the same project and have the same registration settings (for now). It will be painful to come up with a way to do per-flow registration customization in such a generic script.

    All this required significant experimentation and reading of the Prefect source code (it is magnificent, no joke; I had a real pleasure reading it). I wish the Prefect docs had best practices for flow registration and production CI setup.

    I'm curious: what are the best practices in the Prefect community for registering production flows? What is your preferred way of running tasks? How do you deliver flow source code to prod?
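Editor's sketch: the discovery part of the registration script described above (step 3) and the change check from point 5 could look roughly like this. It is not Alexander's actual script. The helper names `load_module`, `discover_flows`, and `folder_fingerprint` are hypothetical, and the Prefect calls appear only in comments because they depend on the Prefect version in use.

```python
import hashlib
import importlib.util
from pathlib import Path


def load_module(path):
    """Import a .py file as a regular module (instead of exec-ing its text)."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def discover_flows(folder, flow_type):
    """Collect module-level instances of flow_type from every .py file in folder."""
    flows = []
    for path in sorted(Path(folder).glob("*.py")):
        module = load_module(path)
        for obj in vars(module).values():
            # Skip objects already collected (e.g. a flow bound to two names).
            if isinstance(obj, flow_type) and not any(obj is f for f in flows):
                flows.append(obj)
    return flows


def folder_fingerprint(folder):
    """Hash file names and contents so CI can skip registration if nothing changed."""
    digest = hashlib.sha256()
    for path in sorted(Path(folder).glob("*.py")):
        digest.update(path.name.encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


# Assumed usage with Prefect 0.x (not exercised here; verify the import path
# and arguments against your Prefect version; "flows", "Dockerfile" and
# "my-project" are placeholders):
#
#   from prefect import Flow
#   from prefect.storage import Docker
#
#   flows = discover_flows("flows", Flow)
#   storage = Docker(dockerfile="Dockerfile")
#   for flow in flows:
#       storage.add_flow(flow)
#   storage = storage.build()
#   for flow in flows:
#       flow.storage = storage
#       flow.register(project_name="my-project", build=False)
```

A CI job could store the value of `folder_fingerprint("flows")` from the last successful registration and skip the registration step when the new value matches. Note that this only detects changes in the flow files themselves, not in their dependencies or the Dockerfile.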
    Dylan

    1 year ago
    Hey @Alexander! Thank you for your feedback. We are actually talking internally now about an [idiom](https://docs.prefect.io/core/idioms/idioms.html) for registering flows in CI. If you’d be willing to share your setup, we’d love to take a look and see how you did it!
    That’s also a really interesting suggested feature (auto-registration of flows in a directory)
    Would you mind opening a feature request on the Prefect repo?
    And we would definitely welcome contributions on this front!
    If you’d like to talk more specifically about implementation, head over to #prefect-contributors
    🙌
    Sven Teresniak

    1 year ago
    We use K8S to manage a Prefect cluster (with LocalAgent and a Dask cluster/executor). For PROD we have a persistent volume for Postgres. For stage (CI, testing, …) we don't persist state between cluster runs/instances. For reasons I cannot control, Prefect is not allowed to a) use Docker (e.g. spawn containers) or b) use K8S (e.g. spawn jobs, etc.). For privacy (and other) reasons it's not possible to use Prefect Cloud.

    As described by @Alexander, each deployment increases the flow's version number, because we use a script to register every flow in a certain directory on startup. The increased flow version is not a real problem, because the flows are part of a Docker image built by Jenkins. That is, every release from git leads to an image containing the flows, and the image's tag is the important thing for us. As I see it, a flow's version number in Prefect has no connection to a commit in e.g. git, so the version number is of limited value to us.

    A nice feature for us (nice-to-have, not necessary) would be the ability to CHANGE the version of a flow to a string of our choosing (this would be the Docker image tag set by CI). We could work around this by injecting the image/git-based version information using tags or labels (using envsubst on startup etc.), but this seems ugly.

    For development and testing I connected our Jupyter notebook to another small (Dask-based) Prefect cluster. It's now possible to register flows from Jupyter, which is helpful while writing one-time (fire and forget) ETL jobs (on-demand analysis, data exploration, transformation, migration, etc.). But this is another story.
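Editor's sketch: the tag/label workaround mentioned above could read the image tag from the environment at startup rather than templating files with envsubst. This is only an illustration. The function name `release_label`, the variable name `IMAGE_TAG`, and the `release:` prefix are hypothetical, and how the resulting string gets attached to a flow depends on the Prefect version in use.

```python
import os


def release_label(env=None, var="IMAGE_TAG", prefix="release:"):
    """Build a label such as 'release:1.4.2' from a CI-provided tag.

    Returns None when the variable is unset or blank, so callers can
    decide whether to register without version information.
    """
    env = os.environ if env is None else env
    tag = env.get(var, "").strip()
    return f"{prefix}{tag}" if tag else None
```

One caveat to check before using this: as far as I know, flow labels in Prefect 0.x also take part in matching flows to agents, so a per-release label may require the agent to carry the same label.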