Migrating from the DockerRun in Prefect 1 to DockerContainer in Prefect 2

Thomas Pedersen

10/20/2022, 7:27 AM
I'm trying to migrate to Prefect 2 using flows defined in Docker infrastructure images (https://www.prefect.io/guide/blog/prefect-2-3-0-adds-support-for-flows-defined-in-docker-images-and-github/). In Prefect 1, we could tie the flows in each release to a specific Docker image version by setting the version in DockerRun(image="{image}:{version}"). That way we were certain that the flows were always running in the intended image. In Prefect 2:
1. Not setting the version in the infrastructure block makes each flow use the 'latest' version. As far as I understand, this means that a flow could potentially run on an older version if the images haven't been updated on the agent. In our old setup this would have caused a flow failure, which would be far better.
2. Setting the image version explicitly in the infrastructure block would mean that I have to create a new infrastructure block for every release we do.
Am I missing any smarter ways to control versioning of flows and infrastructure?

Anna Geller

10/20/2022, 12:55 PM
"a flow could potentially run on an older version if the images haven't been updated on the agent"
the agent doesn't need to know about the image. You would typically update the image tag from your CI/CD, so regarding #2, it shouldn't require you to create a new block, only to update the image tag on the existing block. You could also leverage versioning on deployments plus infra overrides (the --override flag) to do that easily:
prefect deployment build flows/healthcheck.py:healthcheck --name xxx -q default -sb s3/default -ib docker-container/default --override image=new_image_tag
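In a CI/CD step, the command above could be assembled from the release version so that every deployment is pinned to one image tag. The following is only a minimal sketch: the helper name `build_deployment_command` and the image name `registry.example.com/flows` are hypothetical, and only the flags already shown in the command above are used.

```python
import shlex


def build_deployment_command(image: str, version: str) -> list[str]:
    """Return the argv for `prefect deployment build` with a pinned image tag.

    Hypothetical helper for illustration; not part of Prefect.
    """
    if not version or version == "latest":
        # Refuse floating tags so a release can never silently run on a stale image.
        raise ValueError("pin an explicit image version, not 'latest'")
    return [
        "prefect", "deployment", "build",
        "flows/healthcheck.py:healthcheck",
        "--name", "xxx",
        "-q", "default",
        "-sb", "s3/default",
        "-ib", "docker-container/default",
        "--override", f"image={image}:{version}",
    ]


if __name__ == "__main__":
    # Image name and version are placeholder values.
    cmd = build_deployment_command("registry.example.com/flows", "1.4.2")
    print(shlex.join(cmd))
```

Returning a list rather than a single string lets the CI job pass it straight to `subprocess.run` without shell quoting issues.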

Thomas Pedersen

10/21/2022, 8:54 AM
Yea, that might be a good solution, trying it ... Is there a way to tell the deployment that the flows have been installed with pip on the docker image, or is it just about setting path to something like /usr/local/lib/python3.9/site-packages/ ?

Anna Geller

10/21/2022, 11:28 AM
It's about the path: Prefect needs to find the entrypoint to the flow script, so if you install the flows as a module and somehow remove the package files, you would need to point to the Python script containing the flow using a path into site-packages.
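To avoid hard-coding a directory such as /usr/local/lib/python3.9/site-packages/, the location of a pip-installed module can be looked up inside the image with the standard library. This is a rough sketch only: the helper name `installed_module_path` is hypothetical, and the standard-library module `json` stands in for the installed flow package.

```python
import importlib.util
from pathlib import Path


def installed_module_path(module_name: str) -> Path:
    """Return the file the interpreter would load for `module_name`."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        # Not installed in this environment, or a namespace package without a file.
        raise ModuleNotFoundError(f"{module_name!r} has no importable file here")
    return Path(spec.origin)


if __name__ == "__main__":
    # `json` is a stand-in for the pip-installed flow package.
    module_file = installed_module_path("json")
    print(module_file)                 # file that defines the package
    print(module_file.parent.parent)   # directory that contains the package
```

Running this inside the image (not on the CI machine) gives the paths as the flow run container will see them.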

Thomas Pedersen

10/24/2022, 2:20 PM
Think I got that part to work. I have a question on the mount settings, though. In Prefect 1, we were able to control the read/write mode of each mount and to set memory/swap limits on the running Docker containers. We had some issues with agents crashing due to out-of-memory issues, and would far prefer that the flow fails rather than the entire server crashing:
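from prefect.run_configs import DockerRun  # Prefect 1 API

# `branch` and `_RAM_LIMIT_DOCKER` are defined elsewhere in the project.
run_config = DockerRun(
    host_config={
        # Host path -> container path, with an explicit access mode per mount.
        "binds": {
            f"/etc/preheat/{branch}": {
                "bind": "/etc/preheat",
                "mode": "ro",
            },
            f"/var/cache/preheat/{branch}": {
                "bind": "/var/cache/preheat",
                "mode": "rw",
            },
        },
        # Equal memory and memory+swap limits, so the container gets no swap.
        "mem_limit": _RAM_LIMIT_DOCKER,
        "memswap_limit": _RAM_LIMIT_DOCKER,
    },
)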
In Prefect 2, this no longer seems possible?
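For the mount modes, Docker's own volume string syntax (`host_path:container_path:mode`) carries the read/write flag, so the Prefect 1 `binds` mapping above can be translated mechanically. The sketch below assumes that the Prefect 2 infrastructure block passes such Docker-style strings through unchanged, which is not confirmed in this thread; the helper name `binds_to_volumes` and the branch value `main` are hypothetical, and memory limits are not covered.

```python
def binds_to_volumes(binds: dict[str, dict[str, str]]) -> list[str]:
    """Convert a docker-py style `binds` mapping to `host:container:mode` strings."""
    volumes = []
    for host_path, spec in binds.items():
        # Docker mounts read-write when no mode is given, so default to "rw".
        mode = spec.get("mode", "rw")
        volumes.append(f"{host_path}:{spec['bind']}:{mode}")
    return volumes


if __name__ == "__main__":
    branch = "main"  # placeholder value for illustration
    binds = {
        f"/etc/preheat/{branch}": {"bind": "/etc/preheat", "mode": "ro"},
        f"/var/cache/preheat/{branch}": {"bind": "/var/cache/preheat", "mode": "rw"},
    }
    for volume in binds_to_volumes(binds):
        print(volume)
```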

Anna Geller

10/24/2022, 2:41 PM
Could you explain why your Orion server crashes when your flow run runs out of memory? This should never happen: flow runs using the DockerContainer infra block are fully isolated from the Orion API server. If API server maintenance is too difficult, you can switch to Prefect Cloud, which has an always-free tier: https://app.prefect.cloud/

Thomas Pedersen

10/25/2022, 5:08 AM
Not the Orion server, but the (virtual) server running the Prefect 1 Agent. Sorry if that wasn't clear. We had the issue a long time ago, so I can't remember exactly what happened, but it was probably the OS running out of memory and killing random processes. It led us to set memory limits on the Docker containers, so that Docker would kill the container before the OS started killing processes. The result was that only the flow run was killed, leaving the Agent up and running. We constantly develop new flows and improve old ones, so if bugs should occur, it's nice to know that they don't prevent critical control systems from running. Yea, we have multiple agents, but only a few, so taking them out one by one isn't really that convenient :)