j
Hello Prefect team, hello community! I was wondering if there are any established best practices for keeping Prefect agents/workers and existing deployments compatible.
We have quite a number of existing deployments that work fine, do not require any active development, and therefore haven't received any dependency updates in a while. At the same time, we want to use the latest features of Prefect workers, so we update our workers to the latest Prefect version. This causes issues for old deployments that have an older Prefect version installed and that are not executed with the local task runner but with task runners that run flows in a different environment than the agent (e.g. the Ray task runner). For example, Prefect 2.14.5 changed the default flow-run command from `python -m prefect.engine` to `prefect flow-run execute`. As a consequence, an agent on the newest Prefect version cannot start runs in any infrastructure that has prefect<2.12.0 installed (2.12.0 being the version in which `flow-run execute` was introduced).
While this specific problem can be solved by overriding the command, the underlying incompatibility can only be solved in general by either redeploying the old deployments with the latest Prefect version or setting up a new work pool with a new worker. Updating all older flows is quite cumbersome in our organization, because responsibility for individual flows lies with different teams. Running multiple agents for different Prefect versions adds cost and makes creating deployments more complex.
Is there another solution that I am not seeing? What is Prefect's general strategy for keeping agents or workers compatible with the rest of the code base across minor versions? Can we be confident that future deployments built with future Prefect versions will still run on today's workers of the same major version? Happy to receive any feedback.
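The command override mentioned above can be decided per deployment from the Prefect version installed in the flow-run environment (not the worker's version). A minimal sketch, assuming the work pool's base job template exposes a `command` job variable and that version strings are plain releases such as `2.11.5`; `job_variables_for` is a hypothetical helper, not a Prefect API.

```python
# Sketch: pick a flow-run start command the *flow-run environment* can execute.
# Assumptions: the work pool's base job template exposes a "command" job
# variable, and version strings are plain releases such as "2.11.5".

LEGACY_COMMAND = "python -m prefect.engine"
# Per this thread, `prefect flow-run execute` first shipped in Prefect 2.12.0.
FLOW_RUN_EXECUTE_MIN = (2, 12, 0)


def parse_version(version: str) -> tuple[int, int, int]:
    """Turn '2.11.5' into (2, 11, 5); missing parts are padded with zeros."""
    parts = [int(part) for part in version.split(".")[:3]]
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def job_variables_for(env_prefect_version: str) -> dict[str, str]:
    """Return a per-deployment job-variable override, or {} if none is needed."""
    if parse_version(env_prefect_version) >= FLOW_RUN_EXECUTE_MIN:
        # The environment understands `prefect flow-run execute`,
        # so the worker's default command can be used as-is.
        return {}
    # Older environments only understand the legacy module entrypoint.
    return {"command": LEGACY_COMMAND}


if __name__ == "__main__":
    print(job_variables_for("2.10.21"))  # {'command': 'python -m prefect.engine'}
    print(job_variables_for("2.14.5"))   # {}
```

In a Prefect 2.x `prefect.yaml`, a non-empty result would go under the deployment's `work_pool.job_variables`. Overriding only the legacy deployments leaves every other deployment on the worker's default command.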
k
Hi Justin, I think broadly it's good practice to keep the Prefect versions of workers and flow-run environments in sync, though as you mentioned it isn't strictly a requirement. As versions drift further apart, the likelihood of issues like the one you're experiencing increases. If the teams that manage flow development use Docker images built on top of the Prefect base image to manage dependencies, I could see something like this working:
• The team that manages worker upgrades identifies a new Prefect release that includes a fix or improvement to worker behavior meriting an upgrade, and gives advance notice within the organization that an upgrade to that version will occur.
• Teams that manage flows and deployments have a window of time to swap out the Prefect version number in the `FROM` line of their Dockerfile, then build and push their images.
• When the worker upgrade occurs, work pools and/or deployments are also updated to use the newly built images, perhaps during a window when fewer flows are running.
• Optionally, if your organization has multiple environments such as dev and staging, roll out these changes one environment at a time to ensure compatibility.
It's true that new Prefect releases are frequent, and version and compatibility management is one of the less fun things to worry about, so making sure upgrades happen only when they are really needed is the best way to reduce overhead.
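The version swap in the second step is mechanical enough that each flow team could script it. A minimal sketch, assuming every flow image is built from a tag shaped like `prefecthq/prefect:<version>-python<X.Y>` (for example `prefecthq/prefect:2.13.0-python3.11`); `bump_prefect_base_image` is a hypothetical helper, not part of Prefect.

```python
import re

# Assumption: base images use tags like "prefecthq/prefect:2.13.0-python3.11".
# Lines with other forms (e.g. "FROM --platform=..." or lowercase "from") are not matched.
FROM_PATTERN = re.compile(
    r"^(FROM\s+prefecthq/prefect:)(\d+\.\d+\.\d+)(-python\d+\.\d+)",
    re.MULTILINE,
)


def bump_prefect_base_image(dockerfile_text: str, new_version: str) -> str:
    """Return the Dockerfile text with the Prefect base-image version replaced."""
    new_text, count = FROM_PATTERN.subn(
        # A function replacement avoids "\1" + "2.14.5" being read as group 12.
        lambda m: f"{m.group(1)}{new_version}{m.group(3)}",
        dockerfile_text,
    )
    if count == 0:
        raise ValueError("no prefecthq/prefect FROM line found")
    return new_text


if __name__ == "__main__":
    dockerfile = "FROM prefecthq/prefect:2.13.0-python3.11\nRUN pip install ray\n"
    print(bump_prefect_base_image(dockerfile, "2.14.5"), end="")
```

Raising when nothing matches makes a CI job fail loudly for images that are not built on the expected base tag, instead of silently leaving them on the old version.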
j
Hello Kevin, thank you for your detailed response. I think the hardest part of your proposed solution is deciding when a new feature is worth an upgrade. As you said, Prefect releases frequently, and often a release contains a feature that does not affect agents or workers but would be very useful for one particular flow. This means that if developers want to use a new Prefect feature, they need to ask the platform team for a worker update, which slows the process down quite a bit. Currently we allow developers to use whatever Prefect version they want and update the agents/workers from time to time. This works most of the time but is somewhat risky, because it is hard to tell which changes could affect deployment/worker compatibility. Do you think it is realistic to get some sort of compatibility indication from Prefect about which worker and deployment versions can be combined? This would make the decision about when to update easier, keep development speed up, and minimize interruptions due to incompatible versions.
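Until such an indication exists, a platform team could approximate one with a hand-maintained rules table that is checked against a deployment inventory before each worker upgrade. A minimal sketch: `CompatRule`, `incompatible_deployments`, and the inventory are hypothetical, and the single rule encodes only the 2.12.0/2.14.5 break described earlier in this thread, assuming the default command is not overridden.

```python
from dataclasses import dataclass


def parse_version(version: str) -> tuple[int, int, int]:
    """Turn '2.11.5' into (2, 11, 5); missing parts are padded with zeros."""
    parts = [int(part) for part in version.split(".")[:3]]
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


@dataclass(frozen=True)
class CompatRule:
    """Workers at or above worker_min need environments at or above env_min."""
    worker_min: str
    env_min: str
    reason: str


# Hypothetical, hand-maintained table. The single entry encodes the break
# described in this thread and assumes the default command is not overridden.
RULES: tuple[CompatRule, ...] = (
    CompatRule("2.14.5", "2.12.0", "default command is `prefect flow-run execute`"),
)


def incompatible_deployments(
    worker_version: str,
    deployments: dict[str, str],
    rules: tuple[CompatRule, ...] = RULES,
) -> list[tuple[str, str]]:
    """List (deployment, reason) pairs that would break on the given worker version."""
    worker = parse_version(worker_version)
    problems = []
    for name, env_version in sorted(deployments.items()):
        env = parse_version(env_version)
        for rule in rules:
            if worker >= parse_version(rule.worker_min) and env < parse_version(rule.env_min):
                problems.append((name, rule.reason))
    return problems


if __name__ == "__main__":
    # Hypothetical inventory: deployment name -> Prefect version in its image.
    inventory = {"etl-sales": "2.10.21", "ml-scoring": "2.13.4", "reporting": "2.14.5"}
    print(incompatible_deployments("2.14.5", inventory))  # only "etl-sales" is flagged
    print(incompatible_deployments("2.11.0", inventory))  # []
```

Keeping the rules as data means each newly discovered break becomes one more table entry, and the check can run in CI whenever a team changes its pinned Prefect version.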