
David Evans

over 3 years ago
Hi, I'm looking to set up some Prefect-agents-as-a-service within my organisation. We're planning to use the hosted Prefect Cloud with some AWS ECS agents, and a GitHub Actions pipeline to publish new flows (probably using S3 as storage). This is all greenfield, so we're pretty flexible on the details. Right now we're sticking to Prefect 1.x, but if there's a good reason to jump to 2.x I don't think that would be a problem (as long as it's safe/secure, we don't mind the occasional hiccough while it's still in beta).

Most of that is fine: so far we're just using a local agent deployed on EC2, but I can see there's an ECS agent which we can presumably use easily enough to get any scalability we'll need, and we can use the `prefect` CLI to push flows from GitHub Actions.
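For context, here's roughly the shape of the 1.x setup we have in mind; the project, bucket and image names are placeholders:

```python
from prefect import Flow, task
from prefect.storage import S3
from prefect.run_configs import ECSRun

@task
def say_hello():
    print("hello from ECS")

with Flow("example-flow") as flow:
    say_hello()

# The flow's source is uploaded to S3 and pulled by the agent at run time.
flow.storage = S3(bucket="my-prefect-flows")

# The ECS task runs this image, so any shared or external dependencies
# have to be baked into it ahead of time.
flow.run_config = ECSRun(image="my-registry/prefect-deps:latest")

if __name__ == "__main__":
    flow.register(project_name="data-eng")
```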
But where we're hitting problems is with dependency management (both internal code which is shared between multiple tasks/flows, and external dependencies). From what I've seen, Prefect doesn't really support this at all: flows are expected to be self-contained single files, with the implication being that the agent itself has to have any shared dependencies pre-installed. In our case that would mean any significant change requires re-building and re-deploying the agent image, a slow process and not very practical if we have long-lived tasks or multiple people testing different flows at the same time. I tried looking around for Python bundlers and found stickytape, but that seems a bit too rough-and-ready for any real use.

This seems to be a bit of a known problem: 1, 2, and specifically I see:
> V2 supports virtual and conda environment specification per flow run which should help some
And I found some documentation for this (which seems to tie it to the new concept of deployments), but I'm still a bit confused on the details:

• Would the idea be to create a deployment for every version of every flow we push? Will we need to somehow tidy up the old deployments ourselves?
• Can deployments be given other internal files (i.e. common internal code), or are they limited to just external dependencies? Relatedly, do deployments live on the server or in the configured Storage?
• Is there any way to use zipapp bundles?
• Ideally we want engineers to be able to run flows in 3 ways: entirely locally; on a remote runner triggered from their local machine (with local code, including their latest local dependencies); and entirely remotely (pushed to the cloud server via an automated pipeline and triggered or scheduled, basically "push to production"). I'm not clear on how I should be thinking about deployments vs flows to make these 3 options a reality; there's a sketch of what I mean below.

I also wonder if I'm going down a complete rabbit hole and there is an easier way to do all of this?
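To make that last point concrete, here's a minimal sketch of what I imagine the 2.x version looking like. I'm assuming the `prefect.deployments.Deployment.build_from_flow` API from the 2.x docs, and the flow, deployment and work queue names are all made up:

```python
from prefect import flow, task
from prefect.deployments import Deployment

@task
def extract():
    return [1, 2, 3]

@flow
def etl():
    rows = extract()
    print(f"processed {len(rows)} rows")

if __name__ == "__main__":
    # Option 1: entirely locally. A Prefect 2 flow is just a Python
    # function, so calling it runs everything in-process with no
    # agent, server or deployment involved.
    etl()

    # Options 2 and 3: as far as I can tell, we'd register a deployment
    # with the server and let an agent pick up the runs (names here are
    # placeholders).
    deployment = Deployment.build_from_flow(
        flow=etl,
        name="etl-from-ci",
        work_queue_name="ecs",
    )
    deployment.apply()
```

What I can't tell is whether option 2 (running local code on a remote runner) would just be another deployment built from the developer's machine, or whether there's a lighter-weight mechanism for it.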