David Evans
04/28/2022, 10:50 AM
prefect CLI to push flows from GitHub Actions.
But where we're hitting problems is with dependency management (both internal code which is shared between multiple tasks/flows, and external dependencies). From what I've seen, Prefect doesn't really support this at all (flows are expected to be self-contained single files), with the implication being that the agent itself has to have any shared dependencies pre-installed (which in our case would mean that any significant changes require re-building and re-deploying the agent image - a slow process and not very practical if we have long-lived tasks or multiple people testing different flows at the same time). I tried looking around for Python bundlers and found stickytape, but that seems a bit too rough-and-ready for any real use.
This seems to be a bit of a known problem (1, 2), and specifically I see:
> V2 supports virtual and conda environment specification per flow run which should help some

And I found some documentation for this (which seems to tie it to the new concept of deployments), but I'm still a bit confused on the details:
• would the idea be to create a deployment for every version of every flow we push? Will we need to somehow tidy up the old deployments ourselves?
• can deployments be given other internal files (i.e. common internal code), or is it limited to just external dependencies? Relatedly, do deployments live on the server or in the configured Storage?
• is there any way to use zipapp bundles?
• ideally we want engineers to be able to run flows in 3 ways: entirely locally; on a remote runner triggered from their local machine (with local code, including their latest local dependencies); and entirely remotely (pushed to the cloud server via an automated pipeline and triggered or scheduled - basically "push to production"). I'm not clear on how I should be thinking about deployments vs flows to make these 3 options a reality.
I also wonder if I'm going down a complete rabbit hole and there is an easier way to do all of this?
Anna Geller
> if there's a good reason to jump to 2.x I don't think that would be a problem (as long as it's safe/secure we don't mind the occasional hiccough while it's still in beta)

In that case, I would encourage you to start directly with Prefect 2.0.

Re infrastructure: #1 What are your latency requirements? Are you OK with the extra latency introduced by serverless? If so, then EKS on Fargate may be a good option. This tutorial shows how to set this up using EKS and Cloud 2.0; the only difference is that you would use EKS on Fargate rather than EKS. To set up a cluster, you could use eksctl. If you don't like the latency of Fargate, you could use just EKS. This post discusses why we don't support ECS directly yet.

Re code dependencies and agents: #2 Do you want to package your dependencies as docker containers or virtual environments? Prefect 2.0 agents are no longer tied to a specific infrastructure - there are no Kubernetes agents or docker agents. Instead, the same agent can deploy to any infrastructure depending on the flow runner attached to your DeploymentSpec. This post discusses it more. This means you could have a single agent deployed e.g. on EC2, and it could simultaneously deploy some flows to your EKS on Fargate cluster via KubernetesFlowRunner, run others directly on the same instance as a subprocess in a conda env using SubprocessFlowRunner, or run them in a container using DockerFlowRunner.
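A rough sketch of what that could look like, with one DeploymentSpec per flow runner - this is based on the beta-era API, and the import paths, file locations, conda env name, and image tag are placeholders, so double-check them against the current docs:

# deployments.py - illustrative sketch only (Prefect 2.0 beta-era API)
from prefect.deployments import DeploymentSpec
from prefect.flow_runners import (
    DockerFlowRunner,
    KubernetesFlowRunner,
    SubprocessFlowRunner,
)

# Run the flow as a subprocess on the agent's machine, inside a named conda env
DeploymentSpec(
    flow_location="./flows/my_flow.py",
    name="my-flow-subprocess",
    flow_runner=SubprocessFlowRunner(condaenv="my-conda-env"),
)

# Run the same flow in a Docker container that has shared code and deps baked in
DeploymentSpec(
    flow_location="./flows/my_flow.py",
    name="my-flow-docker",
    flow_runner=DockerFlowRunner(image="my-registry/my-flows:latest"),
)

# Or hand it off to the Kubernetes cluster (e.g. EKS on Fargate) the agent can reach
DeploymentSpec(
    flow_location="./flows/my_flow.py",
    name="my-flow-k8s",
    flow_runner=KubernetesFlowRunner(),
)

The point is that the runner choice lives on the deployment, not on the agent, so one agent process can serve all three.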
David Evans
04/28/2022, 11:51 AM
Anna Geller
> would the idea be to create a deployment for every version of every flow we push?

Re versioning: the flow version is dynamic and you don't even have to recreate a deployment. You would only need to commit your code with the updated flow version, and as long as the code dependencies didn't change (e.g. you added some tasks, but you still need the same pandas version), there is no need to redeploy anything - just push the updated flow code to your storage. Example flow with a version:
@flow(name="My Example Flow",
description="An example flow for a tutorial.",
version="tutorial_02")
def my_flow():
# run tasks and subflows
To be transparent, we are actively working on Storage, and you will be able to set it on your DeploymentSpec rather than globally.

> can deployments be given other internal files (i.e. common internal code), or is it limited to just external dependencies?

You could copy those files into your Docker image, e.g. when building the image in your CI/CD, as in the sketch below.
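A minimal Dockerfile sketch for that approach - the base image tag, package name, and paths are placeholders rather than anything official, and your DockerFlowRunner (or Kubernetes job spec) would then point at the resulting image:

# Dockerfile - illustrative only; adjust the base image and paths to your repo layout
FROM prefecthq/prefect:2.0b4-python3.9

# External dependencies shared by this family of flows
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Internal shared code, installed as a normal package so every flow can import it
COPY ./my_company_lib ./my_company_lib
RUN pip install --no-cache-dir ./my_company_lib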
> run flows in 3 ways: entirely locally; on a remote runner triggered from their local machine (with local code, including their latest local dependencies); and entirely remotely

You could manage that similarly to how Laura Lorenz showed it in the YouTube walkthrough I shared before - via tags and work queues.
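Here is a hedged sketch of how the three modes could map onto tags and work queues - the tag names, queue names, flow import, and the CLI commands in the trailing comments are assumptions from the beta era, so verify them against the docs:

# run_modes.py - illustrative sketch only (Prefect 2.0 beta-era API)
from prefect.deployments import DeploymentSpec
from prefect.flow_runners import DockerFlowRunner, SubprocessFlowRunner

from flows.my_flow import my_flow  # hypothetical import of the flow shown above

# 1) Entirely locally: no deployment needed - just call the flow like a function.
# my_flow()

# 2) Remote runner triggered from a laptop with the engineer's latest local code:
#    a "dev"-tagged deployment, picked up by an agent serving a "dev" work queue.
DeploymentSpec(
    flow=my_flow,
    name="my-flow-dev",
    tags=["dev"],
    flow_runner=SubprocessFlowRunner(),
)

# 3) Entirely remotely ("push to production"): created by CI/CD, tagged "prod",
#    served by a separate work queue/agent, usually with an image built in CI.
DeploymentSpec(
    flow=my_flow,
    name="my-flow-prod",
    tags=["prod"],
    flow_runner=DockerFlowRunner(image="my-registry/my-flows:latest"),
)

# Work queues filter on those tags (CLI from memory - verify the exact flags):
#   prefect work-queue create -t dev dev-queue
#   prefect work-queue create -t prod prod-queue
#   prefect agent start <work-queue-id>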