# ask-community
b
Hi everyone, I'm struggling to understand how Prefect behaves when I redeploy. The way it works now is that 1 git repo = 1 Prefect project containing multiple flows. We have a Dockerfile whose entrypoint simply registers all the flows and then starts a LocalAgent. This is deployed as an ECS Service. When we deploy, we simply kill the existing service and start a new one with a new Docker image, which means there is a little "downtime" in terms of flow schedules, e.g.:
• Time is 4.59.
• We start a new deployment, meaning that at 4.59 the service is killed.
• There is a job scheduled to run at 5.00.
• The service is only back up at 5.01.
• This means the 5.00 run will be fully missed. (Please correct me if I'm wrong.)
My question, however, is about a second scenario, where:
• We start a run at 4.58, and the run takes about 5 minutes to finish.
• We start a new deployment at 4.59, meaning there is a run in progress.
• What happens to the flow run if the entire service (local agent + code execution environment) is killed?
• At 5.01, when the new deployment finishes, will Prefect know to resume that flow run? How would that work?
The reason I'm asking is that I plan on changing our deployment strategy to blue/green, but I'm not sure how Prefect will cope with a flow run being killed midway. Sorry if this is confusing! I appreciate any help.
a
@Bruno Murino you don't have to kill existing flow runs or agents when you deploy a new version of a flow. When your flow changes, you register a new version of it so that the next time you run it, you trigger a new version of the flow. This way you can entirely avoid any downtime related to new flow version deployment.
b
I’m struggling with understanding this because I’m using a LocalRun with a LocalAgent and LocalStorage — so the only way to change the flow code is by deploying a new docker image, and because I’m using a LocalAgent, this means I need the agent to be in that docker image as well
a
@Bruno Murino actually, with Local agent and storage, you don’t need to use Docker at all. You could deploy your agent e.g. in a virtual environment for isolation, but normally local agent is deployed as a local process without Docker. But there is also a Docker agent - perhaps this is what you’re looking for?
when it comes to flow deployment patterns, some users shared how they do it in this Github discussion - sharing in case this might be interesting for you https://github.com/PrefectHQ/prefect/discussions/4042
👀 1
b
we need docker because we deploy as an ECS Service
a
Do you want your flows to run on ECS too? We have an ECS agent: https://docs.prefect.io/orchestration/agents/ecs.html I could share a tutorial on how to set it up if you’re interested
b
I have tried that but it was too slow to start the flow run
maybe I should give it a second try though
a
I see, I can actually totally understand that 🙂 the ECS with Fargate is really not the fastest option to start because the Serverless data plane needs to first provision the compute resources and pull the image before it can run the flow. But you could solve it by using EC2 instead of Fargate capacity provider.
b
we do use EC2 😂
a
Really? 😄 that’s weird. With EC2 data plane it should start up pretty much instantaneously. Could it be that the capacity provider was still set to Fargate?
b
don’t think so — we don’t have anything on fargate and I had to deal with our VPC and stuff, so it was definitely EC2
👍 1
a
Anyway, in case you would want to go that path, you could use ECS CLI to spin up the cluster and then follow this or this to set up an agent as ECS service with the exception to change the capacity provider from FARGATE to EC2.
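A rough sketch of what that capacity-provider change could look like, written as boto3-style `ecs.create_service` kwargs (all cluster, service, and task-definition names here are made up for illustration):

```python
# Hypothetical sketch: the agent deployed as an ECS service on the
# EC2 data plane instead of Fargate, so flow containers start on
# already-provisioned instances. Names/ARNs are placeholders.
agent_service = {
    "cluster": "prefect-cluster",
    "serviceName": "prefect-ecs-agent",
    "taskDefinition": "prefect-ecs-agent:1",
    "desiredCount": 1,
    # The key change vs. the linked tutorials: "EC2" instead of "FARGATE"
    "launchType": "EC2",
}
# boto3.client("ecs").create_service(**agent_service)
```

The actual task definition and networking details would follow the linked tutorials; only the launch type differs.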
b
there might have been a problem with the ECS agent we had: it was deployed as an ECS Service and it would get an OOM error every 3 days or so, with no apparent reason
a
hmm it never happened to me. Perhaps you could allocate more memory to the agent then?
b
yea I think I’ll give that a try
though memory allocation didn't seem to be the issue: the memory profile was a steady increase at all times, not related to any particular process
a
usually the agent doesn’t need a lot of memory because it doesn’t do any work by itself, it only spins up flows as ECS tasks
b
exactly, so I thought it was a bug at the time or something
I think it’s worth trying that approach again, with newer versions and etc
as it does look like deploying flows as ECS tasks might solve the downtime issue
🙌 1
@Anna Geller I’m setting up some flows with ECS Run and something seems a bit odd — the task definition (visible in AWS console) contains the api key to prefect cloud — is there any way to avoid that?
a
yes, there is! There are two ways to store credentials that are retrieved by ECS tasks:
1. AWS Parameter Store
2. AWS Secrets Manager
The blog post I linked handles that and includes a section on how to set it up using #1. You can also check out #3 in this blog https://aws.plainenglish.io/8-common-mistakes-when-using-aws-ecs-to-manage-containers-3943402e8e59?sk=334f367ff27d3fe9b56ff31f8b9ba447
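For reference, a hedged sketch of option #2: a container definition fragment that pulls the API key from Secrets Manager via the `secrets` block (the image name and secret ARN are made up):

```python
# Hypothetical container definition fragment for an ECS task
# definition. With "secrets", only the ARN is visible in the AWS
# console; ECS injects the actual value as an env var at container
# start. PREFECT__CLOUD__API_KEY is the env var Prefect 0.15.x reads.
container_definition = {
    "name": "flow",
    "image": "my-registry/my-flows:latest",  # placeholder image
    "secrets": [
        {
            "name": "PREFECT__CLOUD__API_KEY",
            # Placeholder ARN of a secret holding the Prefect API key
            "valueFrom": "arn:aws:secretsmanager:eu-west-1:123456789012:secret:prefect-api-key",
        }
    ],
}
```

The same shape works with a Parameter Store ARN in `valueFrom` instead of a Secrets Manager one.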
b
I do use AWS Secrets Manager on the ECS task definition that I create, but now Prefect is creating the task definitions and adding a bunch of environment variables
like this
a
no no, Prefect will use the value from parameter or Secret - have a look at this section:
b
this is what I have setup:
a
but then you didn't follow this tutorial, right? 😄 if you had, it would look like this: it must be an ARN to the Secret or Parameter
b
apologies — what tutorial are you referring to? the link you sent doesn’t mention Prefect at any point
b
in the link you sent, the task definition is created for the ECS Agent, which is fine, but my issue is with the task definition for the flows
btw I really do appreciate your help!
🙌 1
also to clarify: the task definition for the flows is also fine. It's when the flow runs, i.e. when the ECS task runs, that you can go to the ECS task run in the AWS console and it lists a bunch of env vars that were not set up by my code or myself, but were set by the Prefect ECS Agent, I believe
this is the task definition created when I submit a run of a flow registered with a run_config of ECSRun
and this is the task details, for a task that used the task-definition above
a
nice! so it looks like you got it working? LMK if you have any open questions.
b
well not really! haha sorry for the confusion
let me rephrase
when the ECS Agent instantiates an ECS task, I can check out the ECS task run in the AWS console, and there are a bunch of env vars showing, which include the env var with the API key
which is demonstrated by the last screenshot I sent
to clarify, the task does run fine — my question is just about security concerns and etc
a
@Bruno Murino I see. The only way this may happen is if your ECSRun is using a different task definition than your agent. If you don’t set it explicitly, the one from the agent will be used. I’ve recently updated the docstring in the ECSRun to include more examples - perhaps you can reference the same task definition ARN as the one used by your agent?
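If I understand the suggestion correctly, it would be something along these lines (kwargs for `prefect.run_configs.ECSRun`; the ARN is a placeholder):

```python
# Hypothetical: point ECSRun at a pre-registered task definition ARN
# so the agent reuses it instead of registering a new definition that
# carries the API key as a plaintext env var. Note that ECSRun does
# not allow combining task_definition_arn with task_definition/image.
ecs_run_kwargs = {
    "task_definition_arn": (
        "arn:aws:ecs:eu-west-1:123456789012:task-definition/prefect-flow:3"
    ),
}
# run_config = prefect.run_configs.ECSRun(**ecs_run_kwargs)
```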
b
I'm afraid the task definition of the flow has to be different from the task definition of the agent, mainly because they run different images
let me try something
no luck. I was hoping I'd be able to avoid passing a full task definition and just pass the details through other arguments, but the lack of a "networkMode" argument means I have to use a custom task definition from scratch
also, I'm not sure that would have solved it. I suspect what Prefect does is a "container overrides" to inject all the env vars it requires, and that bubbles up to the AWS console
because the actual task definition is as I coded when registering the flow
I’m wondering if I can pass some “container overrides” via the
run_task_kwargs
to set the api key variables to be fetched from aws secrets/parameter store
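something like this is what I had in mind (the container name and values are made up). One caveat I'm not 100% sure about: the ECS RunTask API's `containerOverrides` appear to accept `environment` but not `secrets`, so Secrets Manager references might have to live in the registered task definition rather than in overrides:

```python
# Hypothetical run_task_kwargs for ECSRun. containerOverrides can
# replace plain env vars at run time, but (as far as I can tell)
# RunTask offers no "secrets" override, so secret references would
# need to be baked into the task definition itself.
run_task_kwargs = {
    "overrides": {
        "containerOverrides": [
            {
                # Must match the container name in the task definition
                "name": "flow",
                "environment": [
                    {"name": "MY_SETTING", "value": "example"},  # placeholder
                ],
            }
        ]
    }
}
```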
ah ok haha
a
network mode belongs to the task definition, but the exact network details such as VPC id and subnet id belong to run task
b
well I need “networkMode = bridge” anyway haha
a
true, for those you don’t need VPC details, correct
b
do you mind commenting on my assumption “I suspect what Prefect does is a “container overrides” to inject all envs vars it requires — and that bubbles up to the AWS console”
a
correct, if you set something explicitly on ECSRun, it will serve as overrides to what was configured on the agent. You can see more here: https://github.com/PrefectHQ/prefect/blob/d44b72a950ebda9f7bc6a9712fc71e2e9c680d25/src/prefect/agent/ecs/agent.py#L444-L475
b
Nice! Thanks for showing me! From what I gather, if I set container overrides on some env vars they will get ignored; however, since the overrides I want to place are "secrets" rather than "environment", they should still be applied. The bit I need to test is: if some env var is in both the "environment" section and the "secrets" section, which one takes precedence?
Do you think it's worth trying to contribute to that part of Prefect? The goal would be to accept an AWS Secrets Manager / Parameter Store ARN as part of the ECS run config, so that the API key is not visible anywhere in the AWS console.