# ask-community
b
Hi everyone, I'm struggling to understand how Prefect behaves when I redeploy. The way it works now is that 1 git repo = 1 Prefect project containing multiple flows. We have a Dockerfile whose entrypoint simply registers all the flows and then starts a LocalAgent. This is deployed as an ECS Service. When we deploy, we simply kill the existing service and start a new one with a new Docker image, which means there is a little "downtime" in terms of flow schedules, e.g.:
• Time is 4.59.
• We start a new deployment, meaning that at 4.59 the service is killed.
• There is a job scheduled to run at 5.00.
• The service is only back up at 5.01.
• This means the 5.00 run will be fully missed. (Please correct me if I'm wrong.)
My question, however, is about a second scenario, where:
• We start a run at 4.58, and the run takes about 5 minutes to finish.
• We start a new deployment at 4.59, meaning there is a run in progress.
• What happens to the flow run if the entire service (local agent + code execution environment) is killed?
• At 5.01, when the new deployment finishes, will Prefect know to resume that flow run? How would that work?
The reason I'm asking is that I plan on changing our deployment strategy to blue/green, but I'm not sure how Prefect will cope with a flow run being killed midway. Sorry if this is confusing! I appreciate any help.
a
@Bruno Murino you don't have to kill existing flow runs or agents when you deploy a new version of a flow. When your flow changes, you register a new version of it so that the next time you run it, you trigger a new version of the flow. This way you can entirely avoid any downtime related to new flow version deployment.
b
I’m struggling with understanding this because I’m using a LocalRun with a LocalAgent and LocalStorage — so the only way to change the flow code is by deploying a new docker image, and because I’m using a LocalAgent, this means I need the agent to be in that docker image as well
a
@Bruno Murino actually, with Local agent and storage, you don’t need to use Docker at all. You could deploy your agent e.g. in a virtual environment for isolation, but normally local agent is deployed as a local process without Docker. But there is also a Docker agent - perhaps this is what you’re looking for?
when it comes to flow deployment patterns, some users shared how they do it in this Github discussion - sharing in case this might be interesting for you https://github.com/PrefectHQ/prefect/discussions/4042
👀 1
b
we need docker because we deploy as an ECS Service
a
Do you want your flows to run on ECS too? We have an ECS agent: https://docs.prefect.io/orchestration/agents/ecs.html I could share a tutorial on how to set it up if you’re interested
b
I have tried that but it was too slow to start the flow run
maybe I should give it a second try though
a
I see, I can actually totally understand that 🙂 the ECS with Fargate is really not the fastest option to start because the Serverless data plane needs to first provision the compute resources and pull the image before it can run the flow. But you could solve it by using EC2 instead of Fargate capacity provider.
b
we do use EC2 😂
a
Really? 😄 that’s weird. With EC2 data plane it should start up pretty much instantaneously. Could it be that the capacity provider was still set to Fargate?
b
don’t think so — we don’t have anything on fargate and I had to deal with our VPC and stuff, so it was definitely EC2
👍 1
a
Anyway, in case you would want to go that path, you could use ECS CLI to spin up the cluster and then follow this or this to set up an agent as ECS service with the exception to change the capacity provider from FARGATE to EC2.
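A rough sketch of what that capacity-provider change could look like, written as boto3-style `ecs.create_service` kwargs (all cluster, service, and task-definition names here are made up for illustration):

```python
# Hypothetical sketch: the agent deployed as an ECS service on the
# EC2 data plane instead of Fargate, so flow containers start on
# already-provisioned instances. Names/ARNs are placeholders.
agent_service = {
    "cluster": "prefect-cluster",
    "serviceName": "prefect-ecs-agent",
    "taskDefinition": "prefect-ecs-agent:1",
    "desiredCount": 1,
    # The key change vs. the linked tutorials: "EC2" instead of "FARGATE"
    "launchType": "EC2",
}
# boto3.client("ecs").create_service(**agent_service)
```

The actual task definition and networking details would follow the linked tutorials; only the launch type differs.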
b
there might have been a problem with the ECS agent we had: it was deployed as an ECS Service and it would get an OOM error every 3 days or so, with no apparent reason
a
hmm it never happened to me. Perhaps you could allocate more memory to the agent then?
b
yea I think I’ll give that a try
though memory allocation didn't seem to be the issue: the memory profile was a steady increase at all times, not related to any particular process
a
usually the agent doesn’t need a lot of memory because it doesn’t do any work by itself, it only spins up flows as ECS tasks
b
exactly, so I thought it was a bug at the time or something
I think it’s worth trying that approach again, with newer versions and etc
as it does look like deploying flows as ECS tasks might solve the downtime issue
🙌 1
@Anna Geller I’m setting up some flows with ECS Run and something seems a bit odd — the task definition (visible in AWS console) contains the api key to prefect cloud — is there any way to avoid that?
a
yes, there is! There are two ways to store credentials that are retrieved by ECS tasks:
1. AWS Parameter Store
2. AWS Secrets Manager
The blog post I linked handles that and includes a section on how to set it up using #1. You can also check out #3 in this blog https://aws.plainenglish.io/8-common-mistakes-when-using-aws-ecs-to-manage-containers-3943402e8e59?sk=334f367ff27d3fe9b56ff31f8b9ba447
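For reference, a hedged sketch of option #2: a container definition fragment that pulls the API key from Secrets Manager via the `secrets` block (the image name and secret ARN are made up):

```python
# Hypothetical container definition fragment for an ECS task
# definition. With "secrets", only the ARN is visible in the AWS
# console; ECS injects the actual value as an env var at container
# start. PREFECT__CLOUD__API_KEY is the env var Prefect 0.15.x reads.
container_definition = {
    "name": "flow",
    "image": "my-registry/my-flows:latest",  # placeholder image
    "secrets": [
        {
            "name": "PREFECT__CLOUD__API_KEY",
            # Placeholder ARN of a secret holding the Prefect API key
            "valueFrom": "arn:aws:secretsmanager:eu-west-1:123456789012:secret:prefect-api-key",
        }
    ],
}
```

The same shape works with a Parameter Store ARN in `valueFrom` instead of a Secrets Manager one.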
b
I do use AWS Secrets Manager on the ECS task definition that I create, but now Prefect is creating the task definitions and adding a bunch of environment variables
like this
a
no no, Prefect will use the value from parameter or Secret - have a look at this section:
b
this is what I have setup:
a
but then you didn't follow this tutorial, right? 😄 if you had, it would look like this: it must be an ARN to the Secret or Parameter
b
apologies — what tutorial are you referring to? the link you sent doesn’t mention Prefect at any point
b
in the link you sent, the task definition is created for the ECS Agent, which is fine, but my issue is with the task definition for the flows
btw I really do appreciate your help!
🙌 1
also to clarify: the task definition for the flows is also fine. It's when the flow runs, i.e. when the ECS task runs, that you can go to the ECS task run in the AWS console and it lists a bunch of env vars that were not set up by my code or myself, but were set by the Prefect ECS Agent, I believe
this is the task definition created when I submit a run of a flow registered with a run_config of ECSRun
and this is the task details, for a task that used the task-definition above
a
nice! so it looks like you got it working? LMK if you have any open questions.
b
well not really! haha sorry for the confusion
let me rephrase
when the ECS Agent instantiates an ECS task, I can check out the ECS task run in the AWS console, and there are a bunch of env vars showing, which include the env var with the API key
which is demonstrated by the last screenshot I sent
to clarify, the task does run fine — my question is just about security concerns and etc
a
@Bruno Murino I see. The only way this may happen is if your ECSRun is using a different task definition than your agent. If you don’t set it explicitly, the one from the agent will be used. I’ve recently updated the docstring in the ECSRun to include more examples - perhaps you can reference the same task definition ARN as the one used by your agent?
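If I understand the suggestion correctly, it would be something along these lines (kwargs for `prefect.run_configs.ECSRun`; the ARN is a placeholder):

```python
# Hypothetical: point ECSRun at a pre-registered task definition ARN
# so the agent reuses it instead of registering a new definition that
# carries the API key as a plaintext env var. Note that ECSRun does
# not allow combining task_definition_arn with task_definition/image.
ecs_run_kwargs = {
    "task_definition_arn": (
        "arn:aws:ecs:eu-west-1:123456789012:task-definition/prefect-flow:3"
    ),
}
# run_config = prefect.run_configs.ECSRun(**ecs_run_kwargs)
```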
b
I'm afraid the task definition of the flow has to be different from the task definition of the agent, mainly because they run different images
let me try something
no luck. I was hoping I'd be able to avoid passing a full task definition and just pass the details through other arguments, but the lack of a "networkMode" argument means I have to use a custom task definition from scratch
also, I'm not sure that would have solved it. I suspect what Prefect does is a "container overrides" to inject all the env vars it requires, and that bubbles up to the AWS console
because the actual task definition is as I coded when registering the flow
I’m wondering if I can pass some “container overrides” via the
run_task_kwargs
to set the api key variables to be fetched from aws secrets/parameter store
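something like this is what I had in mind (the container name and values are made up). One caveat I'm not 100% sure about: the ECS RunTask API's `containerOverrides` appear to accept `environment` but not `secrets`, so Secrets Manager references might have to live in the registered task definition rather than in overrides:

```python
# Hypothetical run_task_kwargs for ECSRun. containerOverrides can
# replace plain env vars at run time, but (as far as I can tell)
# RunTask offers no "secrets" override, so secret references would
# need to be baked into the task definition itself.
run_task_kwargs = {
    "overrides": {
        "containerOverrides": [
            {
                # Must match the container name in the task definition
                "name": "flow",
                "environment": [
                    {"name": "MY_SETTING", "value": "example"},  # placeholder
                ],
            }
        ]
    }
}
```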
ah ok haha
a
network mode belongs to the task definition, but the exact network details such as VPC id and subnet id belong to run task
b
well I need “networkMode = bridge” anyway haha
a
true, for those you don’t need VPC details, correct
b
do you mind commenting on my assumption “I suspect what Prefect does is a “container overrides” to inject all envs vars it requires — and that bubbles up to the AWS console”
a
correct, if you set something explicitly on ECSRun, it will serve as overrides to what was configured on the agent. You can see more here: https://github.com/PrefectHQ/prefect/blob/d44b72a950ebda9f7bc6a9712fc71e2e9c680d25/src/prefect/agent/ecs/agent.py#L444-L475
b
Nice! Thanks for showing me! From what I gather, if I set container overrides on some env vars they will get ignored; however, since the overrides I want to place are "secrets" rather than "environment", they should still be applied. The bit I need to test is: if some env var is in both the "environment" section and the "secrets" section, which one takes precedence?
Do you think it's worth trying to contribute to that part of Prefect? The goal would be to accept an AWS Secrets Manager / Parameter Store ARN as part of the ECS run config, so that the API key is not visible anywhere in the AWS console.