# prefect-server
m
Hi all, trying to figure out the best way to deploy my Prefect agent to AWS and came across this tutorial- has anyone followed this for their production deployments? What are the benefits of reading the flow code from an S3 bucket? Thanks! https://towardsdatascience.com/serverless-data-pipelines-made-easy-with-prefect-and-aws-ecs-fargate-7e25bacb450c
k
Hey @Madison Schott, the advantage of pulling it from somewhere is that execution is not confined to one machine. If you use Local storage, the flow can only run on the specific agent where it was registered, but using something like S3 storage lets you run the flow anywhere because it gets downloaded at runtime. For ECSRun with S3 storage, you don't have to re-build the container every time you change the Flow. The Flow in S3 just runs on top of the container you specify, so it can be easier for development.
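For example, a rough sketch of that setup could look something like this (the bucket name, image, and flow here are just placeholders):
Copy code
from prefect import task, Flow
from prefect.storage import S3
from prefect.run_configs import ECSRun

@task
def say_hello():
    print("hello")

with Flow("ecs-example") as flow:
    say_hello()

# The flow code is uploaded to S3 at registration and pulled down at runtime,
# so any ECS task that can read the bucket can run it
flow.storage = S3(bucket="my-prefect-flows")

# The flow runs on top of whatever image you specify here, so editing the
# flow does not require rebuilding the container
flow.run_config = ECSRun(image="prefecthq/prefect:latest", labels=["ecs"])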
m
Got it! Thanks! And then what is usually easier to maintain, ECS Fargate or EKS? I just discovered this tutorial as well https://towardsdatascience.com/distributed-data-pipelines-made-easy-with-aws-eks-and-prefect-106984923b30
k
I think this is a “whatever you prefer” scenario. For example, some users with CI/CD pipelines choose Github storage because they have some registration steps after every commit. If you want everything coupled in the Docker container, you can put everything there. And then S3 if you want the separation between the Flow and the dependencies (in the Docker container). Does that make sense?
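For instance, registering with Github storage is only a couple of lines (the repo and path here are made up):
Copy code
from prefect.storage import GitHub

# assuming `flow` is the Flow object defined in your flow file;
# the flow file is pulled from the repo at runtime
flow.storage = GitHub(repo="my-org/my-flows",
                      path="flows/my_flow.py",
                      ref="main")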
m
makes sense, thank you!
So I have an agent running and also have a flow with the same tags but my run is stuck in a submitted state... what's the reasoning behind this?
k
This is on ECS right?
m
yup
k
Can I see your RunConfig?
m
Copy code
RUN_CONFIG = ECSRun(run_task_kwargs={'cluster': 'prefect-prod'},
                    execution_role_arn='---',
                    labels=['ecs', 'test', 'hello', 'august'])
k
I would say run the Flow with debug-level logs and maybe something will pop up. It might also be missing resource requirements. Maybe we'll see where it is hanging
m
How do I run the debug level?
k
In the UI, hit the Run button and then add an env variable PREFECT__LOGGING__LEVEL with the value DEBUG, or you can do it on agent spin-up using the --env flag.
m
when I do that there are still no logs coming up, it just says it's scheduled
k
What executor are you using? I have seen this before and it was permission related. Do you not see any logs on the Prefect UI?
m
I see logs but they are just saying that the process is being scheduled and rescheduled
I was running using the UI
k
Could you try adding the env variable to the run config like ECSRun(env={"PREFECT__LOGGING__LEVEL": "DEBUG"}) and then re-register and run?
Or you can use prefect agent ecs start --env PREFECT__LOGGING__LEVEL=DEBUG
m
did the first and the same thing is still happening with no logs
k
Can I have a flow run id?
m
3b874891-5938-4e19-af8b-213341e5cc78
k
Anything on the ECS side with Cloud Watch logs?
m
nope nothing
doesn't look like there's any services on the cluster
k
How is your agent configured?
Can you try adding `--log-level DEBUG` so that we can see if the agent will give more logs? But could you also show me how you start it?
m
I just have the agent started like this in a python file
Copy code
AGENT = ECSAgent(cluster="prefect-prod", labels=['ecs', 'test', 'hello', 'august'])

AGENT.start()
Can I add an env to this for logging?
k
Use the env_vars={} argument of the ECS Agent
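So something like this maybe, just your snippet plus the env var (double check the exact argument name on your Prefect version):
Copy code
from prefect.agent.ecs import ECSAgent

# env_vars set on the agent are passed to every flow run it submits,
# so the DEBUG logging shows up in the flow run logs too
AGENT = ECSAgent(cluster="prefect-prod",
                 labels=['ecs', 'test', 'hello', 'august'],
                 env_vars={"PREFECT__LOGGING__LEVEL": "DEBUG"})

AGENT.start()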
m
did that and I'm still getting nothing
k
Nothing in the agent logs either? And no Flow logs in the UI?
m
The role was the only thing I changed - am I missing something?
arn:aws:iam::xxxxxxx:role/xxx
k
Not exactly sure, but just a couple of thoughts. Maybe you can try adding cpu and memory values to the RunConfig? And then maybe you need to authenticate to get the image from ECR? And lastly, check permissions to confirm you have everything?
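Something roughly like this, just to rule those out (the image and resource values are placeholders; the role ARN is whatever you already use):
Copy code
from prefect.run_configs import ECSRun

RUN_CONFIG = ECSRun(run_task_kwargs={'cluster': 'prefect-prod'},
                    execution_role_arn='---',
                    # placeholder: point this at your ECR image if you use one
                    image='<ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com/<REPO>:latest',
                    cpu='512',      # 0.5 vCPU
                    memory='1024',  # 1 GB, a valid Fargate pairing with 512 CPU
                    labels=['ecs', 'test', 'hello', 'august'])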
m
it started the image just fine when I registered the flow, I logged in and everything to authenticate
I had a permissions issue before, which is why I was testing this, but an error came up before and now it's just stuck in submitted
k
Will ask the team for ideas
m
Ok thanks, because the only thing I changed was the role, unless the error was just blocking it from even getting to the point it's at now
k
Does using another role work? Or this was the same flow you had previously?
j
I have had similar behaviour in the past and if I remember it was always something to do with missing role permissions. I think I found it when looking in stopped tasks in the cluster and was able to see some error message… Hope this helps
👍 1
m
Awesome! I just found an error message, thank you!
Copy code
ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 1 time(s): AccessDeniedException: User:
Do you know what permissions this would be?
k
Ah finally 😅. Thank you @J. Martins! This is about pulling the container. So to authenticate you normally do something like this:
Copy code
aws ecr get-login-password --region <REGION> | docker login --username AWS --password-stdin <ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com
Is your ecs agent running as a service on AWS too?
m
found this thread : https://stackoverflow.com/questions/61265108/aws-ecs-fargate-resourceinitializationerror-unable-to-pull-secrets-or-registry but not sure if anyone here has had this same issue when using fargate with prefect
k
Oh wow that looks way different than what I suggested. You’re probably right and no I haven’t seen it before here.
m
I did do that before running
m
Hmm, interesting. For interacting with CloudWatch and ECR you need to provide an execution role, which it looks like you did. Can you double check that that role has the right permissions? Or can you attach this policy: AmazonECSTaskExecutionRolePolicy?
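If it's easier to do in code, attaching that managed policy with boto3 would look roughly like this (the role name is a placeholder; the IAM console works just as well):
Copy code
import boto3

iam = boto3.client("iam")

# attach the AWS-managed policy that lets the task execution role
# pull images from ECR and write logs to CloudWatch
iam.attach_role_policy(
    RoleName="my-prefect-task-execution-role",  # placeholder: the role behind your execution_role_arn
    PolicyArn="arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy",
)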
m
Got it to work - there were some more permissions we had to add. Now I am getting an error with my dbt task saying it can't find the dbt.yml - do I need to download this onto my Docker container? Not sure how that works
k
Yeah for that you would need all of the dbt dependencies inside the Docker container and then you would point to the path in the container
m
Any tutorials you recommend on doing this?
Also what are the benefits of a Docker container vs Github storage option?
k
Oh I see. You might be able to do it like this with Git storage (not Github). Git storage keeps additional files, but only for stuff like yml. If you are trying to include other Python dependencies, that should be a Docker container. Git storage lets you keep some of these static files, but you need to use them outside the tasks, because the repo is cloned and then deleted after reading that stuff in. In your case, with dbt attached to the shell task, it might not work. The Docker container lets you pip install modules inside, compared to using Git storage. You can also include stuff like R or C or Java in the container, stuff outside Python that is needed to run Flows. There are two ways you can do this I think. First, you can add the dbt dependencies to the container that you put in ECR, and then run the flow on top of that (DockerRun + Github storage); it might be a bit harder to get the paths working this way. If you use Docker storage though, I think the paths are easier to resolve because you specify the WORKDIR. You would just add the files to the container like this.
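A rough sketch of that Docker storage route might look like this (the registry, image name, and paths are placeholders; I believe the files argument copies local files into the image at build time, but double check the docs):
Copy code
from prefect.storage import Docker

# assuming `flow` is your Flow object; Docker storage builds an image with
# the flow plus these extras baked in, and ECSRun then runs on that image
flow.storage = Docker(
    registry_url="<ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com",  # placeholder ECR registry
    image_name="dbt-flow",
    python_dependencies=["dbt"],  # pip-installed into the image
    files={
        # local absolute path -> path inside the container
        "/Users/madison/.dbt/profiles.yml": "/root/.dbt/profiles.yml",
    },
)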
m
hmm I'm not sure if local storage is what I want, is there a way to use Docker without using local storage?
k
Oh yeah this example was just because someone specifically wanted that but for you it would be the ECSRun and the Docker storage. I think the WORKDIR would set the current dir and you just need to make sure the dbt yaml file can be seen by that. Or maybe the easiest thing to do would be to provide an absolute path inside the container. All storage and run configurations can be mixed and matched for the most part.
m
The workdir would be where I have the dbt files stored on my personal computer?
Do I need to create a VM within the file at all?
k
No I think the workdir is just to set where those commands in the Docker container will run from and it affects stuff like where the files are copied to. You should not need to create a VM in the file. Did you mean virtual env? You shouldn’t need to for most cases as the Docker container already provides the isolation needed.
That example I gave was code I received from that person, and then I helped them put it together and uploaded it, which is why there is a virtual env in that container
m
Copy code
# specify a base image
FROM python:3-slim

# copy all folder contents to the image
COPY . .

# install all dependencies
RUN pip install -r requirements.txt


ADD ~./.dbt/profiles.yml
Am I missing something here? Do I still need to copy the folder where all my dbt files are?
k
It will be easiest if you use the Prefect image I think: prefecthq:prefect. I think you want two COPY commands: the first for the files you have, the second for the .dbt folder. I would also COPY it into .dbt in the container because that's where the files will be looked for by dbt.
But yes, it's something like this. My advice is just to try this, and even before going to Prefect, you could try building the container from the Dockerfile and going inside to check that the .dbt folder made it in right, and then try running the dbt commands before registering the flow.
m
How can I use the Prefect image?
k
Copy code
FROM prefecthq:prefect
m
That's all I would put in the Dockerfile?
k
But there are a bunch of different images for Python versions like 3.8, 3.7, 3.6, and then maybe a Prefect version number. The images are here. Yep! You would use this as the base image instead of python:3-slim
m
Copy code
# specify a base image
FROM prefecthq:prefect

# copy all folder contents to the image
COPY ~./.dbt/profiles.yml
COPY /Users/madison/dbt_snowflake

# install all dependencies
RUN pip install -r requirements.txt
So something like that?
k
Yeah I think you need to specify the destination in the COPY? Not 100% sure, but yeah that looks good
m
Copy code
Step 1/12 : FROM prefecthq:prefect
pull access denied for prefecthq, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
k
That is weird. You shouldn't need to authenticate since it's a public image. Could you try something like this and see if you can pull it? FROM prefecthq/prefect:0.14.14-python3.7
Oof! My bad. Do FROM prefecthq/prefect:latest. This will work
m
thanks! now I am getting this error
COPY failed: file not found in build context or excluded by .dockerignore: stat ~.dbt/profiles.yml: file does not exist
typo in that message but path is correct, am I missing a step before that?
k
Let me read up on this. I haven't copied files by absolute path before.
I guess this won't work. I think you need to: 1. move those files into the repo, 2. they will be copied with the first COPY command, 3. run some shell command like RUN ["mv", "…"] to move them to the home directory. Does that make sense?
m
how do I know the current directory of the container? I don't believe I ever specified one
k
the default is the root so you can change it with the WORKDIR command.
if the directory specified does not exist, it will be created.
m
I don't understand why the copy of the one file won't work then
it's saying the file doesn't exist for every file I try to copy
k
My understanding is that the COPY command is limited to the files seen in the directory with the Dockerfile. Are you trying to copy stuff outside it? Or are you talking about moving that file once it’s in the container to the home directory?
m
I just want to move the file from my local computer to the container
k
Yeah you can’t do that with the Dockerfile. See the Github issue above. You need to have it in the same directory as the Dockerfile as far as I can tell.
m
that issue isn't with dbt I believe
I don't need to use absolute path for that, just anything that works lol
k
Oh, that issue is about general COPY-ing of files into the Docker container. The .dbt files are in the home directory, right? Not in the current directory?
m
the .dbt ones are yes but not the sql files for the model
just the profiles.yml
k
You probably need to move them into the directory with the Dockerfile to get them inside. The other option is you can download them from Git if you have another repo to get them into the container. The SQL files should be working since they live in the repo right?
m
They live in the repo on github yes
I still don't understand how I would move them into that directory when it's telling me that path doesn't exist
k
Wait sorry, I think I misunderstood, the files are already in the folder where the Dockerfile lives and they are not being copied over?
m
so I moved the Dockerfile to where my dbt models are and they copied but the profiles.yml file does not live in the same location as the dbt models
k
yeah so those need to be in the same location for Docker’s build context to see them.
Docker just creates a context and won’t see things outside that folder with the Dockerfile so you can’t use an absolute path to get something inside the container. They are all relative to the folder. Check this for an explanation.
m
I think this profiles_dir='/Users/madisonschott/.dbt', which is within the dbt task, is also what's causing the issue
Copy code
dbt_task = DbtShellTask(profile_name='winc',
                        log_stderr=True,
                        environment='dev',
                        dbt_kwargs = {
                        },
                        profiles_dir='/Users/madisonschott/.dbt')
Do I need to specify the directory if it's within my container?
hmm now I'm reading that I don't even need a profiles.yml if I specify all the vars in dbt_kwargs
so the issue should be an easy one
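Something like this is what I'm thinking, in case it helps (all the Snowflake values are placeholders and I haven't confirmed the exact behavior, so double check the DbtShellTask docs):
Copy code
from prefect.tasks.dbt import DbtShellTask

# with overwrite_profiles=True the task writes a profiles.yml for you from
# dbt_kwargs, so nothing needs to be copied in from my laptop
dbt_task = DbtShellTask(profile_name='winc',
                        environment='dev',
                        log_stderr=True,
                        overwrite_profiles=True,
                        dbt_kwargs={
                            # placeholder Snowflake target config
                            'type': 'snowflake',
                            'account': '<ACCOUNT>',
                            'user': '<USER>',
                            'password': '<PASSWORD>',
                            'role': '<ROLE>',
                            'database': '<DATABASE>',
                            'warehouse': '<WAREHOUSE>',
                            'schema': '<SCHEMA>',
                        })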
k
Oh sorry, I read this and then forgot to respond. Yeah, you would specify the location in the container, or after you clone it you could move it to the appropriate directory inside the container so that your Flow can find it.
m
well I don't need it in the docker container at all is what I'm saying- since I have the arguments within the dbt task, right?
k
Am not 100% sure as I don’t use dbt a lot myself 😅 but if it works then I guess so yep!
m
gahhh it doesn't work haha I am at a loss for how to get this working
does s3 storage replace the use of Docker?
k
Saw your thread. I'll bug someone in the community with more experience than me. If they don't respond, I'll look into it more myself and maybe I'll have something for you tomorrow. S3 storage only keeps the flows, not the dependent files.
m
Ok thanks, I appreciate the help