
Madison Schott

08/09/2021, 2:55 PM
Hi all, trying to figure out the best way to deploy my Prefect agent to AWS and came across this tutorial - has anyone followed this for their production deployments? What are the benefits of reading the flow code from an S3 bucket? Thanks! https://towardsdatascience.com/serverless-data-pipelines-made-easy-with-prefect-and-aws-ecs-fargate-7e25bacb450c

Kevin Kho

08/09/2021, 3:05 PM
Hey @Madison Schott, the advantage of pulling the flow from somewhere is that execution is not confined to one machine. If you use Local storage, that flow can only run on the specific agent where it was registered, but something like S3 storage lets you run the flow anywhere, since it gets downloaded at runtime. For ECSRun with S3 storage, you don't have to rebuild the container every time you change the Flow. The Flow in S3 just runs on top of the container specified, so it can be easier for development.

Madison Schott

08/09/2021, 3:07 PM
Got it! Thanks! And then what is usually easier to maintain, ECS Fargate or EKS? I just discovered this tutorial as well https://towardsdatascience.com/distributed-data-pipelines-made-easy-with-aws-eks-and-prefect-106984923b30

Kevin Kho

08/09/2021, 3:13 PM
I think this is a “whatever you prefer” scenario. For example, some users with CI/CD pipelines choose Github storage because they have some registration steps after every commit. If you want everything coupled in the Docker container, you can put everything there. And then S3 if you want the separation between the Flow and the dependencies (in the Docker container). Does that make sense?

Madison Schott

08/09/2021, 3:18 PM
makes sense, thank you!
So I have an agent running and also have a flow with the same tags but my run is stuck in a submitted state... what's the reasoning behind this?

Kevin Kho

08/09/2021, 9:39 PM
This is on ECS right?

Madison Schott

08/09/2021, 9:44 PM
yup

Kevin Kho

08/09/2021, 9:45 PM
Can I see your RunConfig?

Madison Schott

08/09/2021, 9:46 PM
RUN_CONFIG = ECSRun(run_task_kwargs={'cluster': 'prefect-prod'},
                    execution_role_arn='---',
                    labels=['ecs', 'test', 'hello', 'august'])

Kevin Kho

08/09/2021, 9:49 PM
I would say run the Flow with debug-level logs and maybe something will pop up. It might also be a lack of resources. Maybe we'll see where it is hanging.

Madison Schott

08/09/2021, 9:53 PM
How do I run the debug level?

Kevin Kho

08/09/2021, 10:03 PM
In the UI, hit the Run button and then add an env variable `PREFECT__LOGGING__LEVEL` with value `DEBUG`, or you can do it on agent spin-up using the `--env` flag.

Madison Schott

08/09/2021, 10:41 PM
when I do that there are still no logs coming up, it just says it's scheduled

Kevin Kho

08/09/2021, 10:42 PM
What executor are you using? I have seen this before and it was permission related. Do you not see any logs on the Prefect UI?

Madison Schott

08/09/2021, 10:43 PM
I see logs but they are just saying that the process is being scheduled and rescheduled
I was running using the UI

Kevin Kho

08/09/2021, 11:21 PM
Could you try adding the env variable to the run config, like `ECSRun(env={"PREFECT__LOGGING__LEVEL": "DEBUG"})`, and then re-register and run? Or you can use `prefect agent ecs start --env PREFECT__LOGGING__LEVEL=DEBUG`

Madison Schott

08/09/2021, 11:45 PM
did the first and the same thing is still happening with no logs

Kevin Kho

08/09/2021, 11:48 PM
Can I have a flow run id?

Madison Schott

08/09/2021, 11:49 PM
3b874891-5938-4e19-af8b-213341e5cc78

Kevin Kho

08/09/2021, 11:57 PM
Anything on the ECS side with Cloud Watch logs?

Madison Schott

08/09/2021, 11:59 PM
nope nothing
doesn't look like there's any services on the cluster

Kevin Kho

08/10/2021, 12:07 AM
How is your agent configured?
Can you try adding `--log-level DEBUG` so that we can see if the agent will give more logs? But could you also show me how you start it?

Madison Schott

08/10/2021, 2:37 PM
I just have the agent started like this in a python file
AGENT = ECSAgent(cluster="prefect-prod", labels=['ecs', 'test', 'hello', 'august'])

AGENT.start()
Can I add an env to this for logging?

Kevin Kho

08/10/2021, 2:55 PM
Use the `env_vars={}` argument of the ECS Agent

Madison Schott

08/10/2021, 2:59 PM
did that and I'm still getting nothing

Kevin Kho

08/10/2021, 3:00 PM
Nothing on the agent logs as well? Not the Flow logs in the UI?

Madison Schott

08/10/2021, 3:00 PM
The role was the only thing I changed- am I missing something
arn:aws:iam::xxxxxxx:role/xxx

Kevin Kho

08/10/2021, 3:02 PM
Not exactly sure, but just a couple of thoughts. Maybe you can try adding cpu and memory values to the RunConfig? Maybe you need to authenticate to get the image from ECR? And then lastly, check permissions to confirm you have everything?

Madison Schott

08/10/2021, 3:03 PM
it started the image just fine when I registered the flow, I logged in and everything to authenticate
I had a permissions issue before, which is why I was testing this, but an error came up before and now it's just stuck in submitted

Kevin Kho

08/10/2021, 3:05 PM
Will ask the team for ideas

Madison Schott

08/10/2021, 3:07 PM
Ok thanks, cause the only thing I changed was the role, unless the error was just blocking it from even getting to the point it's at now

Kevin Kho

08/10/2021, 3:39 PM
Does using another role work? Or this was the same flow you had previously?

J. Martins

08/10/2021, 4:04 PM
I have had similar behaviour in the past, and if I remember it was always something to do with missing role permissions. I found it by looking in the stopped tasks in the cluster, where I was able to see an error message… Hope this helps

Madison Schott

08/10/2021, 4:19 PM
Awesome! I just found an error message, thank you!
ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 1 time(s): AccessDeniedException: User:
Do you know what permissions this would be?

Kevin Kho

08/10/2021, 4:22 PM
Ah finally 😅. Thank you @J. Martins! This is about pulling the container. So to authenticate you normally do something like this:
aws ecr get-login-password --region <REGION> | docker login --username AWS --password-stdin <ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com
Is your ecs agent running as a service on AWS too?

Madison Schott

08/10/2021, 4:23 PM
found this thread : https://stackoverflow.com/questions/61265108/aws-ecs-fargate-resourceinitializationerror-unable-to-pull-secrets-or-registry but not sure if anyone here has had this same issue when using fargate with prefect

Kevin Kho

08/10/2021, 4:24 PM
Oh wow that looks way different than what I suggested. You’re probably right and no I haven’t seen it before here.

Madison Schott

08/10/2021, 4:24 PM
I did do that before running

Mariia Kerimova

08/10/2021, 4:28 PM
Hmm, interesting. For interacting with CloudWatch and ECR you need to provide an execution role, which it looks like you did. Can you double-check that the role has the right permissions? Or can you attach the `AmazonECSTaskExecutionRolePolicy` policy?

Madison Schott

08/11/2021, 2:40 PM
Got it to work - there were some more permissions we had to add. Now I am getting an error with my dbt task saying it can't find the dbt.yml - do I need to download this onto my Docker container? Not sure how that works

Kevin Kho

08/11/2021, 2:42 PM
Yeah for that you would need all of the dbt dependencies inside the Docker container and then you would point to the path in the container

Madison Schott

08/11/2021, 2:44 PM
Any tutorials you recommend on doing this?
Also what are the benefits of a Docker container vs Github storage option?

Kevin Kho

08/11/2021, 2:54 PM
Oh I see. You might be able to do it like this with Git storage (not Github). Git storage keeps additional files, but only for stuff like `yml`. If you are trying to include other Python dependencies, that should be a Docker container. Git storage lets you keep some of these static files, but you need to use them outside the tasks, because the repo is cloned and then deleted after those files are read in. In your case, with dbt attached to the shell task, it might not work. The Docker container lets you `pip install` modules inside, compared to using Git storage. You can also include stuff like R or C or Java (stuff outside Python needed to run Flows) in the container. There are two ways you can do this, I think. First, you can add the `dbt` dependencies to the container that you put in ECR, and then run the flow on top of that (DockerRun + Github storage); it might be a bit harder to get the paths working this way. If you use Docker storage though, I think the paths are easier to resolve because you specify the `WORKDIR`. You would just add the files to the container like this.

Madison Schott

08/11/2021, 8:31 PM
hmm I'm not sure if local storage is what I want, is there a way to use Docker without using local storage?

Kevin Kho

08/11/2021, 8:33 PM
Oh yeah this example was just because someone specifically wanted that but for you it would be the ECSRun and the Docker storage. I think the WORKDIR would set the current dir and you just need to make sure the dbt yaml file can be seen by that. Or maybe the easiest thing to do would be to provide an absolute path inside the container. All storage and run configurations can be mixed and matched for the most part.

Madison Schott

08/11/2021, 8:34 PM
The workdir would be where I have the dbt files stored on my personal computer?
Do I need to create a VM within the file at all?

Kevin Kho

08/11/2021, 8:36 PM
No I think the workdir is just to set where those commands in the Docker container will run from and it affects stuff like where the files are copied to. You should not need to create a VM in the file. Did you mean virtual env? You shouldn’t need to for most cases as the Docker container already provides the isolation needed.
That example I gave was code I received from that person; I helped them put it together and uploaded it, which is why there is a virtual env in that container

Madison Schott

08/11/2021, 8:38 PM
# specify a base image
FROM python:3-slim

# copy all folder contents to the image
COPY . .

# install all dependencies
RUN pip install -r requirements.txt


ADD ~./.dbt/profiles.yml
Am I missing something here? Do I still need to copy the folder where all my dbt files are?

Kevin Kho

08/11/2021, 8:42 PM
It will be easiest if you use the Prefect image, I think: `prefecthq:prefect`. I think you want two COPY commands. First for the files you have, second for the `.dbt` folder. I would also COPY it into `.dbt` in the container, because that's where `dbt` will look for the files. But yes, it's something like this. My advice is just to try this, and even before going to Prefect, you could try building the container from the Dockerfile and going inside to check that the `.dbt` folder made it in right, and then try running the `dbt` commands before registering the flow.
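Putting that advice together, a sketch Dockerfile (the paths are assumptions, and profiles.yml must first be copied into the build context next to the Dockerfile):

```dockerfile
# Prefect base image already has python + prefect installed
FROM prefecthq/prefect:latest

# WORKDIR sets where later commands run (created if missing)
WORKDIR /dbt_project

# first COPY: the dbt project files sitting next to this Dockerfile
COPY . .

# second COPY: the profiles, into ~/.dbt where dbt looks for them
# (assumes profiles.yml was placed in the build context beforehand)
COPY profiles.yml /root/.dbt/profiles.yml

RUN pip install -r requirements.txt
```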

Madison Schott

08/11/2021, 8:45 PM
How can I use the Prefect image?

Kevin Kho

08/11/2021, 8:47 PM
FROM prefecthq:prefect

Madison Schott

08/11/2021, 8:47 PM
That's all I would put in the Dockerfile?

Kevin Kho

08/11/2021, 8:48 PM
But there's a bunch of different images for Python versions like 3.8, 3.7, 3.6, and then maybe a Prefect version number. The images are here. Yep! You would use this as the base image instead of python:3-slim

Madison Schott

08/11/2021, 8:51 PM
# specify a base image
FROM prefecthq:prefect

# copy all folder contents to the image
COPY ~./.dbt/profiles.yml
COPY /Users/madison/dbt_snowflake

# install all dependencies
RUN pip install -r requirements.txt
So something like that?

Kevin Kho

08/11/2021, 8:52 PM
Yeah, I think you need to specify the destination in the COPY? Not 100% sure, but yeah that looks good

Madison Schott

08/11/2021, 8:57 PM
Step 1/12 : FROM prefecthq:prefect
pull access denied for prefecthq, repository does not exist or may require 'docker login': denied: requested access to the resource is denied

Kevin Kho

08/11/2021, 9:00 PM
That is weird. You shouldn’t need to authenticate since it’s a public image. Could you try something like this and see if you can pull it?
FROM prefecthq/prefect:0.14.14-python3.7
Oof! My bad. Do `FROM prefecthq/prefect:latest`. This will work

Madison Schott

08/11/2021, 9:04 PM
thanks! now I am getting this error
COPY failed: file not found in build context or excluded by .dockerignore: stat ~.dbt/profiles.yml: file does not exist
typo in that message but path is correct, am I missing a step before that?

Kevin Kho

08/11/2021, 9:08 PM
Let me read up on this. I haven’t copied files by absolute path before
I guess this won't work. I think you need to: 1. move those files into the repo, 2. let them be copied with the first command, 3. run some shell command like `RUN ["mv", "…"]` to move them to the home directory.

Madison Schott

08/11/2021, 9:20 PM
how do I know the current directory of the container? I don't believe I ever specified one

Kevin Kho

08/11/2021, 9:23 PM
The default is the root, so you can change it with the WORKDIR command. If the directory specified does not exist, it will be created.

Madison Schott

08/11/2021, 9:27 PM
I don't understand why the copy of the one file won't work then
it's saying the file doesn't exist for every file I try to copy

Kevin Kho

08/11/2021, 9:29 PM
My understanding is that the COPY command is limited to the files seen in the directory with the Dockerfile. Are you trying to copy stuff outside it? Or are you talking about moving that file once it’s in the container to the home directory?

Madison Schott

08/11/2021, 9:30 PM
I just want to move the file from my local computer to the container

Kevin Kho

08/11/2021, 9:32 PM
Yeah you can’t do that with the Dockerfile. See the Github issue above. You need to have it in the same directory as the Dockerfile as far as I can tell.

Madison Schott

08/11/2021, 9:33 PM
that issue isn't with dbt I believe
I don't need to use absolute path for that, just anything that works lol
k

Kevin Kho

08/11/2021, 9:36 PM
Oh, that issue points to general COPY-ing of files into the Docker container. The `.dbt` files are in the home directory, right? Not in the current directory?

Madison Schott

08/11/2021, 9:51 PM
the .dbt ones are yes but not the sql files for the model
just the profiles.yml

Kevin Kho

08/11/2021, 9:53 PM
You probably need to move them into the directory with the Dockerfile to get them inside. The other option is you can download them from Git if you have another repo to get them into the container. The SQL files should be working since they live in the repo right?

Madison Schott

08/12/2021, 2:44 PM
They live in the repo on github yes
I still don't understand how I would move them into that directory when it's telling me that path doesn't exist

Kevin Kho

08/12/2021, 2:46 PM
Wait sorry, I think I misunderstood, the files are already in the folder where the Dockerfile lives and they are not being copied over?

Madison Schott

08/12/2021, 2:58 PM
so I moved the Dockerfile to where my dbt models are and they copied but the profiles.yml file does not live in the same location as the dbt models

Kevin Kho

08/12/2021, 3:00 PM
Yeah, so those need to be in the same location for Docker's build context to see them. Docker just creates a context and won't see things outside the folder with the Dockerfile, so you can't use an absolute path to get something into the container. All paths are relative to that folder. Check this for an explanation.
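The build-context rule can be seen with a tiny demo (throwaway paths; the actual `docker build` is left commented out since it needs a Docker daemon):

```shell
# COPY sources resolve relative to the build context (the directory you
# pass to `docker build`), never to absolute paths on the host machine.
mkdir -p /tmp/context-demo
printf 'FROM alpine\nCOPY profiles.yml /root/.dbt/profiles.yml\n' > /tmp/context-demo/Dockerfile

# bring the file into the context first; COPY cannot reach ~/.dbt directly
cp ~/.dbt/profiles.yml /tmp/context-demo/ 2>/dev/null || touch /tmp/context-demo/profiles.yml

# docker build -t context-demo /tmp/context-demo   # would now find profiles.yml
```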

Madison Schott

08/12/2021, 5:29 PM
I think this `profiles_dir='/Users/madisonschott/.dbt'`, which is within the dbt task, is also what's causing the issue
dbt_task = DbtShellTask(profile_name='winc',
                        log_stderr=True,
                        environment='dev',
                        dbt_kwargs = {
                        },
                        profiles_dir='/Users/madisonschott/.dbt')
Do I need to specify the directory if it's within my container?
hmm now I'm reading that I don't even need a profiles.yml if I specify all the vars in dbt_kwargs
so the issue should be an easy one

Kevin Kho

08/12/2021, 9:13 PM
Oh sorry, I read this then forgot to respond. Yeah, you would specify the location in the container, or after you clone it you could move it to the appropriate directory inside the container so that your Flow can find it.

Madison Schott

08/12/2021, 9:28 PM
well I don't need it in the docker container at all is what I'm saying- since I have the arguments within the dbt task, right?

Kevin Kho

08/12/2021, 10:05 PM
Am not 100% sure as I don’t use dbt a lot myself 😅 but if it works then I guess so yep!

Madison Schott

08/12/2021, 10:06 PM
gahhh it doesn't work haha I am at a loss for how to get this working
does s3 storage replace the use of Docker?

Kevin Kho

08/12/2021, 10:20 PM
Saw your thread. I'll bug someone in the community with more experience than me. If they don't respond, I'll look into it more myself, and maybe I'll have something for you tomorrow. S3 storage only keeps the flows, not the dependent files.

Madison Schott

08/12/2021, 10:20 PM
Ok thanks, I appreciate the help