# prefect-server
m
Hi all, trying to figure out the best way to deploy my Prefect agent to AWS and came across this tutorial- has anyone followed this for their production deployments? What are the benefits of reading the flow code from an S3 bucket? Thanks! https://towardsdatascience.com/serverless-data-pipelines-made-easy-with-prefect-and-aws-ecs-fargate-7e25bacb450c
k
Hey @Madison Schott, the advantage of pulling it from somewhere is that execution is not confined to one machine. If you use Local storage, the flow can only run on the specific agent where it was registered, but using something like S3 storage lets you run the flow anywhere because it gets downloaded at runtime. For ECSRun with S3 storage, you don't have to re-build the container every time you change the Flow. The Flow in S3 just runs on top of the container you specify, so it can be easier for development.
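For example, a rough sketch of that setup could look something like this (the bucket name, image, and flow here are just placeholders):
Copy code
from prefect import task, Flow
from prefect.storage import S3
from prefect.run_configs import ECSRun

@task
def say_hello():
    print("hello")

with Flow("ecs-example") as flow:
    say_hello()

# The flow code is uploaded to S3 at registration and pulled down at runtime,
# so any ECS task that can read the bucket can run it
flow.storage = S3(bucket="my-prefect-flows")

# The flow runs on top of whatever image you specify here, so editing the
# flow does not require rebuilding the container
flow.run_config = ECSRun(image="prefecthq/prefect:latest", labels=["ecs"])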
m
Got it! Thanks! And then what is usually easier to maintain, ECS Fargate or EKS? I just discovered this tutorial as well https://towardsdatascience.com/distributed-data-pipelines-made-easy-with-aws-eks-and-prefect-106984923b30
k
I think this is a “whatever you prefer” scenario. For example, some users with CI/CD pipelines choose Github storage because they have some registration steps after every commit. If you want everything coupled in the Docker container, you can put everything there. And then S3 if you want the separation between the Flow and the dependencies (in the Docker container). Does that make sense?
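For instance, registering with Github storage is only a couple of lines (the repo and path here are made up):
Copy code
from prefect.storage import GitHub

# assuming `flow` is the Flow object defined in your flow file;
# the flow file is pulled from the repo at runtime
flow.storage = GitHub(repo="my-org/my-flows",
                      path="flows/my_flow.py",
                      ref="main")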
m
makes sense, thank you!
So I have an agent running and also have a flow with the same tags but my run is stuck in a submitted state... what's the reasoning behind this?
k
This is on ECS right?
m
yup
k
Can I see your RunConfig?
m
Copy code
RUN_CONFIG = ECSRun(run_task_kwargs={'cluster': 'prefect-prod'},
                    execution_role_arn='---',
                    labels=['ecs', 'test', 'hello', 'august'])
k
I would say run the Flow with debug-level logs and maybe something will pop up. It might also be missing resource requirements. Maybe we'll see where it is hanging
m
How do I run the debug level?
k
In the UI, hit the Run button and then add an env variable PREFECT__LOGGING__LEVEL with the value DEBUG, or you can do it on agent spin-up using the --env flag.
m
when I do that there are still no logs coming up, it just says it's scheduled
k
What executor are you using? I have seen this before and it was permission related. Do you not see any logs on the Prefect UI?
m
I see logs but they are just saying that the process is being scheduled and rescheduled
I was running using the UI
k
Could you try adding the env variable to the run config like ECSRun(env={"PREFECT__LOGGING__LEVEL": "DEBUG"}) and then re-register and run?
Or you can use prefect agent ecs start --env PREFECT__LOGGING__LEVEL=DEBUG
m
did the first and the same thing is still happening with no logs
k
Can I have a flow run id?
m
3b874891-5938-4e19-af8b-213341e5cc78
k
Anything on the ECS side with Cloud Watch logs?
m
nope nothing
doesn't look like there's any services on the cluster
k
How is your agent configured?
Can you try adding `--log-level DEBUG` so that we can see if the agent will give more logs? But could you also show me how you start it?
m
I just have the agent started like this in a python file
Copy code
AGENT = ECSAgent(cluster="prefect-prod", labels=['ecs', 'test', 'hello', 'august'])

AGENT.start()
Can I add an env to this for logging?
k
Use the env_vars={} argument of the ECS Agent
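So something like this maybe, just your snippet plus the env var (double check the exact argument name on your Prefect version):
Copy code
from prefect.agent.ecs import ECSAgent

# env_vars set on the agent are passed to every flow run it submits,
# so the DEBUG logging shows up in the flow run logs too
AGENT = ECSAgent(cluster="prefect-prod",
                 labels=['ecs', 'test', 'hello', 'august'],
                 env_vars={"PREFECT__LOGGING__LEVEL": "DEBUG"})

AGENT.start()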
m
did that and I'm still getting nothing
k
Nothing in the agent logs either? And no Flow logs in the UI?
m
The role was the only thing I changed - am I missing something?
arn:aws:iam::xxxxxxx:role/xxx
k
Not exactly sure, but just a couple of thoughts. Maybe you can try adding cpu and memory values to the RunConfig? And then maybe you need to authenticate to get the image from ECR? And lastly, check permissions to confirm you have everything?
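Something roughly like this, just to rule those out (the image and resource values are placeholders; the role ARN is whatever you already use):
Copy code
from prefect.run_configs import ECSRun

RUN_CONFIG = ECSRun(run_task_kwargs={'cluster': 'prefect-prod'},
                    execution_role_arn='---',
                    # placeholder: point this at your ECR image if you use one
                    image='<ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com/<REPO>:latest',
                    cpu='512',      # 0.5 vCPU
                    memory='1024',  # 1 GB, a valid Fargate pairing with 512 CPU
                    labels=['ecs', 'test', 'hello', 'august'])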
m
it started the image just fine when I registered the flow, I logged in and everything to authenticate
I had a permissions issue before, which is why I was testing this, but an error came up before and now it's just stuck in submitted
k
Will ask the team for ideas
m
Ok thanks, because the only thing I changed was the role, unless the error was just blocking it from even getting to the point it's at now
k
Does using another role work? Or this was the same flow you had previously?
j
I have had similar behaviour in the past and if I remember it was always something to do with missing role permissions. I think I found it when looking in stopped tasks in the cluster and was able to see some error message… Hope this helps
👍 1
m
Awesome! I just found an error message, thank you!
Copy code
ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 1 time(s): AccessDeniedException: User:
Do you know what permissions this would be?
k
Ah finally 😅. Thank you @J. Martins! This is about pulling the container. So to authenticate you normally do something like this:
Copy code
aws ecr get-login-password --region <REGION> | docker login --username AWS --password-stdin <ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com
Is your ecs agent running as a service on AWS too?
m
found this thread : https://stackoverflow.com/questions/61265108/aws-ecs-fargate-resourceinitializationerror-unable-to-pull-secrets-or-registry but not sure if anyone here has had this same issue when using fargate with prefect
k
Oh wow that looks way different than what I suggested. You’re probably right and no I haven’t seen it before here.
m
I did do that before running
m
Hmm, interesting. For interacting with CloudWatch and ECR you need to provide an execution role, which it looks like you did. Can you double check that that role has the right permissions? Or can you attach this policy: AmazonECSTaskExecutionRolePolicy?
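If it's easier to do in code, attaching that managed policy with boto3 would look roughly like this (the role name is a placeholder; the IAM console works just as well):
Copy code
import boto3

iam = boto3.client("iam")

# attach the AWS-managed policy that lets the task execution role
# pull images from ECR and write logs to CloudWatch
iam.attach_role_policy(
    RoleName="my-prefect-task-execution-role",  # placeholder: the role behind your execution_role_arn
    PolicyArn="arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy",
)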
m
Got it to work - there were some more permissions we had to add. Now I am getting an error with my dbt task saying it can't find the dbt.yml - do I need to download this onto my Docker container? Not sure how that works
k
Yeah for that you would need all of the dbt dependencies inside the Docker container and then you would point to the path in the container
m
Any tutorials you recommend on doing this?
Also what are the benefits of a Docker container vs Github storage option?
k
Oh I see. You might be able to do it like this with Git storage (not Github). Git storage keeps additional files, but only for stuff like yml. If you are trying to include other Python dependencies, that should be a Docker container. Git storage lets you keep some of these static files, but you need to use them outside the tasks, because the repo is cloned and then deleted after reading that stuff in. In your case, with dbt attached to the shell task, it might not work. The Docker container lets you pip install modules inside, compared to using Git storage. You can also include stuff like R or C or Java in the container, stuff outside Python that is needed to run Flows. There are two ways you can do this I think. First, you can add the dbt dependencies to the container that you put in ECR, and then run the flow on top of that (DockerRun + Github storage); it might be a bit harder to get the paths working this way. If you use Docker storage though, I think the paths are easier to resolve because you specify the WORKDIR. You would just add the files to the container like this.
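A rough sketch of that Docker storage route might look like this (the registry, image name, and paths are placeholders; I believe the files argument copies local files into the image at build time, but double check the docs):
Copy code
from prefect.storage import Docker

# assuming `flow` is your Flow object; Docker storage builds an image with
# the flow plus these extras baked in, and ECSRun then runs on that image
flow.storage = Docker(
    registry_url="<ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com",  # placeholder ECR registry
    image_name="dbt-flow",
    python_dependencies=["dbt"],  # pip-installed into the image
    files={
        # local absolute path -> path inside the container
        "/Users/madison/.dbt/profiles.yml": "/root/.dbt/profiles.yml",
    },
)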
m
hmm I'm not sure if local storage is what I want, is there a way to use Docker without using local storage?
k
Oh yeah this example was just because someone specifically wanted that but for you it would be the ECSRun and the Docker storage. I think the WORKDIR would set the current dir and you just need to make sure the dbt yaml file can be seen by that. Or maybe the easiest thing to do would be to provide an absolute path inside the container. All storage and run configurations can be mixed and matched for the most part.
m
The workdir would be where I have the dbt files stored on my personal computer?
Do I need to create a VM within the file at all?
k
No I think the workdir is just to set where those commands in the Docker container will run from and it affects stuff like where the files are copied to. You should not need to create a VM in the file. Did you mean virtual env? You shouldn’t need to for most cases as the Docker container already provides the isolation needed.
That example I gave was code I received from that person, and then I helped them put it together and uploaded it, which is why there is a virtual env in that container
m
Copy code
# specify a base image
FROM python:3-slim

# copy all folder contents to the image
COPY . .

# install all dependencies
RUN pip install -r requirements.txt


ADD ~./.dbt/profiles.yml
Am I missing something here? Do I still need to copy the folder where all my dbt files are?
k
It will be easiest if you use the Prefect image I think: prefecthq:prefect. I think you want two COPY commands: the first for the files you have, the second for the .dbt folder. I would also COPY it into .dbt in the container because that's where the files will be looked for by dbt.
But yes, it's something like this. My advice is just to try this, and even before going to Prefect, you could try building the container from the Dockerfile and going inside to check that the .dbt folder made it in right, and then try running the dbt commands before registering the flow.
m
How can I use the Prefect image?
k
Copy code
FROM prefecthq:prefect
m
That's all I would put in the Dockerfile?
k
But there are a bunch of different images for Python versions like 3.8, 3.7, 3.6, and then maybe a Prefect version number. The images are here. Yep! You would use this as the base image instead of python:3-slim
m
Copy code
# specify a base image
FROM prefecthq:prefect

# copy all folder contents to the image
COPY ~./.dbt/profiles.yml
COPY /Users/madison/dbt_snowflake

# install all dependencies
RUN pip install -r requirements.txt
So something like that?
k
Yeah I think you need to specify the destination in the COPY? Not 100% sure, but yeah that looks good
m
Copy code
Step 1/12 : FROM prefecthq:prefect
pull access denied for prefecthq, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
k
That is weird. You shouldn't need to authenticate since it's a public image. Could you try something like this and see if you can pull it? FROM prefecthq/prefect:0.14.14-python3.7
Oof! My bad. Do FROM prefecthq/prefect:latest. This will work
m
thanks! now I am getting this error
COPY failed: file not found in build context or excluded by .dockerignore: stat ~.dbt/profiles.yml: file does not exist
typo in that message but path is correct, am I missing a step before that?
k
Let me read up on this. I haven't copied files by absolute path before.
I guess this won't work. I think you need to: 1. move those files into the repo, 2. they will be copied with the first COPY command, 3. run some shell command like RUN ["mv", "…"] to move them to the home directory. Does that make sense?
m
how do I know the current directory of the container? I don't believe I ever specified one
k
the default is the root so you can change it with the WORKDIR command.
if the directory specified does not exist, it will be created.
m
I don't understand why the copy of the one file won't work then
it's saying the file doesn't exist for every file I try to copy
k
My understanding is that the COPY command is limited to the files seen in the directory with the Dockerfile. Are you trying to copy stuff outside it? Or are you talking about moving that file once it’s in the container to the home directory?
m
I just want to move the file from my local computer to the container
k
Yeah you can’t do that with the Dockerfile. See the Github issue above. You need to have it in the same directory as the Dockerfile as far as I can tell.
m
that issue isn't with dbt I believe
I don't need to use absolute path for that, just anything that works lol
k
Oh, that issue is about general COPY-ing of files into the Docker container. The .dbt files are in the home directory, right? Not in the current directory?
m
the .dbt ones are yes but not the sql files for the model
just the profiles.yml
k
You probably need to move them into the directory with the Dockerfile to get them inside. The other option is you can download them from Git if you have another repo to get them into the container. The SQL files should be working since they live in the repo right?
m
They live in the repo on github yes
I still don't understand how I would move them into that directory when it's telling me that path doesn't exist
k
Wait sorry, I think I misunderstood, the files are already in the folder where the Dockerfile lives and they are not being copied over?
m
so I moved the Dockerfile to where my dbt models are and they copied but the profiles.yml file does not live in the same location as the dbt models
k
yeah so those need to be in the same location for Docker’s build context to see them.
Docker just creates a context and won’t see things outside that folder with the Dockerfile so you can’t use an absolute path to get something inside the container. They are all relative to the folder. Check this for an explanation.
m
I think this profiles_dir='/Users/madisonschott/.dbt', which is within the dbt task, is also what's causing the issue
Copy code
dbt_task = DbtShellTask(profile_name='winc',
                        log_stderr=True,
                        environment='dev',
                        dbt_kwargs = {
                        },
                        profiles_dir='/Users/madisonschott/.dbt')
Do I need to specify the directory if it's within my container?
hmm now I'm reading that I don't even need a profiles.yml if I specify all the vars in dbt_kwargs
so the issue should be an easy one
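Something like this is what I'm thinking, in case it helps (all the Snowflake values are placeholders and I haven't confirmed the exact behavior, so double check the DbtShellTask docs):
Copy code
from prefect.tasks.dbt import DbtShellTask

# with overwrite_profiles=True the task writes a profiles.yml for you from
# dbt_kwargs, so nothing needs to be copied in from my laptop
dbt_task = DbtShellTask(profile_name='winc',
                        environment='dev',
                        log_stderr=True,
                        overwrite_profiles=True,
                        dbt_kwargs={
                            # placeholder Snowflake target config
                            'type': 'snowflake',
                            'account': '<ACCOUNT>',
                            'user': '<USER>',
                            'password': '<PASSWORD>',
                            'role': '<ROLE>',
                            'database': '<DATABASE>',
                            'warehouse': '<WAREHOUSE>',
                            'schema': '<SCHEMA>',
                        })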
k
Oh sorry, I read this and then forgot to respond. Yeah, you would specify the location in the container, or after you clone it you could move it to the appropriate directory inside the container so that your Flow can find it.
m
well I don't need it in the docker container at all is what I'm saying- since I have the arguments within the dbt task, right?
k
Am not 100% sure as I don’t use dbt a lot myself 😅 but if it works then I guess so yep!
m
gahhh it doesn't work haha I am at a loss for how to get this working
does s3 storage replace the use of Docker?
k
Saw your thread. I'll bug someone in the community with more experience than me. If they don't respond, I'll look into it more myself and maybe I'll have something for you tomorrow. S3 storage only keeps the flows, not the dependent files.
m
Ok thanks, I appreciate the help