# ask-community
k
Hey everyone, I'm currently learning my way around Prefect (and a lot of the other tools involved with Prefect) and tried to set up a simple hello task using GitHub storage. The task failed and resulted in this error in the log, could someone help me decipher it?
Failed to load and execute Flow's environment: GithubException(404, {'data': 'Not Found'}, {'server': 'GitHub.com', 'date': 'Thu, 26 Aug 2021 01:20:16 GMT', 'content-type': 'text/plain; charset=utf-8', 'transfer-encoding': 'chunked', 'vary': 'X-PJAX, X-PJAX-Container, Accept-Encoding, Accept, X-Requested-With', 'permissions-policy': 'interest-cohort=()', 'cache-control': 'no-cache', 'set-cookie': 'logged_in=no; domain=.github.com; path=/; expires=Fri, 26 Aug 2022 01:20:16 GMT; secure; HttpOnly; SameSite=Lax', 'strict-transport-security': 'max-age=31536000; includeSubdomains; preload', 'x-frame-options': 'deny', 'x-content-type-options': 'nosniff', 'x-xss-protection': '0', 'referrer-policy': 'origin-when-cross-origin, strict-origin-when-cross-origin', 'expect-ct': 'max-age=2592000, report-uri="https://api.github.com/_private/browser/errors"', 'content-security-policy': "default-src 'none'; base-uri 'self'; connect-src 'self'; form-action 'self'; img-src 'self' data:; script-src 'self'; style-src 'unsafe-inline'", 'content-encoding': 'gzip', 'x-github-request-id': 'EA90:30AB:16BB88:2112BC:6126EC50'})
That message appeared right after this one in the log:
Downloading flow from GitHub storage - repo: '<REPO>', path: '<PATH>'
k
Hey @Ken Nguyen, do you have your own hosted Github? Could you show me how you configured Github Storage?
k
This is my org's private repo
STORAGE = GitHub(repo="<REPO>",
                 path="<PATH>",
                 access_token_secret="GITHUB_ACCESS_TOKEN",
                 base_url="https://github.com/<REPO>.git"
)

...

with Flow("hello-flow", storage=STORAGE) as flow:
    say_hello()
And thanks for the quick response Kevin!
k
Sure, no problem. `base_url` is for if you have your own hosted version of GitHub somewhere else (not github.com). The default value is https://api.github.com, which is the normal public one. You would have a different API endpoint if your company hosted their own GitHub. So I think you can just try removing `base_url` and using the default.
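Kevin's suggestion, as a minimal config sketch in Prefect 1.x terms (repo, path, and secret name below are placeholders, not the real values from the thread): drop `base_url` entirely so the client falls back to the public `https://api.github.com` endpoint.

```python
from prefect import Flow, task
from prefect.storage import GitHub

# No base_url: defaults to the public https://api.github.com endpoint.
# repo/path/secret names below are placeholders.
STORAGE = GitHub(
    repo="<ORG>/<REPO>",                        # "org/repo" form, not a full URL
    path="flows/hello_flow.py",                 # path to the flow file within the repo
    access_token_secret="GITHUB_ACCESS_TOKEN",  # Prefect Secret holding a GitHub token
)

@task
def say_hello():
    print("hello")

with Flow("hello-flow", storage=STORAGE) as flow:
    say_hello()
```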
k
Will try that. Btw, is there a way for us to see the code that is actually running when we trigger a run from the UI? I sometimes just want to look at the code to be sure any changes went through for a sanity check.
k
If you store it as a script, which doesn't serialize, you can just see the script. For GitHub, it's script-based by default, so it is using whatever is in the repo.
k
Awesome, your suggestion worked!! Thank you
Next up, I've set up an ECS Agent and would like to try running with the agent. I saw on the doc that this requires a Docker image. Are there other ways of setting up the environment (i.e. a requirements.txt in the same repo folder as the flow), or is Docker image a requirement?
k
A Docker image is a requirement for ECS because that service is image-based, yep.
👀 1
k
I've just tried to run a task with ECS and the task is stuck in a Scheduled state. Based on this, I'm assuming I've configured my agent wrong. Do you have any suggestions on how to start debugging when the logs aren't coming through? Here's my flow code below:
RUN_CONFIG = ECSRun(labels=['fargate-dev'],
                    task_role_arn="arn:aws:iam::123456789:role/<ROLE>",
                    image="org/repo/flows-folder/flow-project-folder")

@task
...

with Flow("hello-flow", run_config=RUN_CONFIG, storage=STORAGE) as flow:
    say_hello()
For more context, my GitHub folder structure is below:
org/repo
|_flows-folder
    |_flow-project-folder
        |_flow.py
        |_requirements.txt
        |_Dockerfile
(Apologies in advance for any naive questions I have ahead, I'm still a bit new to these tools)
k
No worries. If it is stuck in Scheduled, I would say it's a label issue 98% of the time. What is the label of your agent? It should have `fargate-dev` as well.
k
yep it does!
k
This looks off though? Is there any healthy ECS one that can pick up the Flow?
Was this set up as an ECS service or was it spun up from a laptop?
k
This was set up as an ECS service. Can agents no longer be used once they are no longer healthy?
k
The agent should be a long-running process that is always on. It'll ping Prefect Cloud every 10 seconds, so it looks like this agent may have stopped pinging? If they are not on, there won't be anything to receive the flows. I think the test here would be to spin that agent up locally with `prefect agent ecs start ……some_configs --label fargate-dev`, and see if this picks up your flow run. If it does, I think the service might not be scaling properly?
k
I did so and it did pick up the flow. But the flow is now stuck in Submitted state. Gonna try to find out what's wrong. I want to add the `--show-flow-logs` config to the agent. Is there a way for me to 'edit' the agent once it's running (i.e. adding tags/changing configs)?
k
You have to restart the agents. When ECS specifically is stuck in Submitted, it basically means there were issues even grabbing the container and starting. It's so hard to diagnose because there are no logs. If there are, tell me where to find them cuz I haven't seen them from experience 😆. So I've seen it's either a lack of IAM roles or permissions, a mismatched image name, or straight up a broken image that can't start.
🥲 1
k
No logs is my nightmare 😭
k
An overwhelming amount of the time, this is permission-related.
k
Noted, will keep you updated!
k
Is your container in ECR btw?
k
I don't believe so. I'm following this guide to set up and don't recall setting up an ECR container. Once again, I'm new to AWS as well so please excuse the naivety.
k
Try this more recent one? I guess if you are using the regular Prefect image, it should just be a pull from DockerHub, so you should be fine.
k
Will take a look, thanks for the suggestion!
Silly question, but if I'm using GitHub storage, is it correct to put the directory to the Dockerfile for the `image` argument in ECSRun?
k
No, it's supposed to point to the Python file with the Flow, independent of ECSRun.
k
I'm having a bit of difficulty putting the pieces together. If I want to use ECS agent + Github storage, what are my options for storing docker images? It seems like using GitHub storage means that only the flow file is pulled every time that flow is run. Is my only option to host public images on Dockerhub?
k
So people with this combination normally use AWS ECR to host their images to stay in the AWS ecosystem. Your understanding is right.
👍 1
k
Is there any particular reason why GitHub storage doesn't pull the entire repo?
Semi-related question, how is version control typically done with AWS ECR?
k
So `Git` storage pulls the whole repo, but that's intended for `.sql` or `.yaml` files, not for other Python files. The reason is that if you have other Python files, you likely should be making them a Python package and putting that in the Docker image. Installing a Python module does stuff with the Python path, so even if Prefect grabbed the repo, it needs to be explicitly installed to be usable. This requires making assumptions about the repo structure that basically felt like re-inventing the wheel of package management.
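Kevin's point about the Python path can be shown with a stdlib-only sketch (no Prefect involved, and `helper_functions.py` is the hypothetical module from later in this thread): a module sitting in some checked-out directory is not importable until that directory gets onto `sys.path`, which is what installing it as a package arranges.

```python
import os
import sys
import tempfile

# Simulate a "pulled repo": write a helper module into a random directory.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "helper_functions.py"), "w") as f:
    f.write("def sample_function():\n    return 'hi'\n")

try:
    import helper_functions  # the file is on disk, but not on sys.path
except ModuleNotFoundError:
    print("not importable yet")

sys.path.insert(0, tmp)  # roughly what `pip install` arranges for you
import helper_functions

print(helper_functions.sample_function())  # hi
```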
Version control would be the same in ECR. You tag your images, I believe, and you get a new hash if you update it.
k
That makes sense. So if I used ECR for storage, and I needed to update my flow, would the only thing I need to do be re-run the updated flow file?
k
I think you're asking if you need to rebuild the image and upload it each time the flow changes. The answer to this is yes, you have to.
k
Got it. Really appreciate your help btw!!!
k
If you want to decouple image building from editing the flow, you can upload the image separately to ECR, then use Github Storage + ECSRun like you did previously. Of course this only works well if the dependencies are pretty fixed.
Of course! 👍
k
How would I feed in the `image` argument in ECSRun if my flow is in GitHub storage and my image is in ECR?
k
Check the `image` argument here. It'll look like that. Then you need the agent to have permissions to load that (through AWS credentials).
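Putting the combo together as a sketch (the IAM ARN, account ID, region, repo, and tag below are all placeholders): the `image` is an ECR image URI, while GitHub storage still points at the flow file.

```python
from prefect.run_configs import ECSRun

# Image is an ECR URI, not a path inside the GitHub repo.
# Account ID, region, repo name, and tag are placeholders.
RUN_CONFIG = ECSRun(
    labels=["fargate-dev"],
    task_role_arn="arn:aws:iam::123456789:role/<ROLE>",
    image="123456789.dkr.ecr.us-east-1.amazonaws.com/flow-image:latest",
)
```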
🙌 1
k
For using the ECS Agent + GitHub Storage combo, how would you recommend storing other .py files containing helper functions? Or is it best to have all your functions in the flow file?
k
There isn't quite an in-between. It's either all in the Docker image installed as a package, or all in the Flow file, yep.
k
I'm currently following this guide to get an ECS Agent running. I'm at the `aws ecs register-task-definition` step, but keep running into this error, could you help me decipher what's wrong?
An error occurred (IncompleteSignatureException) when calling the RegisterTaskDefinition operation: 'key' not a valid key=value pair (missing equal-sign) in Authorization header: 'AWS4-HMAC-SHA256 Credential=<access key id>/20210829/us-east-1/ecs/aws4_request, SignedHeaders=content-type;host;x-amz-date;x-amz-target, Signature=
For context, here is the prefect-agent-td.json
{
  "family": "prefect-agent",
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "networkMode": "awsvpc",
  "cpu": "512",
  "memory": "1024",
  "taskRoleArn": "arn:aws:iam::123123123:role/ECSTaskS3ECRRole",
  "executionRoleArn": "arn:aws:iam::123123123:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "prefect-agent",
      "image": "prefecthq/prefect:latest-python3.8",
      "essential": true,
      "command": [
        "prefect",
        "agent",
        "ecs",
        "start"
      ],
      "environment": [
        {
          "name": "PREFECT__CLOUD__API_KEY",
          "value": "<REDACTED>"
        },
        {
          "name": "PREFECT__CLOUD__AGENT__LABELS",
          "value": "['fargate','dev']"
        },
        {
          "name": "PREFECT__CLOUD__AGENT__LEVEL",
          "value": "INFO"
        },
        {
          "name": "PREFECT__CLOUD__API",
          "value": "https://api.prefect.io"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/prefect-agent",
          "awslogs-region": "us-west-1",
          "awslogs-stream-prefix": "ecs",
          "awslogs-create-group": "true"
        }
      }
    }
  ]
}
k
I am not sure, but try providing the key with `["prefect","agent","ecs","start","--key","INSERT_TOKEN_HERE"]`
k
And which token should that be?
k
Oh. Replace token with key. Same as the Prefect API key
k
I still got the same error unfortunately
k
Can you show me what you typed in the CLI? Redact important stuff.
k
aws ecs register-task-definition --cli-input-json file:///<path>/prefect/prefect-agent-td.json
k
That looks reasonable assuming the path works. It must be something inside the task definition. Could you show me that?
k
{
  "family": "prefect-agent",
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "networkMode": "awsvpc",
  "cpu": "512",
  "memory": "1024",
  "taskRoleArn": "arn:aws:iam::123123123:role/ECSTaskS3ECRRole",
  "executionRoleArn": "arn:aws:iam::123123123:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "prefect-agent",
      "image": "prefecthq/prefect:latest-python3.8",
      "essential": true,
      "command": [
        "prefect",
        "agent",
        "ecs",
        "start",
        "--key",
        "<KEY>"
      ],
      "environment": [
        {
          "name": "PREFECT__CLOUD__API_KEY",
          "value": "<KEY>"
        },
        {
          "name": "PREFECT__CLOUD__AGENT__LABELS",
          "value": "['fargate','dev']"
        },
        {
          "name": "PREFECT__CLOUD__AGENT__LEVEL",
          "value": "INFO"
        },
        {
          "name": "PREFECT__CLOUD__API",
          "value": "https://api.prefect.io"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/prefect-agent",
          "awslogs-region": "us-west-1",
          "awslogs-stream-prefix": "ecs",
          "awslogs-create-group": "true"
        }
      }
    }
  ]
}
k
oh oof that’s what you posted lol
😆 1
Do you have an extra `/` after `file` in your CLI command?
Oh, you might have a trailing space after your AWS_ACCESS_KEY_ID environment variable? I would say try registering some really basic task definition. If you get the same error, check the AWS env variables.
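A stdlib-only sanity check for Kevin's hunch (the key value below is made up): a stray space in a credential env var ends up inside the signed Authorization header, producing exactly this kind of `IncompleteSignatureException`.

```python
import os

# Simulate the bug: a trailing space in a credential env var.
os.environ["AWS_ACCESS_KEY_ID"] = "AKIAEXAMPLEKEY "  # note the trailing space

def dirty_vars(names):
    """Return the env vars whose values carry leading/trailing whitespace."""
    return [n for n in names
            if (v := os.environ.get(n)) is not None and v != v.strip()]

print(dirty_vars(["AWS_ACCESS_KEY_ID"]))  # ['AWS_ACCESS_KEY_ID']
```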
k
There was a trailing space!! Nice find, thank you!!
👍 1
Checking on my cluster and saw that the service I created for the ECS Agent is now active. However, I don't see an active agent on the Prefect UI, am I missing a step?
k
It's not, because the Desired count is 1 here and the Running count is 0, so your agent is not starting successfully. I would click into the service and check the logs for why it can't start an agent.
k
It seems like this is the root issue:
botocore.exceptions.ClientError: An error occurred (UnauthorizedOperation) when calling the DescribeVpcs operation: You are not authorized to perform this operation.
Would you happen to know which permission policy needs to be added to the IAM user to resolve this?
k
It's that `DescribeVpcs` listed there, but I don't know the complete list of permissions because I just test with admin. We might have it somewhere, one sec.
Try this set?
👀 1
k
Silly me, I was granting the permissions to the wrong role. Thanks!
I finally got the ECS agent running! Btw, what happens when you have 2 agents running with the same tags? Which one would the flow choose to run?
k
The agents poll every 10 seconds so whichever one gets to it next
k
cool!
k
We don't have any load balancing.
k
How would I be able to store helper functions in a Docker image? I initially thought I could, for example, do `COPY helper_functions.py /helper_functions.py` in the Dockerfile. Then in my flow file I could do `from helper_functions import sample_function`, but I got this error instead: `ModuleNotFoundError: No module named`
Any suggestions?
k
Funny you ask because I'm writing a tutorial about this now 😆. Check my Dockerfile here. You need to make it a Python package, then use `pip install -e .`
Wanna review my draft when I finish?
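A rough sketch of the shape Kevin is describing (his linked Dockerfile is the authoritative version; the file layout and base image tag here are assumptions): package the helpers and install them inside the image so imports resolve from any working directory.

```dockerfile
FROM prefecthq/prefect:latest-python3.8

# Copy the project (setup.py plus the helper package) into the image...
COPY . /app
WORKDIR /app

# ...install the dependencies, then the helpers themselves as a package,
# which puts them on the Python path for good.
RUN pip install -r requirements.txt && pip install -e .
```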
k
Would love to!!
k
Are you a data scientist or data engineer or software engineer?
k
Data Engineer
k
Ah ok. It might be a bit too simplified for you. We’ll see haha
k
Might I add I am fairly new to data engineering, so simplified content is the best content for me right now 😅
👍 1
k
I have a minimal code example now here
k
I'm trying to register a flow that includes an `import pandas`. I have that requirement installed in my Docker container, which I included in the ECSRun. Yet I still got this error: `ModuleNotFoundError: No module named 'pandas'`
Is there a specific order I'm supposed to write my flow script in to prevent this?
k
Do you get that when you register or when you run?
k
When I register
k
Do you have `pandas` installed when you register?
k
Oh, I didn't know I need to for registering
Does prefect register the flow by running the flow script?
k
It builds the flow by running it, yeah, so you need it to register. But some people get away with it by using imports inside tasks to defer it.
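The deferred-import trick, sketched with stdlib modules standing in for pandas so the snippet runs anywhere (`transform` is a hypothetical function; in real code it would carry Prefect's `@task` decorator):

```python
# At register time, Prefect executes this file to build the Flow, so a
# top-level `import pandas` must succeed on the registering machine.
# Moving the import inside the task body defers it until the task runs
# in the container, where the dependency actually exists.

def transform(text):  # imagine this wrapped with Prefect's @task
    import csv  # deferred: resolved only when the task executes
    import io
    return list(csv.reader(io.StringIO(text)))

print(transform("a,b\n1,2"))  # [['a', 'b'], ['1', '2']]
```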
k
Do you have a good example of using cloud secrets?
k
It’s very simple to use, you just create the secret in cloud then use it like this. Are you encountering an issue?
k
Yeah, I stored my secrets in a JSON format. When I tried to use it in a script it's giving me a `TypeError: expected string or bytes-like object` error.
I tried to add a str() around the PrefectSecret() but the error still persisted, so I'm not 100% sure if it's the secrets' fault though.
k
You can just try retrieving with `Secret("name").get()`
k
If the secret is a JSON, would I do `Secret("name").get()['password']`?
k
I don't think so. I think you need to convert it to a dict before you can do that, as JSON strings and dicts are not the same.
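If the secret does come back as a JSON string rather than a dict, the conversion Kevin is describing is a one-liner with the stdlib (the payload below is a made-up example):

```python
import json

raw = '{"user": "me", "password": "hunter2"}'  # hypothetical secret payload
creds = json.loads(raw)  # JSON string -> dict

print(creds["password"])  # hunter2
```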
k
Alright, lemme give that a try
When I use Secret(), I get `ValueError: Local Secret "Name" was not found.` How can I get it to use the cloud secret instead of a local secret?
k
Set this env variable on your machine: `PREFECT__CLOUD__USE_LOCAL_SECRETS=false` (with local secrets turned off, Prefect looks the secret up in Cloud instead)
k
A small note: it seems like `Secret("name").get()` returns a dict already. But besides that, I finally got an agent running and a scheduled flow! Will be seeing how scheduling goes over the next couple of days. Thanks for your help!! Next on my list is to set up dbt with Prefect, but that'll be a task for a later week.
k
Sure! Nice work!