# prefect-community
n
I am trying to use GIT storage with ECS run. The goal is to have a docker container which has all the python packages installed and then just pull the code from github prior to running the flow. This obviously requires that the container should be able to pull code from Github. Then, in
task-definition.yaml
I am providing
GITHUB_ACCESS_TOKEN
through linux ENVIRONMENT variables. But I am getting this error:
'The secret GITHUB_ACCESS_TOKEN was not found.  Please ensure that it was set correctly in your tenant: https://docs.prefect.io/orchestration/concepts/secrets.html'
Am I missing something?
k
I think you are providing
GITHUB_ACCESS_TOKEN
but for prefect to treat it as a secret, it has to be
PREFECT__CONTEXT__SECRETS__GITHUB_ACCESS_TOKEN
? Are you using Local secrets?
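A minimal sketch of the naming rule described above, assuming Prefect 1.x reads local secrets from environment variables that start with PREFECT__CONTEXT__SECRETS__. The helper names local_secret_env_var and with_local_secrets are hypothetical and not part of Prefect; they only show how a plain variable such as GITHUB_ACCESS_TOKEN could be mirrored under the prefixed name before it goes into the task definition.

```python
# Prefix from the message above; assumed to map env vars into Prefect 1.x context secrets.
LOCAL_SECRET_PREFIX = "PREFECT__CONTEXT__SECRETS__"


def local_secret_env_var(secret_name: str) -> str:
    """Return the env var name for a local secret (hypothetical helper)."""
    return f"{LOCAL_SECRET_PREFIX}{secret_name}"


def with_local_secrets(env: dict, secret_names: list) -> dict:
    """Copy `env` and also expose each named variable under the prefixed name."""
    out = dict(env)
    for name in secret_names:
        if name in env:  # skip names that are not present in the input
            out[local_secret_env_var(name)] = env[name]
    return out


if __name__ == "__main__":
    print(local_secret_env_var("GITHUB_ACCESS_TOKEN"))
```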
n
I have this setup:
flow.storage = Git(
        repo='coveredinc/taskrunner',
        flow_path=common.get_flow_path(local_file_path),
        repo_host='github.com',
        git_token_secret_name='GITHUB_ACCESS_TOKEN',
        branch_name='git-storage',
        add_default_labels=False
    )

flow.run_config = ECSRun(
    run_task_kwargs={'cluster': f'{env}-prefect-cluster'},
    task_role_arn=f'arn:aws:iam::<>:role/{env}-task-runner-ecs-task-role',
    execution_role_arn=f'arn:aws:iam::<>:role/{env}-task-runner-ecs-task-execution-role',
    image=f'<>.dkr.ecr.{region}.amazonaws.com/{env}-taskrunner-ecr:latest',
    task_definition_path=f's3://{env}-prefect-ecs-config/task-definition.yaml',
    labels=['ecs', f'{env}']
)
k
Yes but Local Secrets or Secrets hosted with Prefect Cloud?
n
Sorry, that edit was just to clean up the main thread. I haven’t set up any secrets in Prefect Cloud so I am guessing it’s a local secret? I was trying out the suggestion you had, and it seems git pull worked out fine. Now my next challenge will be to get the code to work (especially fixing “ModuleNotFound” errors).
k
ModuleNotFound is likely you don’t have packages in the image right?
n
@Kevin Kho Actually I am not getting ModuleNotFound error. But any updates to the code are not getting reflected in flow run. I think that’s because the code is being “packaged” into a python package. Even if I install it in editable mode, code updates pulled via git are not applied to the package. Any thoughts on getting around this? Is there a way to run some kind of ‘setup’ code after
git pull
but before the actual flow code is run?
k
I think you saw my article about packaging a module into Docker right? That’s what you tried?
n
yes
k
Can you share the Dockerfile?
Oh other Git files are not pulled, Just the Flow is pulled from Git if that is what you were expecting
n
hmm… so if my
flow.py
depends on another file
helper.py
that would not be pulled? So basically the entire code for the flow has to be contained in one file and no imports or dependencies on anything else? What’s the point of Git/Github storage then?
However in my case, even the changes to the Flow file are not getting executed. Here’s my
Dockerfile
FROM python:3.9
WORKDIR /opt/prefect
COPY requirements.txt setup.py pyproject.toml /opt/prefect/
COPY taskrunner /opt/prefect/taskrunner

ARG GITHUB_ACCESS_TOKEN
RUN git config --global url."https://${GITHUB_ACCESS_TOKEN}:@github.com/".insteadOf "https://github.com/" \
    && pip install -r requirements.txt \
    && pip install -e .
k
Not pulled by Git storage. It needs to be in the Docker image. The point of those is just to pull the Flow file. Prefect does not install a Git repo into a module, we leave that up to the user because we’d need to make a lot of assumptions and we’d be recreating Python packaging to achieve that (for 1.0). For 2.0, we are looking into simplifying that packaging story. You’d have to manipulate the Python path to get things as a module, but it’s very hard (might be impossible to do it)
So yes the Git repo is pulled but that is for static files (not modules), and then it gets loaded in before the Flow runs, and then the stuff gets deleted. It’s not for Python files, it’s more for YAML or SQL
n
I don’t understand, do you inspect the filenames in the Git repo and selectively pull only non-python files? I’d expect prefect to simply do a git pull on the branch name provided. As for modules, I was hoping that if I install the package in “editable” mode while creating the docker image, then any changes “pulled” from the repo should be immediately reflected without having to install the package again. btw, I just realized there is an issue with my directory structure. i.e. git repo structure is not the same as dir structure in docker image.
In what directory is the git repo pulled?
k
No we clone it. But can’t install it. So when you do
pip install -e .
, you are adding a folder to the Python path. That clone though is happening in a temporary directory so it’s not in the Python path and not the same directory as the previous installation
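The point above can be sketched with the standard library alone. This is an illustration under stated assumptions, not Prefect's actual clone logic: one temporary directory stands in for the folder that pip install -e . registered at image build time, another stands in for the temporary clone at run time, and helper_demo is a hypothetical module name.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_stale_import() -> str:
    """Show that a fresh clone outside sys.path does not change what `import` finds."""
    with tempfile.TemporaryDirectory() as installed, tempfile.TemporaryDirectory() as clone:
        # Stand-in for the directory registered by `pip install -e .` in the image.
        Path(installed, "helper_demo.py").write_text("VERSION = 'image'\n")
        # Stand-in for the temporary directory the repo is cloned into at run time.
        Path(clone, "helper_demo.py").write_text("VERSION = 'git'\n")

        sys.path.insert(0, installed)  # only the installed directory is importable
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("helper_demo")
            return module.VERSION
        finally:
            # Clean up so the demo leaves the interpreter state unchanged.
            sys.path.remove(installed)
            sys.modules.pop("helper_demo", None)


if __name__ == "__main__":
    print(demo_stale_import())
```

The import resolves to the copy on the path, so newer code sitting in the clone directory is never picked up unless that directory is installed or added to the path.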
n
aah… I see. It’s being cloned into a temp dir. hmm…
k
What I have seen someone do is add the clone and install as the ENTRYPOINT for the Docker container. You can try that?
n
hmm… worth a try.
@Kevin Kho I was able to get this working using ENTRYPOINT. Thanks a lot for your suggestion!! Here are the changes I made (for anyone else searching for this): `Dockerfile`:
COPY entry.sh /opt/prefect/
ENTRYPOINT [ "./entry.sh" ]
`entry.sh`:
#!/bin/bash

git init
git remote add origin https://github.com/<your_repo>.git
git fetch
git checkout -t origin/$GITHUB_REPO_BRANCH -f
[[ $REINSTALL_REQUIREMENTS -eq 1 ]] && pip install -r requirements.txt
[[ $REINSTALL_PY_PACKAGE -eq 1 ]] && pip install -e .
exec "$@"
`register_flows.py`: Added
GITHUB_REPO_BRANCH
environment variable to the ECS run.
ecs_run = ECSRun(
    # ... other settings
    # ...
    # Override these env variables to try different code
    env={'GITHUB_REPO_BRANCH': 'trunk', 'REINSTALL_REQUIREMENTS': '0', 'REINSTALL_PY_PACKAGE': '0'},
)
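One way to avoid typos when overriding these variables is to build the dict in one place. A small sketch, assuming entry.sh reads exactly the three variables shown above and compares the two flags against 1; branch_override_env is a hypothetical helper, not part of Prefect.

```python
def branch_override_env(branch: str,
                        reinstall_requirements: bool = False,
                        reinstall_package: bool = False) -> dict:
    """Build the env overrides read by entry.sh (hypothetical helper)."""
    if not branch:
        raise ValueError("branch must be a non-empty string")
    return {
        "GITHUB_REPO_BRANCH": branch,
        # entry.sh checks these flags with `-eq 1`, so emit "1" or "0" as strings.
        "REINSTALL_REQUIREMENTS": "1" if reinstall_requirements else "0",
        "REINSTALL_PY_PACKAGE": "1" if reinstall_package else "0",
    }


if __name__ == "__main__":
    print(branch_override_env("trunk"))
```

The result could then be passed as the env argument, for example env=branch_override_env('git-storage', reinstall_package=True).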
k
Ah nice work!
I documented this in discourse btw for future users