Nikhil Jain

    3 months ago
    I am trying to use Git storage with an ECSRun run config. The goal is to have a Docker container that has all the Python packages installed, and then just pull the code from GitHub before running the flow. This requires the container to be able to pull code from GitHub, so in task-definition.yaml I am providing GITHUB_ACCESS_TOKEN through Linux environment variables. But I am getting this error:
    'The secret GITHUB_ACCESS_TOKEN was not found.  Please ensure that it was set correctly in your tenant: https://docs.prefect.io/orchestration/concepts/secrets.html'
    Am I missing something?
    Kevin Kho

    3 months ago
    I think you are providing GITHUB_ACCESS_TOKEN, but for Prefect to treat it as a secret, the variable has to be named PREFECT__CONTEXT__SECRETS__GITHUB_ACCESS_TOKEN. Are you using local secrets?
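A minimal Python sketch of the naming convention above, assuming Prefect 1.x behavior where an environment variable named PREFECT__CONTEXT__SECRETS__<NAME> becomes a local secret called <NAME>. The helper `local_secrets_from_env` is hypothetical and only illustrates the mapping; it is not Prefect's own implementation.

```python
import os
from typing import Dict, Mapping, Optional

# Prefix that (in Prefect 1.x) marks an environment variable as a local secret.
SECRET_PREFIX = "PREFECT__CONTEXT__SECRETS__"


def local_secrets_from_env(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: map PREFECT__CONTEXT__SECRETS__<NAME> variables to {<NAME>: value}.

    Plain variables such as GITHUB_ACCESS_TOKEN are ignored, which matches
    the "secret was not found" error in the question.
    """
    source = os.environ if environ is None else environ
    return {
        key[len(SECRET_PREFIX):]: value
        for key, value in source.items()
        if key.startswith(SECRET_PREFIX) and len(key) > len(SECRET_PREFIX)
    }


if __name__ == "__main__":
    example_env = {
        "GITHUB_ACCESS_TOKEN": "token-a",  # ignored: no Prefect prefix
        "PREFECT__CONTEXT__SECRETS__GITHUB_ACCESS_TOKEN": "token-b",
    }
    print(local_secrets_from_env(example_env))  # {'GITHUB_ACCESS_TOKEN': 'token-b'}
```

In task-definition.yaml, that would mean naming the container environment variable PREFECT__CONTEXT__SECRETS__GITHUB_ACCESS_TOKEN instead of GITHUB_ACCESS_TOKEN.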
    Nikhil Jain

    3 months ago
    I have this setup:
    from prefect.run_configs import ECSRun
    from prefect.storage import Git

    # `flow`, `common`, `local_file_path`, `env` and `region` are defined elsewhere in the project.
    flow.storage = Git(
        repo='coveredinc/taskrunner',
        flow_path=common.get_flow_path(local_file_path),
        repo_host='github.com',
        git_token_secret_name='GITHUB_ACCESS_TOKEN',
        branch_name='git-storage',
        add_default_labels=False,
    )

    # The run configuration belongs on `flow.run_config`; `flow.run` is the method that runs the flow.
    flow.run_config = ECSRun(
        run_task_kwargs={'cluster': f'{env}-prefect-cluster'},
        task_role_arn=f'arn:aws:iam::<>:role/{env}-task-runner-ecs-task-role',
        execution_role_arn=f'arn:aws:iam::<>:role/{env}-task-runner-ecs-task-execution-role',
        image=f'<>.dkr.ecr.{region}.amazonaws.com/{env}-taskrunner-ecr:latest',
        task_definition_path=f's3://{env}-prefect-ecs-config/task-definition.yaml',
        labels=['ecs', f'{env}'],
    )
    Kevin Kho

    3 months ago
    Yes, but are you using local secrets or secrets hosted in Prefect Cloud?
    Nikhil Jain

    3 months ago
    Sorry, that edit was just to clean up the main thread. I haven’t set up any secrets in Prefect Cloud, so I am guessing it’s a local secret? I tried your suggestion and the git pull seems to work fine now. My next challenge is getting the code to work (especially fixing “ModuleNotFound” errors).
    Kevin Kho

    3 months ago
    ModuleNotFound likely means you don’t have the packages in the image, right?
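One way to test that hypothesis is to run a small check with the image's own Python interpreter. A minimal sketch using the standard library's `importlib.util.find_spec`; the helper name and the module list (including `taskrunner`) are illustrative assumptions, to be replaced with whatever the flow actually imports.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found on the current sys.path."""
    missing = []
    for name in names:
        try:
            # For dotted names, find_spec imports the parent package first.
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised when a dotted name's parent package is missing.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical list: replace with the top-level packages your flow imports.
    required = ["json", "sqlite3", "taskrunner"]
    print("missing:", missing_modules(required) or "none")
```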
    Nikhil Jain

    3 months ago
    @Kevin Kho Actually, I am not getting a ModuleNotFound error. But updates to the code are not reflected in the flow run. I think that’s because the code is being “packaged” into a Python package: even if I install it in editable mode, code updates pulled via Git are not applied to the package. Any thoughts on getting around this? Is there a way to run some kind of ‘setup’ code after git pull but before the actual flow code runs?
    Kevin Kho

    3 months ago
    I think you saw my article about packaging a module into Docker, right? That’s what you tried?
    Nikhil Jain

    3 months ago
    yes
    Kevin Kho

    3 months ago
    Can you share the Dockerfile?
    Oh, the other files in the Git repo are not pulled; just the flow is pulled from Git, if that is what you were expecting.
    Nikhil Jain

    3 months ago
    hmm… so if my flow.py depends on another file, helper.py, that would not be pulled? So basically the entire code for the flow has to be contained in one file, with no imports or dependencies on anything else? What’s the point of Git/GitHub storage then?
    However, in my case even the changes to the flow file are not getting executed. Here’s my Dockerfile:
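    FROM python:3.9
    WORKDIR /opt/prefect
    COPY requirements.txt setup.py pyproject.toml /opt/prefect/
    COPY taskrunner /opt/prefect/taskrunner

    # Build-time token. The url.insteadOf rule rewrites https://github.com/ URLs to embed it
    # (stored in the image's global git config), e.g. for private git+https dependencies.
    ARG GITHUB_ACCESS_TOKEN
    RUN git config --global url."https://${GITHUB_ACCESS_TOKEN}:@github.com/".insteadOf "https://github.com/" \
        && pip install -r requirements.txt \
        && pip install -e .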
    Kevin Kho

    3 months ago
    It’s not pulled by Git storage; it needs to be in the Docker image. The point of Git storage is just to pull the flow file. Prefect does not install a Git repo as a module. We leave that up to the user (for 1.0), because we’d need to make a lot of assumptions and would essentially be recreating Python packaging to achieve it. For 2.0, we are looking into simplifying that packaging story. You’d have to manipulate the Python path to make things importable as a module, and that is very hard (it might be impossible).
    So yes, the Git repo is cloned, but that is for static files (not modules): they get loaded before the flow runs, and then everything gets deleted. It’s not meant for Python files; it’s more for YAML or SQL.
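A simplified Python model of the behavior described above (clone into a throwaway directory, load the flow file, then delete everything), showing why a sibling helper module is not importable. This is a hypothetical sketch, not Prefect's actual Git storage code: `repo_files` stands in for the cloned repository, and the function and module names are made up for illustration.

```python
import os
import tempfile
from typing import Any, Dict


def load_flow_file_like_git_storage(repo_files: Dict[str, str], flow_path: str) -> Dict[str, Any]:
    """Hypothetical model: 'clone' into a temp dir, exec the flow file, then delete the clone.

    `repo_files` maps relative paths to file contents and stands in for `git clone`.
    """
    # The directory and every file in it are deleted when this block exits.
    with tempfile.TemporaryDirectory() as clone_dir:
        for rel_path, source in repo_files.items():
            with open(os.path.join(clone_dir, rel_path), "w") as fh:
                fh.write(source)
        with open(os.path.join(clone_dir, flow_path)) as fh:
            contents = fh.read()
        namespace: Dict[str, Any] = {}
        # clone_dir is never added to sys.path, so sibling .py files are not importable.
        exec(contents, namespace)
        return namespace


if __name__ == "__main__":
    repo = {
        "flow_helpers_demo.py": "GREETING = 'hi'\n",
        "flow.py": "import flow_helpers_demo\nmessage = flow_helpers_demo.GREETING\n",
    }
    try:
        load_flow_file_like_git_storage(repo, "flow.py")
    except ModuleNotFoundError as exc:
        # The helper exists in the clone, but the clone is not on sys.path.
        print("import failed:", exc)
```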
    Nikhil Jain

    3 months ago
    I don’t understand. Do you inspect the filenames in the Git repo and selectively pull only non-Python files? I’d expect Prefect to simply do a git pull on the branch name provided. As for modules, I was hoping that if I install the package in “editable” mode while creating the Docker image, any changes pulled from the repo would be reflected immediately, without having to reinstall the package. By the way, I just realized there is an issue with my directory structure: the Git repo structure is not the same as the directory structure in the Docker image.
    In what directory is the Git repo pulled?
    Kevin Kho

    3 months ago
    No, we clone the repo, but we can’t install it. When you do pip install -e ., you are adding a folder to the Python path. The clone, though, happens in a temporary directory, so it’s not on the Python path and it’s not the same directory as the previous installation.
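A small Python demonstration of why pulled updates are not picked up: imports resolve to whichever directory is on `sys.path`, so a fresh copy of the same module in a temporary clone directory is simply never consulted. Depending on the setuptools version, an editable install uses a `.pth` path entry or an import hook; this sketch models only the path-entry case, and the module name and version strings are hypothetical.

```python
import importlib
import os
import sys
import tempfile

MODULE = "taskrunner_version_demo"  # hypothetical module name used only for this demo


def write_module(directory: str, version: str) -> None:
    """Write a one-line module that records which copy of the code it is."""
    with open(os.path.join(directory, MODULE + ".py"), "w") as fh:
        fh.write(f"VERSION = {version!r}\n")


def version_seen_by_import() -> str:
    """Show which copy is imported when only the 'installed' directory is on sys.path."""
    with tempfile.TemporaryDirectory() as install_dir, tempfile.TemporaryDirectory() as clone_dir:
        write_module(install_dir, "baked into the image")
        write_module(clone_dir, "freshly cloned")
        # Roughly what a path-based editable install arranges: the project dir is on sys.path.
        sys.path.insert(0, install_dir)
        importlib.invalidate_caches()
        try:
            return importlib.import_module(MODULE).VERSION
        finally:
            # Undo the demo's changes to the interpreter state.
            sys.path.remove(install_dir)
            sys.modules.pop(MODULE, None)


if __name__ == "__main__":
    print(version_seen_by_import())  # baked into the image
```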
    Nikhil Jain

    3 months ago
    aah… I see. It’s being cloned into a temp dir. hmm…
    Kevin Kho

    3 months ago
    What I have seen someone do is make the clone and install steps the ENTRYPOINT of the Docker container. Can you try that?
    Nikhil Jain

    3 months ago
    hmm… worth a try.
    @Kevin Kho I was able to get this working using an ENTRYPOINT. Thanks a lot for your suggestion! Here are the changes I made (for anyone else searching for this):
    Dockerfile:
    COPY entry.sh /opt/prefect/
    # Make sure the script is executable inside the image
    RUN chmod +x /opt/prefect/entry.sh
    ENTRYPOINT [ "./entry.sh" ]
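    entry.sh:
    #!/bin/bash
    # Runs at container start in WORKDIR /opt/prefect: check out the latest code,
    # optionally reinstall, then hand off to the container's command (which runs the flow).
    # Assumes the url.insteadOf token rule from the Dockerfile handles GitHub authentication.

    git init
    git remote add origin https://github.com/<your_repo>.git
    git fetch
    git checkout -t "origin/$GITHUB_REPO_BRANCH" -f
    [[ "$REINSTALL_REQUIREMENTS" -eq 1 ]] && pip install -r requirements.txt
    [[ "$REINSTALL_PY_PACKAGE" -eq 1 ]] && pip install -e .
    exec "$@"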
    register_flows.py: added the GITHUB_REPO_BRANCH environment variable (plus the two reinstall flags) to the ECS run config:
    ecs_run = ECSRun(
        # ... other settings
        # Override these env variables to try different code
        env={'GITHUB_REPO_BRANCH': 'trunk', 'REINSTALL_REQUIREMENTS': '0', 'REINSTALL_PY_PACKAGE': '0'},
    )
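Because entry.sh only reinstalls when a flag equals 1, and ECS environment variable values are strings, it can help to build the env overrides in one place. A minimal sketch; `entrypoint_env` is a hypothetical helper, not part of Prefect.

```python
from typing import Dict


def entrypoint_env(branch: str, reinstall_requirements: bool = False,
                   reinstall_package: bool = False) -> Dict[str, str]:
    """Hypothetical helper: build the env overrides that entry.sh reads.

    Booleans are rendered as "1"/"0" to match the `-eq 1` checks in entry.sh.
    """
    return {
        "GITHUB_REPO_BRANCH": branch,
        "REINSTALL_REQUIREMENTS": "1" if reinstall_requirements else "0",
        "REINSTALL_PY_PACKAGE": "1" if reinstall_package else "0",
    }


if __name__ == "__main__":
    # e.g. ECSRun(..., env=entrypoint_env("git-storage", reinstall_package=True))
    print(entrypoint_env("trunk"))
    # {'GITHUB_REPO_BRANCH': 'trunk', 'REINSTALL_REQUIREMENTS': '0', 'REINSTALL_PY_PACKAGE': '0'}
```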
    Kevin Kho

    3 months ago
    Ah, nice work!
    I documented this on Discourse, by the way, for future users.