    Madhup Sukoon

    7 months ago
    Hello! I am extremely new to Prefect, so apologies if this is a rookie question or has already been answered. I have a repository containing a bunch of scripts, folders and modules. I want to import a subset of the functions defined here and build a Flow in a new flow.py file and then run it via a Docker Agent. I also have some external (pip-installable) dependencies that would need to be present for the flow to run. Is it possible to do this? Does someone have an example Flow that uses Github Storage and Docker Run?

    Konstantin

    7 months ago
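    from prefect import Flow
    from prefect.storage import GitLab

    # The flow file is pulled from GitLab at run time and executed as a script.
    core_storage = GitLab(
        repo="Number_proj",
        host="https://link_git",
        path="flows/blablabla/flow.py",
        ref="master",
        access_token_secret="GITLAB_ACCESS_TOKEN",
        stored_as_script=True
    )

    with Flow("name_flow", storage=core_storage) as flow:
        ...  # add your tasks here

    # SET GITLAB_ACCESS_TOKEN=""  (in the environment of the Docker container)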

    Madhup Sukoon

    7 months ago
    Thanks @Konstantin, could you also show me the DockerRun configuration for this? Also, where should I specify my dependencies?

    Konstantin

    7 months ago
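    FROM python:3.9
    
    # Prefect version to install; override with --build-arg PREFECT_VERSION=...
    ARG PREFECT_VERSION='0.15.1'
    
    RUN apt update && apt install uuid -y
    # Install Prefect once, pinned and with the GitLab extra, plus the flow's dependencies.
    # datetime is part of the Python standard library, so it is not installed here.
    RUN python -m pip install --upgrade pip && pip install "prefect[gitlab]==$PREFECT_VERSION" numpy pymysql pymssql psycopg2 requests pandas sqlalchemy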

    Anna Geller

    7 months ago
    Konstantin’s suggestion about building a Docker image to package your dependencies is really good. If you are looking for an example repository structure, incl. how to package dependencies into a package that can be installed within the Docker image, how to create a Dockerfile and build an image, check out this repository: https://github.com/anna-geller/packaging-prefect-flows/
    For example, this file shows how to pass a Dockerfile to Docker storage and use it with a Docker agent: give the DockerRun run config the same label that you assign to the agent, e.g.:
    prefect agent docker start --label docker
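A minimal sketch of that pairing, assuming Prefect 0.15.x/1.x is installed wherever the flow is registered. The flow name, image name, and Dockerfile path are hypothetical placeholders, and `labels_match` is only an illustrative helper for the rule that a flow run's labels must all be among the agent's labels.

```python
AGENT_LABEL = "docker"  # must match: prefect agent docker start --label docker


def labels_match(run_labels, agent_labels):
    """Return True when every label on the flow run is also set on the agent."""
    return set(run_labels) <= set(agent_labels)


def build_flow():
    # Imported lazily so the helper above works without Prefect installed.
    from prefect import Flow, task
    from prefect.run_configs import DockerRun
    from prefect.storage import Docker

    @task
    def say_hello():
        print("hello from the container")

    with Flow("docker-example") as flow:  # hypothetical flow name
        say_hello()

    # The image is built from the Dockerfile when the flow is registered.
    flow.storage = Docker(
        dockerfile="Dockerfile",     # hypothetical path to your Dockerfile
        image_name="my-flow-image",  # hypothetical image name
        image_tag="latest",
    )
    # Same label as the agent, so the Docker agent picks up the run.
    flow.run_config = DockerRun(labels=[AGENT_LABEL])
    return flow
```

Calling `build_flow().register(project_name=...)` would then build the image and register the flow; the project name is yours to choose.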

    Madhup Sukoon

    7 months ago
    Ah, so this would require me to create my own docker image. I was wondering if it would be possible to:
    1. have a generic docker image (something based off prefect or python)
    2. specify my pip installable dependencies (which would get installed in the docker container)
    3. specify my Github repository (which would get cloned into the container)
    4. Run this resulting container using a Docker Agent
    The reason I don't want to have my codebase baked into the docker image is because I foresee a lot of small changes in the codebase, and running a docker build for each small commit in my CI/CD would not be optimal.

    Anna Geller

    7 months ago
    Is your repository public? If so, this is quite straightforward, since you can easily clone public repositories within a Dockerfile. But if your repo is private, setting this up may be a bit more involved, because credentials must be handled properly. Either way, for such a scenario, you would need to either:
    • build a Dockerfile incl. a RUN command that downloads your repo
    • or clone the repo, e.g. as a first task in your flow, similarly to what this post does for dbt:
    @task(name="Clone DBT repo")
    def pull_dbt_repo(repo_url: str, branch: str = None):
        pygit2.clone_repository(url=repo_url, path=DBT_PROJECT, checkout_branch=branch)
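For a private repository, the same clone needs credentials. A rough sketch, assuming the access token is exposed to the container as an environment variable; the variable name `GIT_ACCESS_TOKEN` and both helper functions are hypothetical, and the right `username` to pair with a token depends on your Git host.

```python
import os


def read_token(env_var="GIT_ACCESS_TOKEN"):
    """Return the access token from the environment, failing loudly if it is unset."""
    token = os.environ.get(env_var, "")
    if not token:
        raise RuntimeError(f"{env_var} is not set")
    return token


def clone_private_repo(repo_url, target_dir, username, branch=None):
    """Clone repo_url over HTTPS into target_dir using token-based credentials."""
    # Imported lazily: pygit2 only needs to exist inside the flow's image.
    import pygit2

    credentials = pygit2.UserPass(username, read_token())
    callbacks = pygit2.RemoteCallbacks(credentials=credentials)
    return pygit2.clone_repository(
        repo_url, target_dir, checkout_branch=branch, callbacks=callbacks
    )
```

Inside a flow, `clone_private_repo` could be wrapped with `@task` in the same way as the dbt example above.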

    Madhup Sukoon

    7 months ago
    It's a private repository 🥲

    Anna Geller

    7 months ago
    But really, if you use Docker storage as described in this file, your image gets built on flow registration, so the build doesn't necessarily run as part of CI/CD: you control when you register your flow (and build & push the image).

    Madhup Sukoon

    7 months ago
    Thanks @Anna Geller, looks like this is what I'm gonna have to do. A couple more questions (apologies for my poor understanding of Prefect):
    1. For the above link, I'm assuming the dockerfile would look exactly like the one in your repo, and the requirements.txt would specify all the external dependencies that are needed.
    2. I'm assuming that for every code change that I want to be reflected in the next run, I will have to re-run the flow registration?
    3. My docker agent is running on an ARM machine. How do I specify that the docker image has to be built for ARM? (assuming the docker build is triggered automatically through flow registration)

    Anna Geller

    7 months ago
    1. Yes
    2. Yes
    3. You can add the following argument to your Docker storage to specify whether you need amd or arm platform:
    flow.storage = Docker(..., build_kwargs={"platform": "linux/arm64/v8"})  # or: "linux/amd64"})
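A small sketch of picking that platform string from the agent's architecture, assuming you pass in the machine name reported on the agent host (for example by `uname -m`). The mapping table and both function names are hypothetical helpers, not part of Prefect.

```python
import platform

# Machine names as reported by platform.machine() / uname -m,
# mapped to the platform strings used in the message above.
_DOCKER_PLATFORMS = {
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",
    "aarch64": "linux/arm64/v8",
    "arm64": "linux/arm64/v8",
}


def docker_platform(machine=None):
    """Map a machine name to a Docker platform string (defaults to the local machine)."""
    machine = (machine or platform.machine()).lower()
    if machine not in _DOCKER_PLATFORMS:
        raise ValueError(f"no Docker platform known for {machine!r}")
    return _DOCKER_PLATFORMS[machine]


def storage_for_agent(agent_machine, dockerfile="Dockerfile"):
    # Imported lazily so docker_platform() works without Prefect installed.
    from prefect.storage import Docker

    return Docker(
        dockerfile=dockerfile,  # hypothetical path to your Dockerfile
        build_kwargs={"platform": docker_platform(agent_machine)},
    )
```

The architecture that matters is the agent's, not that of the machine registering the flow, which is why the machine name is passed in explicitly.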

    Madhup Sukoon

    7 months ago
    Awesome. Thanks a lot for your patience. 😅

    Anna Geller

    7 months ago
    Anytime 👍