# ask-community
m
Hello! I am extremely new to Prefect, so apologies if this is a rookie question or has already been answered. I have a repository containing a bunch of scripts, folders and modules. I want to import a subset of the functions defined here and build a Flow in a new flow.py file and then run it via a Docker Agent. I also have some external (pip-installable) dependencies that would need to be present for the flow to run. Is it possible to do this? Does someone have an example Flow that uses GitHub Storage and Docker Run?
k
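from prefect import Flow
from prefect.storage import GitLab

# GitLab storage fetches flow.py from the repository at run time
core_storage = GitLab(
    repo="Number_proj",
    host="https://link_git",
    path="flows/blablabla/flow.py",
    ref="master",
    access_token_secret="GITLAB_ACCESS_TOKEN",  # name of the secret holding the token
    stored_as_script=True
)

with Flow("name_flow", storage=core_storage) as flow:
    ...  # add your tasks here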
And in the environment of the Docker container:
SET GITLAB_ACCESS_TOKEN=""
m
Thanks @Konstantin, could you also show me the DockerRun configuration for this? Also, where should I specify my dependencies?
k
FROM python:3.9

ARG PREFECT_VERSION='0.15.1'

RUN apt update && apt install uuid -y
# Install Prefect once, with the GitLab extra, plus the flow's dependencies
# (datetime ships with Python, so it is not listed)
RUN python -m pip install --upgrade pip && pip install "prefect[gitlab]==$PREFECT_VERSION" numpy pymysql pymssql psycopg2 requests pandas sqlalchemy
a
Konstantin’s suggestion to build a Docker image that packages your dependencies is really good. If you are looking for an example repository structure, including how to bundle dependencies into a package that can be installed within the Docker image and how to create a Dockerfile and build an image, check out this repository: https://github.com/anna-geller/packaging-prefect-flows/
E.g. this file shows how you can pass a Dockerfile to Docker storage and use it with a Docker agent by giving the DockerRun run config the same label that you assign to the agent, e.g.:
prefect agent docker start --label docker
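To make the label matching concrete, here is a minimal sketch. It assumes Prefect 1.x (0.14 or later), where `DockerRun` from `prefect.run_configs` accepts `image` and `labels`; the image name `my-registry/my-flow:latest` is a hypothetical placeholder. The `labels_match` helper is illustrative only and not part of Prefect: it mirrors the documented rule that an agent picks up a flow run only when the flow's labels are a subset of the agent's labels.

```python
from typing import Iterable, List


def labels_match(flow_labels: Iterable[str], agent_labels: Iterable[str]) -> bool:
    """Illustrative helper (not Prefect API): True if every flow label is on the agent."""
    return set(flow_labels) <= set(agent_labels)


def build_run_config(image: str, labels: List[str]):
    """Build a DockerRun config; Prefect is imported lazily so labels_match works without it."""
    from prefect.run_configs import DockerRun  # assumes Prefect 1.x is installed

    return DockerRun(image=image, labels=labels)


# The agent above was started with `--label docker`
AGENT_LABELS = ["docker"]
FLOW_LABELS = ["docker"]

if __name__ == "__main__":
    print(labels_match(FLOW_LABELS, AGENT_LABELS))
    # With Prefect installed and a flow defined (hypothetical image name):
    # flow.run_config = build_run_config("my-registry/my-flow:latest", FLOW_LABELS)
```

If the flow carried a label the agent lacks, the run would likely stay in a scheduled state until a matching agent appears.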
m
Ah, so this would require me to create my own Docker image. I was wondering if it would be possible to:
1. have a generic Docker image (something based off prefect or python)
2. specify my pip-installable dependencies (which would get installed in the Docker container)
3. specify my GitHub repository (which would get cloned into the container)
4. run the resulting container using a Docker Agent
The reason I don't want to have my codebase baked into the Docker image is that I foresee a lot of small changes in the codebase, and running a docker build for each small commit in my CI/CD would not be optimal.
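One way to approximate this setup without a custom image, sketched under assumptions: the official `prefecthq/prefect` image is documented to pip-install whatever is listed in the `EXTRA_PIP_PACKAGES` environment variable when the container starts (worth verifying for your Prefect version), and `GitHub` storage pulls the flow file from the repository at run time. The repository name `my-org/my-repo`, the flow path, the secret name `GITHUB_ACCESS_TOKEN`, and the package list are all hypothetical placeholders. Note that GitHub storage is understood to fetch only the flow file, so other modules from the repository would still have to be installed or cloned separately.

```python
from typing import Dict, List


def extra_pip_env(packages: List[str]) -> Dict[str, str]:
    """Build the env mapping; the value is a space-separated package list."""
    return {"EXTRA_PIP_PACKAGES": " ".join(packages)}


def configure(flow, packages: List[str]):
    """Attach GitHub storage and a DockerRun config to an existing flow (Prefect 1.x assumed)."""
    # Imported lazily so extra_pip_env can be used without Prefect installed
    from prefect.run_configs import DockerRun
    from prefect.storage import GitHub

    flow.storage = GitHub(
        repo="my-org/my-repo",                      # hypothetical repository
        path="flows/flow.py",                       # hypothetical path to the flow file
        access_token_secret="GITHUB_ACCESS_TOKEN",  # hypothetical secret name
    )
    flow.run_config = DockerRun(
        image="prefecthq/prefect:latest",  # generic image, nothing baked in
        env=extra_pip_env(packages),       # installed at container start
        labels=["docker"],
    )
    return flow
```

The trade-off is that dependencies get installed on every flow run, which slows container startup compared with a prebuilt image.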
a
Is your repository public? If so, this is quite straightforward, as you can easily clone public repositories within a Dockerfile. But if your repo is private, setting this up may be a bit more involved in order to handle credentials properly. Either way, for such a scenario you would need to either:
• build a Dockerfile incl. a RUN command that would download your repo
• or clone the repo e.g. as a first task in your flow, similarly to what this post is doing for dbt:
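import pygit2
from prefect import task
DBT_PROJECT = "dbt_project"  # hypothetical local path; not given in the thread
@task(name="Clone DBT repo")
def pull_dbt_repo(repo_url: str, branch: str = None):
    pygit2.clone_repository(url=repo_url, path=DBT_PROJECT, checkout_branch=branch)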
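For a private repository, the clone also needs credentials. A minimal sketch, assuming the token is exposed to the container as an environment variable named `GIT_ACCESS_TOKEN` (a hypothetical name) and that the host accepts the token as an HTTPS password. The username convention varies by host, so `git_user` is a placeholder too. `pygit2.UserPass` and `pygit2.RemoteCallbacks` are pygit2's mechanism for passing such credentials to `clone_repository`.

```python
import os


def read_token(env_var: str = "GIT_ACCESS_TOKEN") -> str:
    """Read the access token from the environment; fail early if it is missing."""
    token = os.environ.get(env_var, "")
    if not token:
        raise RuntimeError(f"environment variable {env_var} is not set")
    return token


def clone_private_repo(repo_url: str, path: str, branch: str = None,
                       username: str = "git_user"):
    """Clone over HTTPS with token auth; pygit2 is imported lazily."""
    import pygit2

    # Username/password credentials, with the token used as the password
    callbacks = pygit2.RemoteCallbacks(
        credentials=pygit2.UserPass(username, read_token())
    )
    return pygit2.clone_repository(
        url=repo_url, path=path, checkout_branch=branch, callbacks=callbacks
    )
```

Inside a flow, `clone_private_repo` could be called from the body of a task like the one above, so the token never has to be written into the image.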
m
It's a private repository 🥲
a
But really, if you use Docker storage as described in this file, your image gets built on flow registration, so the build doesn’t necessarily have to run as part of CI/CD. You control when you register your flow (and build & push the image).
m
Thanks @Anna Geller, looks like this is what I'm gonna have to do. A couple more questions (apologies for my poor understanding of Prefect):
1. For the above link, I'm assuming the Dockerfile would look exactly like the one in your repo, and the requirements.txt would specify all the external dependencies that are needed.
2. I'm assuming that for every code change that I want to be reflected in the next run, I will have to re-run the flow registration?
3. My Docker agent is running on an ARM machine. How do I specify that the Docker image has to be built for ARM? (assuming the docker build is triggered automatically through flow registration)
a
1. Yes
2. Yes
3. You can add the following argument to your Docker storage to specify whether you need the amd or arm platform:
flow.storage = Docker(..., build_kwargs={"platform": "linux/arm64/v8"})  # or "linux/amd64"
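If the platform string needs to be chosen programmatically, a small lookup can help. A minimal sketch: `docker_platform` and `build_kwargs_for` are hypothetical helpers (not part of Prefect) that map the machine types reported by `platform.machine()` to the platform strings used above. Keep in mind that the image should target the architecture of the machine where the agent runs, which is not necessarily the machine doing the registration.

```python
import platform

# Common platform.machine() values mapped to Docker platform strings
_PLATFORMS = {
    "aarch64": "linux/arm64/v8",
    "arm64": "linux/arm64/v8",
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",
}


def docker_platform(machine: str = None) -> str:
    """Return the Docker platform for a machine type (default: the current machine)."""
    machine = (machine or platform.machine()).lower()
    try:
        return _PLATFORMS[machine]
    except KeyError:
        raise ValueError(f"no Docker platform known for machine type {machine!r}")


def build_kwargs_for(machine: str) -> dict:
    """Build the build_kwargs dict for Docker storage, e.g. for the agent's machine type."""
    return {"platform": docker_platform(machine)}
```

For an ARM agent this would be used as `Docker(..., build_kwargs=build_kwargs_for("aarch64"))`.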
m
Awesome. Thanks a lot for your patience. 😅
a
Anytime 👍