# Install the prefect image and login.
FROM prefecthq/prefect:2-python3.11 AS builder
# Can't start the worker from the first stage, as CMD would override all CMDs in the original prefect image.
# Therefore, a second stage is added, where a CMD can be added freely without consequences for the
# previous CMDs.
FROM builder AS starter
# RUN pip install poetry
CMD prefect worker start --pool <worker pool>
compose.yaml
services:
  worker:
    build:
      context: .
    env_file:
      - .env
    container_name: <container name>
    restart: unless-stopped
.env
PREFECT_API_KEY=<key>
PREFECT_API_URL=https://api.prefect.cloud/api/accounts/<ACCOUNT-ID>/workspaces/<WORKSPACE-ID>
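With these three files in place, the worker can be built and started with plain Docker Compose, e.g.:
docker compose build
docker compose up -d
docker compose logs -f worker   # the worker should report that it is polling the work pool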
My projects are backed up on Bitbucket, so each script would be downloaded as part of the pull step in the deployment process and installed into its own virtual environment using poetry.
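A pull section along those lines could have looked roughly like this - only a sketch, using the standard git_clone and run_shell_script deployment steps, with the repository as a placeholder and assuming each project ships a pyproject.toml:
pull:
  - prefect.deployments.steps.git_clone:
      id: clone-step            # hypothetical id, referenced by the next step
      repository: <repo>
  - prefect.deployments.steps.run_shell_script:
      directory: "{{ clone-step.directory }}"
      script: poetry install    # install the project into its own virtual environment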
This was the simplest setup I could think of, which lived up to my three goals.
There were some problems with this setup, which ultimately stem from the fact that Prefect is not made to work with virtual environments - it is targeted toward more controllable, decentralized environments.
## Attempt 2
One option for another attempt would be to create a full docker image of each project, upload it somewhere (an additional service), and pull it during deployment. This would be a borderline violation of goal 2.
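For completeness, such a setup would roughly correspond to filling in the build and push sections of prefect.yaml with the prefect-docker steps - just a sketch, with the image name, tag, and registry as placeholders:
build:
  - prefect_docker.deployments.steps.build_docker_image:
      id: build-image
      requires: prefect-docker>=0.3.0
      image_name: <registry>/<image>
      tag: latest
      dockerfile: auto
push:
  - prefect_docker.deployments.steps.push_docker_image:
      requires: prefect-docker>=0.3.0
      image_name: "{{ build-image.image_name }}"
      tag: "{{ build-image.tag }}"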
Since I could get a dockerized worker with a requirements.txt install in the pull step working, and I am somewhat comfortable with shell scripting, I chose another solution: I would create a worker and work pool (is this necessary?) for each project - this is quite some overhead, but it only has to be done once.
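Creating the work pools is a one-off step via the CLI - e.g. process pools, assuming the workers run flows as subprocesses inside their containers (pool names are placeholders):
prefect work-pool create "project-a-pool" --type process
prefect work-pool create "project-b-pool" --type process
prefect work-pool create "project-c-pool" --type process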
I created a runner.sh to control the dockerized workers, which looks like this:
#!/bin/bash
project=$1
process=$2

build_cmd="sudo docker compose build"
run_cmd="sudo docker compose up -d"

if [ "$project" = "ProjectA" ]; then
    directory="ProjectA/"
elif [ "$project" = "ProjectB" ]; then
    directory="ProjectB/"
elif [ "$project" = "ProjectC" ]; then
    directory="ProjectC/"
else
    echo "could not find project, options are:"
    echo "- ProjectA"
    echo "- ProjectB"
    echo "- ProjectC"
    exit 1
fi

if [ "$process" = "build" ]; then
    cd "$directory" && $build_cmd
elif [ "$process" = "up" ]; then
    cd "$directory" && $run_cmd
elif [ "$process" = "" ] || [ "$process" = "all" ]; then
    cd "$directory" && $build_cmd && $run_cmd
else
    echo "could not identify process, options are:"
    echo "- all, <empty>: builds and runs the image"
    echo "- build: builds the image"
    echo "- up: runs the image"
    exit 1
fi
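Usage is then (assuming the script has been made executable):
./runner.sh ProjectA          # build and start the ProjectA worker
./runner.sh ProjectB build    # only rebuild the ProjectB image
./runner.sh ProjectC up       # only start the ProjectC worker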
The file structure is simple: runner.sh is in the root directory together with the .env file, and there is a folder for each project (read: worker).
.
├── ProjectA
│   ├── compose.yaml
│   └── Dockerfile
├── ProjectB
│   ├── compose.yaml
│   └── Dockerfile
├── ProjectC
│   ├── compose.yaml
│   └── Dockerfile
├── README.md
├── runner.sh
└── .env
Note that the Dockerfile and compose.yaml also need some changes to reflect that the location of the .env file has changed to ../.env.
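For example, the env_file entry in each project's compose.yaml now points one level up:
services:
  worker:
    build:
      context: .
    env_file:
      - ../.env   # the shared .env lives in the repo root
    container_name: <container name>
    restart: unless-stopped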
This could be made pretty with Click if you are not into shell scripts.
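A minimal sketch of what that could look like (file and function names are hypothetical; it mirrors the runner.sh logic above, minus the sudo):
# runner.py - a Click-based equivalent of runner.sh
import subprocess

import click

PROJECTS = ["ProjectA", "ProjectB", "ProjectC"]


@click.command()
@click.argument("project", type=click.Choice(PROJECTS))
@click.argument("process", type=click.Choice(["all", "build", "up"]), default="all")
def run(project: str, process: str) -> None:
    """Build and/or start the dockerized worker for PROJECT."""
    if process in ("all", "build"):
        subprocess.run(["docker", "compose", "build"], cwd=project, check=True)
    if process in ("all", "up"):
        subprocess.run(["docker", "compose", "up", "-d"], cwd=project, check=True)


if __name__ == "__main__":
    run()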
The prefect.yaml file for each project is quite simple:
# Welcome to your prefect.yaml file! You can use this file for storing and managing
# configuration for deploying your flows. We recommend committing this file to source
# control along with your flow code.

# Generic metadata about this project
name: <name>
prefect-version: 2.11.0

# build section allows you to manage and build docker images
build: null

# push section allows you to manage if and how this project is uploaded to remote locations
push: null

# pull section allows you to provide instructions for cloning this project in remote locations
pull:
  - prefect.deployments.steps.git_clone:
      id: clone-step  # referenced by the install step below
      repository: <repo>
      # if necessary
      access_token: "{{ prefect.blocks.secret.mytoken }}"
  - prefect.deployments.steps.pip_install_requirements:
      directory: "{{ clone-step.directory }}"
      requirements_file: requirements.txt
      stream_output: False

# the deployments section allows you to provide configuration for deploying flows
deployments:
  - name: <name>
    version: null
    tags: []
    description: null
    entrypoint: <entrypoint>
    parameters: {}
    work_pool:
      name: <work pool>
      work_queue_name: null
      job_variables: {}
    schedule: null
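Deployments are then registered from each project directory with the standard CLI, for example:
cd ProjectA/
prefect deploy          # interactively pick a flow/deployment, or
prefect deploy --all    # register every deployment defined in prefect.yaml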
This solution effectively creates a docker environment for each project without introducing yet another service to learn - all of it can be maintained with a repo and docker. This is not best practice for any professional environment, but this is something I do in my spare time, and it is mostly a vehicle for what I really like: data science.
I do not have my repositories available to the public, but if you need help, feel free to reach out, and I will be happy to help.