morten

08/28/2023, 10:28 AM
Hi, I have been working on and off on getting Prefect to run at home on a remote server with an otherwise minimal setup. In my reply I have posted a note on my experience. I don't have anywhere to post it properly, so it's just pasted in here as text. Wall of text in the reply 🙂
# Goal with my setup

## Setup

I had an old laptop that I wanted to reuse to run various scripts. Before all this I used cronjobs and https://healthchecks.io to message me on Telegram when something went wrong. I mention this because it may be enough for some very simple setups out there. I wanted more control, checks and balances for my scripts, so Prefect it is.

I am using this for personal projects that are not vital, and I am the only one working on them, meaning I do not have a ton of time to learn, set up, and maintain many new services. I only run Python scripts, and the data I work with generally does not have to be backed up. Examples of my projects would be checking for deals on various websites, or downloading NBA data so I can run an algorithm (yet another script) that notifies me if a game seems interesting enough to watch.

## Goals

My first goal is to find a single service that helps with scheduling, logging errors, retries, etc.:

1. Feature Rich Script Runner

Since it would be used for personal projects, where I would spend my spare time setting it up and maintaining it, I want to keep the additional services I would need to learn at an absolute minimum. My main interest is not actually workflow orchestration but data science, so the second goal is:

2. Services Minimization

As some projects have different requirements, I would need to somehow containerize my scripts:

3. Containerize Projects

# Solution

## Attempt 1

Goal 1 was fulfilled by choosing Prefect Cloud, which meant that I "only" had to start the worker on my server and keep it running. To containerize it, I would initialize the worker in a Docker environment. As Docker is somewhat new to me, this would be my only use of Docker. The advantage is that I only have to set this up once and do not have to touch Docker again, until something breaks. The code to start the container and worker is shown below.

Dockerfile
# Install the prefect image and login.
FROM prefecthq/prefect:2-python3.11 AS builder

# Can't start the worker from the first stage, as CMD will override all CMDs in the original prefect image.
# Therefore, a second stage is added, where a CMD can be freely added without consequences for previous
# CMDs.

FROM builder AS starter

# RUN pip install poetry
CMD prefect worker start --pool <worker pool>
compose.yaml
services:
  worker:
    build:
      context: .
    env_file:
      - .env
    container_name: <container name>
    restart: unless-stopped
.env
PREFECT_API_KEY=<key>
PREFECT_API_URL=https://api.prefect.cloud/api/accounts/<ACCOUNT-ID>/workspaces/<WORKSPACE-ID>
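A mistyped .env is easy to miss, because the worker only complains once the container is already running. As a rough sketch (not part of my actual setup), the file could be sanity-checked before building. This assumes the two entries are meant to be `PREFECT_API_KEY` and `PREFECT_API_URL`; the function names and the URL pattern are my own illustration.

```python
import re

# Assumption: these are the two settings the worker needs from .env.
REQUIRED_KEYS = ("PREFECT_API_KEY", "PREFECT_API_URL")

# Assumption: the URL follows the accounts/<id>/workspaces/<id> shape shown above.
URL_PATTERN = re.compile(
    r"^https://api\.prefect\.cloud/api/accounts/[^/]+/workspaces/[^/]+$"
)


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and comments; reject duplicate keys."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {raw!r}")
        key = key.strip()
        if key in values:
            raise ValueError(f"duplicate key: {key}")
        values[key] = value.strip()
    return values


def check_prefect_env(text: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    try:
        values = parse_env(text)
    except ValueError as exc:
        return [str(exc)]
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not values.get(key)]
    url = values.get("PREFECT_API_URL", "")
    if url and not URL_PATTERN.match(url):
        problems.append("PREFECT_API_URL does not look like a Prefect Cloud workspace URL")
    return problems
```

Calling `check_prefect_env(open(".env").read())` before `docker compose build` would flag a key that is defined twice or a URL that still contains placeholder brackets.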
My projects are backed up on BitBucket, so each script would be downloaded as part of the pull step in the deployment process and installed in its own virtual environment using poetry. This was the simplest setup I could think of that lived up to my three goals. There were some problems with this setup, which ultimately come down to Prefect not being made to work with virtual environments; it is targeted toward more controllable, decentralized environments.

## Attempt 2

One option for another attempt would be to build a full Docker image of the project, upload it somewhere (an additional service), and pull it during deployment. This would be a borderline violation of goal 2. Since I could get a dockerized worker working with a requirements.txt install in the pull step, and I am somewhat comfortable with shell scripts, I chose another solution. I would create a worker and a work pool (is this necessary?) for each project. This is quite some overhead, but it only has to be done once. I created a runner.sh to control the dockerized workers, which looks like this:
#!/usr/bin/env bash
# Usage: ./runner.sh <project> [build|up|all]
project=$1
process=$2

build_cmd="sudo docker compose build"
run_cmd="sudo docker compose up -d"

if [ "$project" = "ProjectA" ]; then
    directory="ProjectA/"

elif [ "$project" = "ProjectB" ]; then
    directory="ProjectB/"

elif [ "$project" = "ProjectC" ]; then
    directory="ProjectC/"

else
    echo "could not find project, options are:"
    echo "- ProjectA"
    echo "- ProjectB"
    echo "- ProjectC"
    exit 1
fi

if [ "$process" = "build" ]; then
    cd "$directory" && $build_cmd

elif [ "$process" = "up" ]; then
    cd "$directory" && $run_cmd

elif [ "$process" = "" ] || [ "$process" = "all" ]; then
    cd "$directory" && $build_cmd && $run_cmd

else
    echo "could not identify process, options are:"
    echo "- all, <empty>    builds and runs image"
    echo "- build           builds image"
    echo "- up              runs an image"
    exit 1
fi
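For anyone who would rather avoid shell, the dispatch logic of runner.sh above could be sketched in Python roughly like this. It uses only the standard library (`argparse` and `subprocess`), and the function names `plan`, `parse_args` and `main` are made up for illustration; I have not run this version against my own workers.

```python
import argparse
import subprocess
from pathlib import Path

# Same projects and commands as in runner.sh.
PROJECTS = ("ProjectA", "ProjectB", "ProjectC")
BUILD_CMD = ["sudo", "docker", "compose", "build"]
RUN_CMD = ["sudo", "docker", "compose", "up", "-d"]


def plan(process: str) -> list[list[str]]:
    """Map a process name to the docker compose commands to run, in order."""
    if process == "build":
        return [BUILD_CMD]
    if process == "up":
        return [RUN_CMD]
    return [BUILD_CMD, RUN_CMD]  # "all": build, then start


def parse_args(argv=None) -> argparse.Namespace:
    """argparse prints the valid options itself when a choice is unknown."""
    parser = argparse.ArgumentParser(description="Build and start dockerized workers")
    parser.add_argument("project", choices=PROJECTS)
    parser.add_argument("process", nargs="?", default="all",
                        choices=("all", "build", "up"))
    return parser.parse_args(argv)


def main(argv=None) -> None:
    args = parse_args(argv)
    # Like the shell script, the project folder is resolved relative to the
    # current working directory.
    directory = Path(args.project)
    for cmd in plan(args.process):
        # check=True stops at the first failing command, like && in the shell.
        subprocess.run(cmd, cwd=directory, check=True)
```

Calling `main()` under the usual `if __name__ == "__main__":` guard turns this into a script, so `python runner.py ProjectA build` would behave like `./runner.sh ProjectA build`.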
The file structure is simple: runner.sh sits in the root directory next to the .env file, and there is one folder per project (read: worker).
.
├── ProjectA
│   ├── compose.yaml
│   └── Dockerfile
├── ProjectB
│   ├── compose.yaml
│   └── Dockerfile
├── ProjectC
│   ├── compose.yaml
│   └── Dockerfile
├── README.md
├── runner.sh
└── .env
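Because every project folder holds nearly identical files, the per-project overhead could be reduced by generating them. The sketch below is hypothetical and not how I created my folders: it renders a Dockerfile and a compose.yaml per project from templates, with `env_file` pointing at the shared `../.env`. The work pool names and the `<project>-worker` container naming are assumptions for illustration.

```python
from pathlib import Path
from string import Template

# Mirrors the Dockerfile from attempt 1; only the pool name varies.
DOCKERFILE = Template(
    "FROM prefecthq/prefect:2-python3.11 AS builder\n"
    "\n"
    "FROM builder AS starter\n"
    "\n"
    "CMD prefect worker start --pool $pool\n"
)

# Mirrors compose.yaml from attempt 1, with .env one level up.
COMPOSE = Template(
    "services:\n"
    "  worker:\n"
    "    build:\n"
    "      context: .\n"
    "    env_file:\n"
    "      - ../.env\n"
    "    container_name: $container\n"
    "    restart: unless-stopped\n"
)


def render_project(project: str, pool: str) -> dict[str, str]:
    """Return {filename: content} for one project folder."""
    container = f"{project.lower()}-worker"  # assumed naming scheme
    return {
        "Dockerfile": DOCKERFILE.substitute(pool=pool),
        "compose.yaml": COMPOSE.substitute(container=container),
    }


def scaffold(root: Path, pools: dict[str, str]) -> None:
    """Create one folder per project under root and write both files into it."""
    for project, pool in pools.items():
        folder = root / project
        folder.mkdir(parents=True, exist_ok=True)
        for filename, content in render_project(project, pool).items():
            (folder / filename).write_text(content)
```

For example, `scaffold(Path("."), {"ProjectA": "pool-a", "ProjectB": "pool-b"})` would create the two project folders next to runner.sh and .env.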
Note that the Dockerfile and compose.yaml also need some changes to reflect that the .env file is now located at ../.env. The runner could be made prettier with Click if you are not into shell scripts. The prefect.yaml file for each project is quite simple:
# Welcome to your prefect.yaml file! You can use this file for storing and managing
# configuration for deploying your flows. We recommend committing this file to source
# control along with your flow code.

# Generic metadata about this project
name: <name>
prefect-version: 2.11.0

# build section allows you to manage and build docker images
build: null

# push section allows you to manage if and how this project is uploaded to remote locations
push: null

# pull section allows you to provide instructions for cloning this project in remote locations
pull:
  - prefect.deployments.steps.git_clone:
      # the id lets later steps reference this step's output
      id: clone-step
      repository: <repo>
      # if necessary
      access_token: "{{ prefect.blocks.secret.mytoken }}"

  - prefect.deployments.steps.pip_install_requirements:
      directory: "{{ clone-step.directory }}"
      requirements_file: requirements.txt
      stream_output: False

# the deployments section allows you to provide configuration for deploying flows
deployments:
  - name: <name>
    version: null
    tags: []
    description: null
    entrypoint: <entrypoint>
    parameters: {}
    work_pool:
      name: <work pool>
      work_queue_name: null
      job_variables: {}
    schedule: null
This solution effectively creates a Docker environment for each project without adding yet another service to learn; all of it can be maintained with a repo and Docker. This is not best practice for a professional environment, but it is something I do in my spare time, and it is mostly a vehicle for what I really like: data science. My repositories are not public, but if you need help, reach out and I will be happy to help.