Thomas Gersch
07/30/2025, 8:10 AM
Marvin
07/30/2025, 8:11 AM
Marvin
07/30/2025, 8:12 AM
bash
prefect work-pool create "my-process-pool" --type process
2. Deployment Configuration
Create a deployment YAML file (e.g., prefect.yaml, the file that prefect deploy reads by default) with the following structure:
yaml
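# Clone the flow code from GitLab before each run.
# For a private repository, also pass credentials or access_token to this step.
pull:
- prefect.deployments.steps.git_clone:
    repository: https://gitlab.com/your-username/your-repo
    branch: main

deployments:
- name: my-flow-deployment
  version: null
  tags: []
  description: null
  schedule: {}
  flow_name: my_flow
  entrypoint: path/to/flow.py:my_flow
  parameters: {}
  work_pool:
    name: my-process-pool
    work_queue_name: default
    job_variables:
      # Assumption: the interpreter is selected by overriding the worker's command;
      # verify the command against your Prefect version.
      command: "/path/to/your/venv/bin/python -m prefect.engine"
      working_dir: "/path/to/working/directory"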
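The entrypoint value pairs a file path with a flow function name, separated by a colon. As a rough illustration of that format, here is a small sketch that splits the value and checks that the file exists in a checkout. The names `Entrypoint`, `split_entrypoint`, and `missing_entrypoint_file` are hypothetical helpers for this example, not part of Prefect.

```python
from __future__ import annotations

from pathlib import Path
from typing import NamedTuple


class Entrypoint(NamedTuple):
    path: Path      # file that defines the flow, relative to the project root
    function: str   # name of the flow function inside that file


def split_entrypoint(entrypoint: str) -> Entrypoint:
    """Split '<file path>:<function name>' on the last colon."""
    path, sep, function = entrypoint.rpartition(":")
    if not sep or not path or not function:
        raise ValueError(f"expected '<path>:<function>', got {entrypoint!r}")
    return Entrypoint(Path(path), function)


def missing_entrypoint_file(entrypoint: str, project_root: Path) -> bool:
    """Return True when the file named in the entrypoint is absent from project_root."""
    return not (project_root / split_entrypoint(entrypoint).path).is_file()
```

Running a check like this against a local clone before deploying can catch a mistyped path early.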
3. Virtual Environment Setup
You have two main approaches:
a) Pre-created Virtual Environment:
- Create the virtual environment beforehand on the worker machine:
bash
python -m venv /path/to/your/venv
/path/to/your/venv/bin/pip install -r requirements.txt
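- Point flow runs at this environment in the deployment configuration, for example by overriding the process work pool's command job variable so that it calls /path/to/your/venv/bin/python (PREFECT_PYTHON_PATH is not a documented Prefect setting)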
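If you would rather script the pre-created environment than type the commands by hand, the same two steps can be sketched with Python's standard `venv` module. The function names here are hypothetical, and the `with_pip` switch is only there so the environment can be created without bootstrapping pip.

```python
from __future__ import annotations

import os
import subprocess
import venv
from pathlib import Path


def interpreter_path(env_dir: Path) -> Path:
    """Return the interpreter location inside a virtual environment."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def create_env(env_dir: Path, requirements: Path | None = None,
               with_pip: bool = True) -> Path:
    """Create the environment if it is missing, optionally install
    requirements, and return the path to its interpreter."""
    python = interpreter_path(env_dir)
    if not python.exists():
        venv.EnvBuilder(with_pip=with_pip).create(env_dir)
    if requirements is not None:
        # Equivalent to: <env>/bin/pip install -r requirements.txt
        subprocess.run(
            [str(python), "-m", "pip", "install", "-r", str(requirements)],
            check=True,
        )
    return python
```

For example, `create_env(Path("/path/to/your/venv"), Path("requirements.txt"))` returns the interpreter path that the deployment needs to reference.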
b) Dynamic Virtual Environment:
- You can create a wrapper script that creates a virtual environment and then runs the flow with that environment's interpreter
- Invoke the script through the process work pool's command job variable in your deployment configuration
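One way to keep a dynamic setup from rebuilding an environment on every run is to name each environment after a hash of its requirements, so runs with identical dependencies share one directory. This is a minimal sketch of that idea under stated assumptions: `env_dir_for` and `wrapped_command` are hypothetical names, the `bin/python` layout assumes a POSIX worker machine, and `flow_command` stands for whatever arguments the worker would otherwise pass to Python.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def env_dir_for(requirements_text: str, base_dir: Path) -> Path:
    """Map a requirements listing to a stable directory name.

    Lines are stripped and blank lines dropped before hashing, so
    whitespace-only edits do not trigger a new environment.
    """
    lines = [line.strip() for line in requirements_text.splitlines()]
    normalized = "\n".join(line for line in lines if line)
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    return base_dir / f"venv-{digest}"


def wrapped_command(env_dir: Path, flow_command: list[str]) -> list[str]:
    """Prefix the flow-run arguments with the environment's interpreter."""
    return [str(env_dir / "bin" / "python"), *flow_command]
```

A wrapper script could call `env_dir_for` on the repository's requirements.txt, create the environment only if that directory does not exist yet, and then start the flow run with the list returned by `wrapped_command`.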
4. Important Considerations:
- The process worker needs:
  - Access to the GitLab repository (credentials or an access token if it is private)
  - Permission to create and access virtual environments in the specified location
  - Python with the venv module (or virtualenv) installed on the system
  - Sufficient disk space for virtual environments
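Most of these prerequisites can be checked on the worker machine before the worker starts. The sketch below is one possible preflight check using only the standard library. The function name and the 1 GB default threshold are assumptions for illustration, and it does not verify GitLab credentials, only that a git executable is available.

```python
from __future__ import annotations

import importlib.util
import os
import shutil
from pathlib import Path


def preflight(env_root: Path, min_free_bytes: int = 1_000_000_000) -> dict[str, bool]:
    """Report whether the worker machine meets the basic prerequisites."""
    free = shutil.disk_usage(env_root).free
    return {
        # git is needed to clone the GitLab repository
        "git_on_path": shutil.which("git") is not None,
        # the stdlib venv module is needed for `python -m venv`
        "venv_module": importlib.util.find_spec("venv") is not None,
        # the worker's user must be able to create directories here
        "env_root_writable": os.access(env_root, os.W_OK | os.X_OK),
        # room left on the filesystem that holds the environments
        "enough_disk": free >= min_free_bytes,
    }
```

Any entry that comes back `False` points at the item in the list above that still needs attention.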
5. Running the Worker
Start the process worker:
bash
prefect worker start -p "my-process-pool"
6. Deploy Your Flow
Deploy using:
bash
prefect deploy --name my-flow-deployment
Best Practices and Tips:
1. Environment Management:
- Consider using a requirements.txt or setup.py in your GitLab repository
- Use specific version numbers for dependencies to ensure reproducibility
- Consider cleaning up old virtual environments periodically
2. Security Considerations:
- Ensure the process worker has minimal necessary permissions
- Store sensitive credentials in Prefect blocks rather than in code
- Use GitLab deploy tokens or SSH keys for repository access
3. Monitoring:
- Monitor disk usage as virtual environments can take up significant space
- Keep track of virtual environment creation/deletion logs
- Use Prefect's logging capabilities to track environment setup issues
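For the disk-usage point, a small report of how much space each environment occupies is often enough to spot growth. This is a sketch that assumes all environments live as subdirectories of a single root directory; `dir_size_bytes` and `env_sizes` are hypothetical helper names.

```python
from __future__ import annotations

import os
from pathlib import Path


def dir_size_bytes(path: Path) -> int:
    """Sum the sizes of regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def env_sizes(env_root: Path) -> dict[str, int]:
    """Return {environment directory name: size in bytes} for env_root."""
    return {
        entry.name: dir_size_bytes(entry)
        for entry in sorted(env_root.iterdir())
        if entry.is_dir()
    }
```

The resulting dictionary can be written to whatever logger you already use, so the numbers appear alongside the rest of the environment setup output.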
Limitations:
- The process worker executes flows on the same machine where it runs, so ensure adequate resources
- You'll need to manage virtual environment cleanup manually
- Each flow run might require its own virtual environment, which could impact disk space
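Since cleanup is manual, a scheduled script can take over the routine part. The sketch below removes environment directories older than a given age. It assumes one root directory holding all environments and uses the directory's modification time, which approximates when the environment was created rather than when it was last used. The function names are hypothetical, and `dry_run` defaults to `True` so nothing is deleted by accident.

```python
from __future__ import annotations

import shutil
import time
from pathlib import Path


def stale_envs(env_root: Path, max_age_days: float,
               now: float | None = None) -> list[Path]:
    """List subdirectories of env_root last modified before the cutoff."""
    current = time.time() if now is None else now
    cutoff = current - max_age_days * 86400
    return sorted(
        entry for entry in env_root.iterdir()
        if entry.is_dir() and entry.stat().st_mtime < cutoff
    )


def remove_stale_envs(env_root: Path, max_age_days: float,
                      dry_run: bool = True) -> list[Path]:
    """Delete stale environments unless dry_run is set; return what matched."""
    stale = stale_envs(env_root, max_age_days)
    if not dry_run:
        for entry in stale:
            shutil.rmtree(entry)
    return stale
```

Running it first with `dry_run=True` shows which directories would be removed before anything is actually deleted.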
Would you like me to elaborate on any of these aspects or provide more specific details about any part of the configuration?