Thomas Cobban
08/18/2025, 9:02 PM
pip at runtime (this will occur prior to any pull steps configured on the deployment)."
      }
    },
    "description": "Configuration class used by the managed worker to create a job."
  },
  "job_configuration": {
    "env": "{{ env }}",
    "image": "{{ image }}",
    "timeout": "{{ timeout }}",
    "pip_packages": "{{ pip_packages }}"
  }
} (edited)
Marvin
08/18/2025, 9:02 PM
Marvin
08/18/2025, 9:07 PM
`datetime` and `ftplib` are part of Python’s standard library. Don’t list them in `pip_packages`. They’ll either fail or install unrelated (and potentially breaking) packages.
3) Heavy geospatial dependencies
- `rioxarray` pulls in `rasterio`/GDAL/PROJ. Those often need system libraries and/or large wheels. The stock Prefect images don’t include GDAL/PROJ toolchains. Installing these at runtime in a managed container is fragile and slow, and often fails.
4) Working directory
- After `git_clone`, Prefect does not automatically cd into the repo. If your entrypoint is relative to the repo root, add a `set_working_directory` pull step.
5) Repo URL and PAT
- In YAML, use a plain URL string (no Slack <...> markup). Example: "https://github.com/BathyMapsAustralia/jaxa-hourly.git".
- For private repos, the token needs `repo` scope and, for org repos that enforce SAML SSO, the token must be authorized for SSO.
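If the repository URL was copied out of Slack, a small helper like the one below can strip the link markup before the value goes into YAML. This is only a sketch: `strip_slack_markup` is a hypothetical name, and it assumes Slack's two common link forms, `<url>` and `<url|label>`.

```python
import re

# Slack wraps links as <url> or <url|label>; neither form belongs in prefect.yaml.
_SLACK_LINK = re.compile(r"^<([^|>]+)(?:\|[^>]*)?>$")


def strip_slack_markup(value: str) -> str:
    """Return the bare URL from a Slack-formatted link, or the trimmed input unchanged."""
    text = value.strip()
    match = _SLACK_LINK.match(text)
    return match.group(1) if match else text


if __name__ == "__main__":
    print(strip_slack_markup("<https://github.com/BathyMapsAustralia/jaxa-hourly.git>"))
```

A value that is already a plain URL passes through untouched, so it should be safe to run on every repository string.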
What I’d change first
- Remove stdlib packages from the work pool’s `pip_packages` (drop `datetime`, `ftplib`).
- Move package installation into a pull step after cloning (or better: use a custom Docker image).
- Add `set_working_directory` so the entrypoint resolves.
- Double‑check the entrypoint path exists at the repo root.
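For the entrypoint check, something along these lines could be run from a local checkout. It is a rough sketch using only the standard library: `entrypoint_exists` is a hypothetical helper, and it assumes the flow is defined as a top-level function in the file named before the colon.

```python
import ast
from pathlib import Path


def entrypoint_exists(repo_root: str, entrypoint: str) -> bool:
    """Check that 'path/to/file.py:function_name' resolves inside repo_root."""
    rel_path, _, func_name = entrypoint.partition(":")
    source_file = Path(repo_root) / rel_path
    if not func_name or not source_file.is_file():
        return False
    tree = ast.parse(source_file.read_text(encoding="utf-8"))
    # Only top-level (sync or async) function definitions are considered.
    return any(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.name == func_name
        for node in tree.body
    )


if __name__ == "__main__":
    # Run from the repository root of a local clone.
    print(entrypoint_exists(".", "prefect_jaxa_hourly.py:jaxa_hourly_flow"))
```

It parses the file without importing it, so it works even when the heavy geospatial dependencies are not installed locally.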
Example prefect.yaml
name: jaxa-hourly

pull:
  - prefect.deployments.steps.git_clone:
      id: clone
      repository: "https://github.com/BathyMapsAustralia/jaxa-hourly.git"
      branch: "main"
      credentials: "{{ prefect.blocks.github-credentials.jaxa-github }}"
  - prefect.deployments.steps.set_working_directory:
      directory: "{{ clone.directory }}"
  # Option A: install from requirements after clone (clearer logs than work-pool pip_packages)
  - prefect.deployments.steps.utility.pip_install_requirements:
      requirements_file: "requirements.txt"

deployments:
  - name: jaxa_hourly_prod
    entrypoint: prefect_jaxa_hourly.py:jaxa_hourly_flow
    work_pool:
      name: bathymaps-prefect-managed
    parameters: {}
    schedules:
      - cron: "0 * * * *"
        timezone: "Australia/Brisbane"
        active: true
If you must keep `pip_packages` in the work pool, at least remove `datetime` and `ftplib`, and pin versions for `rioxarray`, `rasterio`, `xarray`, and `numpy`. But the reliable route is a custom image.
Recommended: custom image for geospatial libs
Dockerfile example (Debian-based, installs GDAL/PROJ and Python deps):
FROM prefecthq/prefect:3-python3.11

# System deps for rasterio/GDAL/PROJ (adjust versions as needed)
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        gdal-bin libgdal-dev proj-bin libproj-dev \
        build-essential && \
    rm -rf /var/lib/apt/lists/*

# Python libs
RUN pip install --no-cache-dir \
    numpy==1.26.* \
    xarray==2024.6.* \
    rasterio==1.3.* \
    rioxarray==0.15.*
- Push this image to a registry and set your work pool “Image” to it.
- Then you can remove `pip_packages` entirely for faster, more stable starts.
Quick isolation steps
- Set work pool `pip_packages` to an empty list temporarily. If the run then clones and fails later, we’ve confirmed install was the first failure.
Marvin
08/18/2025, 9:07 PM
`prefect_jaxa_hourly.py` at the root and the flow is named `jaxa_hourly_flow`.
- Use the “Run test” button on the work pool; it will show the pip step output right at startup.
- If needed, run locally in the same base image to reproduce installs:
  docker run --rm -it prefecthq/prefect-client:3-latest bash
  pip install -v rioxarray
You’ll very likely see the GDAL/PROJ errors there.
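Once the install finishes (or inside a custom image), a small import probe can confirm whether the libraries actually load. This is a sketch only: `probe_imports` is a hypothetical helper, and the module list is taken from the packages discussed above.

```python
import importlib


def probe_imports(module_names: list[str]) -> dict[str, str]:
    """Try importing each module and record 'ok' or the error it raised."""
    results: dict[str, str] = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = "ok"
        except Exception as exc:  # missing packages or shared libraries surface here
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    for module, status in probe_imports(["numpy", "xarray", "rasterio", "rioxarray"]).items():
        print(f"{module}: {status}")
```

A package can install cleanly and still fail at import time if a shared library is missing, which is why this checks imports rather than installed versions.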
Docs you’ll want handy
- Managed work pools: https://docs-3.prefect.io/v3/how-to-guides/deployment_infra/managed
- prefect.yaml and pull steps: https://docs-3.prefect.io/v3/how-to-guides/deployments/prefect-yaml
- Pull step API (git_clone, set_working_directory, pip_install_requirements): https://docs-3.prefect.io/v3/api-ref/python/prefect.deployments.steps.pull
If you can share:
- The first 50–100 lines of the job logs from a failing run
- Whether the repo is private and the exact PAT scopes
- The repo structure around prefect_jaxa_hourly.py
…I can point you to the exact failing step. But my bet, with all the sunny optimism of a paranoid android, is on GDAL.