Patrick Düggelin
12/05/2024, 9:50 AM
I’m having trouble with GitRepository storage in my Prefect deployment and could use some help. Here’s the setup:
• Context: I’m trying to set up GitRepository storage to fetch flow definitions from a GitLab repository as described in the docs.
• Problem: The storage setup fails to pull the repository correctly. The first GitRepository.pull_code call succeeds because it clones the repo. All subsequent pulls fail with the following error:
ERROR | prefect.runner.storage.git-repository.data-collection - Failed to pull latest changes with exit code Command '['git', 'pull', 'origin', '--depth', '1']' returned non-zero exit status 128
• What I’ve Tried: I manually ran git clone <repo> --depth 1, made some changes, and then pulled again with git pull origin --depth 1. This also fails, with the following error: fatal: refusing to merge unrelated histories. I also tried setting git config --global pull.rebase true, but that doesn’t help.
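A minimal sketch of one way to refresh a shallow clone without merging, assuming the goal is only to mirror the remote branch: fetch the newest commit and hard-reset to it instead of running git pull. The helper name update_shallow_clone, the default branch "main", and the injectable run parameter are illustrative assumptions. None of this is part of Prefect, and it is not what GitRepository.pull_code does.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def update_shallow_clone(
    repo_dir: Path,
    branch: str = "main",  # assumption: replace with your actual branch
    run: Callable[..., object] = subprocess.run,
) -> List[List[str]]:
    """Refresh a shallow clone by fetching the remote tip and resetting to it.

    Hypothetical helper, not part of Prefect. It avoids `git pull`, so git
    never tries to merge the old and new shallow commits.
    """
    commands = [
        # Fetch only the newest commit of the branch from origin.
        ["git", "-C", str(repo_dir), "fetch", "--depth", "1", "origin", branch],
        # Point the working tree at the fetched commit (discards local changes).
        ["git", "-C", str(repo_dir), "reset", "--hard", "FETCH_HEAD"],
    ]
    for command in commands:
        # check=True raises CalledProcessError if git exits non-zero.
        run(command, check=True)
    return commands
```

A call would look like update_shallow_clone(Path("/path/to/clone"), branch="main"). Note that git reset --hard throws away any local modifications in the clone, so this only fits a checkout that is treated as read-only.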
How can I configure git to work with the Prefect GitRepository storage?
I’d appreciate any insights or guidance on what might be going wrong. Let me know if you need additional details or logs!
Thanks in advance for your support! 🙏
Patrick Düggelin
12/05/2024, 10:04 AM
Here is how I use GitRepository:
prefect_environment.py
import os
import subprocess
from pathlib import Path

from prefect.runner.storage import GitRepository


def _get_project_source():
    return Path(__file__).parent.parent.parent


def get_default_config():
    relative_config = "resources/sources/config.yml"
    return _get_project_source() / relative_config


def get_deamon_source():
    # Running on a server.
    repo = os.environ.get("GITLAB_REPO")
    if repo:
        access_token = os.environ.get("GITLAB_ACCESS_TOKEN")
        credentials = None
        if access_token:
            credentials = {"access_token": access_token}
        repo = GitRepository(repo, credentials)
        return repo
    else:
        # Running locally.
        return str(_get_project_source())


def get_git_commit():
    # Check if the directory is a valid git repository
    try:
        commit_hash = subprocess.check_output(
            ["git", "-C", _get_project_source(), "rev-parse", "--short", "HEAD"],
            text=True,
        ).strip()
        return commit_hash
    except subprocess.CalledProcessError:
        print(f"Could not retrieve git commit for directory: {_get_project_source()}")
        return None

prefect_deamon.py
from datetime import timedelta

from prefect import flow

from data_collection.prefect_environment import get_deamon_source

if __name__ == "__main__":
    deployment = flow.from_source(
        source=get_deamon_source(),
        entrypoint="src/data_collection/prefect_flow.py:download_all",
    )
    # Schedule the deployment to run every day
    deployment.serve(
        name="data-collection-deamon",
        interval=timedelta(hours=24),
    )