Patrick Düggelin
12/03/2024, 12:38 PM
Hi, I have an issue with using `GitRepository` as a source and at the same time reading a YAML config file to configure the task:
The code is pulled correctly from my repo, but I also want the config file to be pulled and read.
Any ideas how I could read the config file from git instead of using the local one?
I tried using `git_source.destination` as the base path, but that path doesn't exist. (Probably because `from_source` uses a temp dir that is no longer available?)
Marvin
12/03/2024, 12:40 PM
Hi, I have an issue with using `GitRepository` as a source and at the same time reading a YAML config file to configure the task:
The code is pulled correctly from my repo, but I also want the config file to be pulled and read.
Any ideas how I could read the config file from git instead of using the local one?
I tried using `git_source.destination` as the base path, but that path doesn't exist. (Probably because `from_source` uses a temp dir that is no longer available?)
Here's an excerpt of my code:
`prefect_deamon.py`
```
from datetime import timedelta

from prefect import flow

from data_collection.environment import PREFECT_DEAMON_SOURCE

if __name__ == "__main__":
    # Load the flow from the git source
    deployment = flow.from_source(
        source=PREFECT_DEAMON_SOURCE,
        entrypoint="prefect_flow.py:download_all",
    )
    # Schedule the deployment to run every day
    deployment.serve(
        name="data-collection-deamon",
        interval=timedelta(hours=24),
    )
```
`prefect_flow.py`
```
...
@flow(flow_run_name="download-all", version=get_git_commit())
async def download_all(config_path: Path = DEFAULT_CONFIG_PATH):
    logger = get_run_logger
```
Marvin
12/03/2024, 12:41 PM
When using `GitRepository` in Prefect as a source for your code, it helps to know how Prefect handles git repositories. Your issue most likely stems from Prefect cloning the repository into a temporary directory: a path captured while the flow is being loaded (such as `git_source.destination`) may no longer exist by the time your flow run tries to read from it.
For Prefect 2.x, repositories are cloned with limited git history into a temporary directory at runtime, as documented here: Prefect 2.x Git Integration.
In Prefect 3.x, flow code is managed through git-based storage options, which support direct repository URLs and credential blocks for private repositories, described here: Prefect 3.x Git-Based Storage. The cloned content may still live in a temporary directory unless you manage it explicitly within your flow or task logic.
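The general failure mode can be reproduced with the standard library alone. The sketch below is not Prefect's implementation; `load_from_temp_clone` is a hypothetical stand-in for any loader that works inside a `tempfile.TemporaryDirectory` and hands back a path from within it.

```python
import tempfile
from pathlib import Path


def load_from_temp_clone() -> Path:
    """Hypothetical loader: writes a config into a temp dir and returns its path."""
    with tempfile.TemporaryDirectory() as tmpdir:
        config_path = Path(tmpdir) / "config.yaml"
        config_path.write_text("interval_hours: 24\n")
        # Inside the context manager the file is readable.
        assert config_path.exists()
        # Returning leaves the `with` block, which deletes tmpdir and its contents.
        return config_path


stale_path = load_from_temp_clone()
print(stale_path.exists())  # prints False: the directory was cleaned up
```

Any path remembered from inside the temporary directory is stale once the directory is cleaned up, which would match the "path doesn't exist" symptom.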
To read a YAML config file from your Git repository without relying on the ephemeral temporary directory:
1. Fetch the file during task execution:
   - Have the task itself read the YAML file as part of its logic, for example by pulling the file with a git command (or any other method you prefer) right before it is needed.
2. Use Prefect blocks:
   - Use Prefect's blocks, such as Secret blocks or other storage solutions, to manage the file's location and access independently of the temporary directory, and configure that access through the deployment.
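One way to approach option 1 without calling git yourself is to resolve the config path relative to the flow module's own `__file__`, so the file is looked up in whichever directory the module was actually loaded from. This is a minimal sketch, assuming the config file is committed in the same directory as `prefect_flow.py`; the helpers `config_path_beside` and `read_config_text` and the file name `config.yaml` are hypothetical, not taken from the original code.

```python
from pathlib import Path
from typing import Optional


def config_path_beside(module_file: str, name: str = "config.yaml") -> Path:
    """Return the path of `name` in the same directory as `module_file`."""
    return Path(module_file).resolve().parent / name


def read_config_text(module_file: str, override: Optional[Path] = None) -> str:
    """Read the config, preferring an explicit override path if one is given."""
    path = override if override is not None else config_path_beside(module_file)
    return path.read_text()


# Inside prefect_flow.py (assumed usage), call this from the flow body:
#     text = read_config_text(__file__)
```

Resolving the path inside the flow body, rather than in a default argument, means it is computed in the process that runs the flow. The returned text can then be parsed with a YAML parser such as PyYAML's `yaml.safe_load`.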
If your repository is causing you existential dread, as it would a robot brain, you might also want to check how your tasks access and use files after cloning, within the task execution context itself.
Feel free to ask for further assistance or clarifications!