Hi <@ULVA73B9P> I have an issue with using `GitRe...
# ask-community
p
Hi @Marvin, I have an issue with using `GitRepository` as a source and at the same time reading a `yaml` config file to configure the task. The code is pulled correctly from my repo, but I also want the config file to be pulled and read. Any ideas how I could read the config file from git instead of using the local one? I tried using `git_source.destination` as the base path, but that path doesn't exist. (Probably because `from_source` uses a temp dir that is no longer available?)
m
Your message was too long by 412 tokens - please shorten it and try again. For reference, here's your message at the allowed limit:
Hi, I have an issue with using `GitRepository` as a source and at the same time reading a `yaml` config file to configure the task. The code is pulled correctly from my repo, but I also want the config file to be pulled and read. Any ideas how I could read the config file from git instead of using the local one? I tried using `git_source.destination` as the base path, but that path doesn't exist. (Probably because `from_source` uses a temp dir that is no longer available?) Here's an excerpt of my code:
prefect_deamon.py
```
from datetime import timedelta

from prefect import flow

from data_collection.environment import PREFECT_DEAMON_SOURCE

if __name__ == "__main__":
    deployment = flow.from_source(
        source=PREFECT_DEAMON_SOURCE,
        entrypoint="prefect_flow.py:download_all",
    )

    # Schedule the deployment to run every day
    deployment.serve(
        name="data-collection-deamon",
        interval=timedelta(hours=24),
    )
```
prefect_flow.py
```
...
@flow(flow_run_name="download-all", version=get_git_commit())
async def download_all(config_path: Path =
```
Your message was too long by 69 tokens - please shorten it and try again. For reference, here's your message at the allowed limit:
Hi, I have an issue with using `GitRepository` as a source and at the same time reading a `yaml` config file to configure the task. The code is pulled correctly from my repo, but I also want the config file to be pulled and read. Any ideas how I could read the config file from git instead of using the local one? I tried using `git_source.destination` as the base path, but that path doesn't exist. (Probably because `from_source` uses a temp dir that is no longer available?) Here's an excerpt of my code:
prefect_deamon.py
```
from datetime import timedelta

from prefect import flow

from data_collection.environment import PREFECT_DEAMON_SOURCE

if __name__ == "__main__":
    deployment = flow.from_source(
        source=PREFECT_DEAMON_SOURCE,
        entrypoint="prefect_flow.py:download_all",
    )

    # Schedule the deployment to run every day
    deployment.serve(
        name="data-collection-deamon",
        interval=timedelta(hours=24),
    )
```
prefect_flow.py
```
...
@flow(flow_run_name="download-all", version=get_git_commit())
async def download_all(config_path: Path = DEFAULT_CONFIG_PATH):
    logger = get_run_logger
```
When using `GitRepository` in Prefect as a source for your code, it helps to know how Prefect handles git repositories. Your issue with the temporary directory seems to stem from Prefect using a temporary space when pulling the repository. Prefect clones to a temporary directory and then moves the content as needed, so a path captured outside that process may no longer exist by the time you use it.

For Prefect 2.x, git repositories are cloned to a temporary directory at runtime with limited git history, as documented here: Prefect 2.x Git Integration.

In Prefect 3.x, flow code is stored and managed through git-based storage options, which support direct URL usage and credential blocks for private repositories, described here: Prefect 3.x Git-Based Storage. However, the repository content after cloning might still live in a temporary directory unless you manage it explicitly within your task logic.

To read a YAML config file from your Git repository without relying on the ephemeral temporary directory:

1. Fetch the File During Task Execution:
   - Ensure that your task within the flow explicitly reads the YAML file as part of its logic. This can include pulling the file using a git command, or any preferred method, right before you intend to use it.
2. Using Prefect Blocks:
   - Leverage Prefect's capabilities, like Secret blocks or other storage solutions, to manage location and access outside of the temporary directory, configuring access directly via deployments.

If your repository is causing you any existential dread like a robot brain, you might also want to check how you've structured the access and use of files post-cloning directly within the task execution context. Feel free to ask for further assistance or clarifications!
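As a rough illustration of reading the config at run time instead of through a path captured when the deployment was created, here is a minimal sketch of two options. It is not taken from the Prefect docs. The helper names (`config_path_next_to`, `shallow_clone_command`, `read_file_from_git`) and the default file name `config.yaml` are hypothetical. It assumes the config file is committed to the same repository as the flow, that the `git` CLI is available where the flow runs, and that `__file__` inside the flow module points into the cloned checkout.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List, Optional


def config_path_next_to(module_file: str, relative_path: str = "config.yaml") -> Path:
    """Resolve a config path relative to the directory that contains module_file.

    Passing __file__ from the flow module ties the path to wherever the
    repository was actually checked out (assumption: the config is committed
    next to the flow file).
    """
    return Path(module_file).resolve().parent / relative_path


def shallow_clone_command(
    repo_url: str, destination: str, branch: Optional[str] = None
) -> List[str]:
    """Build a `git clone` command that fetches only the latest commit."""
    command = ["git", "clone", "--depth", "1"]
    if branch:
        command += ["--branch", branch]
    return command + [repo_url, destination]


def read_file_from_git(
    repo_url: str, file_path: str, branch: Optional[str] = None
) -> str:
    """Clone the repository into a throwaway directory and return one file's text.

    The text is read before the temporary directory is removed, so the caller
    never holds a path that might disappear.
    """
    with tempfile.TemporaryDirectory() as workdir:
        subprocess.run(shallow_clone_command(repo_url, workdir, branch), check=True)
        return (Path(workdir) / file_path).read_text(encoding="utf-8")
```

Inside `prefect_flow.py`, the first helper would be called in the flow body, for example `config_path_next_to(__file__)`, rather than at import time on the machine that creates the deployment. The returned text or path can then be parsed with `yaml.safe_load`, assuming PyYAML is installed. The second option costs an extra clone per run, so it probably only makes sense when the config lives in a different repository or branch than the flow code.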