# ask-community
m
Hey all, I have been trying to adapt an existing workflow that I made by hand to Prefect, and I think I've got the wrong idea somewhere about how to do it. All my old code is in separate GitHub repos, one for each step in the old workflow, so I figured I could use those repos as "tasks" in Prefect. I've also made a repo for the flow that joins the tasks together. I've defined all of the repos as `GitHubRepository` blocks on my server with things like:
from prefect_github import GitHubRepository

from prefect.blocks.system import Secret

# Load GitHub token from Prefect storage
github_token = Secret.load("github-token").get()

# Define GitHub storage blocks for each repo
ingestion_repo = GitHubRepository(
    repository_url="my/repo/url.git",
    reference="prefect-dev",
    access_token=github_token
)
ingestion_repo.save("ingestion-repo-dev", overwrite=True)
I can see in the blocks UI that they've all been saved. I defined the flow repo in YAML, though, and uploaded that from the CLI:
deployments:

- name: cpu-flow
  entrypoint: flow.py:cpu_workflow
  work_pool:
    name: cpu-process-pool
    work_queue_name:
    job_variables: {}
  pull:
  - prefect.deployments.steps.git_clone:
      repository: my/flow/repo.git
      branch: main
      include_submodules: true
      access_token: '{{ prefect.blocks.secret.github-token }}'
  version:
  tags: []
  concurrency_limit:
  description:
  parameters: {}
  schedules: []
So when I run a deployment, the worker seems to happily clone the flow. It then starts the task code:
@task(tags=["cpu"])
async def testSlide(slide: Slide):
    ingestion_repo = await GitHubRepository.load("ingestion-repo-dev")
    await ingestion_repo.get_directory()  
        
    from ingestionMain import slide_test
    return await slide_test(slide)
The clone fails here, asking me which credentials to use, even though I already set them in the block and the deployment knew about the credentials. Is this a silly way to do it all? The docs all seem to use repos for flows, not tasks. Could I migrate over to Docker or something else?
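One possible reason for the credentials prompt, not confirmed anywhere in this thread: in the `prefect-github` versions I'm aware of, `GitHubRepository` takes its token through a `credentials` field holding a `GitHubCredentials` block, not through an `access_token` argument. A minimal sketch under that assumption follows; `save_repo_block` is a hypothetical helper name, and the field names should be checked against the installed version.

```python
def save_repo_block(token: str, repository_url: str, reference: str, block_name: str) -> None:
    """Save a GitHubRepository block that carries its own credentials.

    Assumption: GitHubRepository accepts a `credentials` field and
    GitHubCredentials accepts a `token` field (verify for your version).
    """
    # Imported lazily so this module can be loaded without prefect-github installed.
    from prefect_github import GitHubCredentials, GitHubRepository

    credentials = GitHubCredentials(token=token)
    repo = GitHubRepository(
        repository_url=repository_url,
        reference=reference,
        credentials=credentials,
    )
    repo.save(block_name, overwrite=True)
```

It would be called with the token from `Secret.load("github-token").get()`, the repo URL, `"prefect-dev"`, and `"ingestion-repo-dev"`, matching the setup script above.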
n
hi @Martin Klefas - it makes sense to me to use `GitHubRepository` in a setup script before you run `prefect deploy` and to refer to it in `prefect.yaml`, but why are you loading stuff from it again in the `testSlide` task? you should already have it on disk wherever your deployment is running, right?
m
Thanks @Nate, I'd got the wrong idea a bit: each task was in its own repo, so each task needed a clone. I've now moved the task repos to be submodules of the flow repo, so everything is pulled all at once.
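With the task repos checked out as submodules, the task no longer needs to clone anything and only has to import code that is already on disk. A rough sketch of one way to do that, assuming the submodule lives in a directory such as `ingestion/` inside the flow repo (a hypothetical path); `import_from_submodule` is an illustrative helper, not part of Prefect:

```python
import importlib
import sys
from pathlib import Path
from types import ModuleType


def import_from_submodule(repo_root: Path, submodule_dir: str, module_name: str) -> ModuleType:
    """Import `module_name` from a submodule directory inside the cloned flow repo."""
    path = (repo_root / submodule_dir).resolve()
    if not path.is_dir():
        # Typically means the repo was cloned without its submodules.
        raise FileNotFoundError(f"submodule directory not found: {path}")
    # Make the submodule's top-level modules importable.
    if str(path) not in sys.path:
        sys.path.insert(0, str(path))
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

Inside the task this might look like `slide_test = import_from_submodule(Path(__file__).parent, "ingestion", "ingestionMain").slide_test`, in place of the `GitHubRepository.load(...)` and `get_directory()` calls.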
n
ok! in case it's helpful, i have an example repo here https://github.com/zzstoatzz/prefect-pack and a youtube series here https://www.youtube.com/@natefromprefect5895/playlists
m
Thank you