Ronald Sam

07/11/2023, 5:48 PM
in the prefect.yaml file I specify the pull step to be a git clone, if I try deploying more than once, on the second time I get an error because the repo already exists in the working directory. Is there a way to specify to not clone but do a git pull from remote branch if the repo already exists?

Christopher Boyd

07/11/2023, 6:10 PM
Is this specifically when you are deploying, or when you are trying to run the deployed flows? Is this using a local configuration for both (deploy + execution), or some other way? I haven’t seen this before - you can use the run_shell_script and set_working_directory deployment steps to alter the behavior from a git clone, but there isn’t one natively otherwise
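Editor's sketch of the "clone if missing, otherwise pull" behavior asked about above. This is not a Prefect feature; it is a small standalone helper that a shell-script pull step could call. The function names `sync_commands` and `sync_repo`, and the choice of a fast-forward-only pull, are assumptions made for illustration.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence


def sync_commands(repo_url: str, branch: str, dest: Path) -> List[List[str]]:
    """Return the git commands needed to bring `dest` up to date with `branch`."""
    if (dest / ".git").is_dir():
        # Repo was cloned by an earlier run: update it instead of cloning again.
        return [
            ["git", "-C", str(dest), "fetch", "origin", branch],
            ["git", "-C", str(dest), "checkout", branch],
            ["git", "-C", str(dest), "pull", "--ff-only", "origin", branch],
        ]
    # First run: plain clone of the requested branch.
    return [["git", "clone", "--branch", branch, repo_url, str(dest)]]


def sync_repo(
    repo_url: str,
    branch: str,
    dest: str,
    run: Callable[[Sequence[str]], object] = subprocess.check_call,
) -> None:
    """Run the commands from sync_commands; `run` is injectable for testing."""
    for cmd in sync_commands(repo_url, branch, Path(dest)):
        run(cmd)
```

A script like this could be invoked from a run_shell_script step in place of the git_clone step. How the resulting directory is handed to later steps depends on the Prefect version and is not covered here.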

Ronald Sam

07/11/2023, 6:32 PM
hi @Christopher Boyd, this is specifically when we're deploying
and I'm using a separate environment for deploy and execution
it's strange there isn't a way to do a git pull after a git clone is performed in subsequent deploys @Taylor Curran
unless I'm doing something unusual

Christopher Boyd

07/11/2023, 6:34 PM
do you have an example of the steps you’re taking to run / do this? It’s odd that it happens when you’re deploying - a deploy shouldn’t be doing a git pull
for example, I have the following prefect.yaml - I am cloning the flow from my repo:
cat prefect.yaml
# Welcome to your prefect.yaml file! You can you this file for storing and managing
# configuration for deploying your flows. We recommend committing this file to source
# control along with your flow code.

# Generic metadata about this project
name: path_and_entry_flow
prefect-version: 2.10.18

# build section allows you to manage and build docker images
build: null

# push section allows you to manage if and how this project is uploaded to remote locations
push: null

# pull section allows you to provide instructions for cloning this project in remote locations
pull:
- prefect.deployments.steps.git_clone:
    repository: https://github.com/<repo>/Samples.git
    branch: main
    access_token: null
- prefect.deployments.steps.run_shell_script:
    id: test
    script: ls -l
    stream_output: true

# the deployments section allows you to provide configuration for deploying flows
deployments:
- name: communitypath
  description: null
  flow_name: null
  entrypoint: ./Prefect/hello_world.py:hello_world
  parameters: {}
  work_pool:
    name: kubernetes
    work_queue_name: null
    job_variables:
      image: chaboy/test:community
I can deploy it repeatedly:
(prefect2) (base) christopherboyd@Christophers-MacBook-Pro ~/prefect_flows/path_and_entry_flow
 $ vim prefect.yaml
(prefect2) (base) christopherboyd@Christophers-MacBook-Pro ~/prefect_flows/path_and_entry_flow
 $ prefect deploy --all --ci
The `--ci` flag has been deprecated. It will not be available after Dec 2023. Please use the global
`--no-prompt` flag instead: `prefect --no-prompt deploy`.
? Would you like to build a custom Docker image for this deployment? [y/n] (n): n
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Deployment 'hello-world/communitypath' successfully created with id '1c929b72-3b3a-4511-8181-fd9bce8a1263'.  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────╯


To execute flow runs from this deployment, start a worker in a separate terminal that pulls work from the
'kubernetes' work pool:

        $ prefect worker start --pool 'kubernetes'

To schedule a run for this deployment, use the following command:

        $ prefect deployment run 'hello-world/communitypath'
(prefect2) (base) christopherboyd@Christophers-MacBook-Pro ~/prefect_flows/path_and_entry_flow
 $ prefect deploy --all --ci
The `--ci` flag has been deprecated. It will not be available after Dec 2023. Please use the global
`--no-prompt` flag instead: `prefect --no-prompt deploy`.
? Would you like to build a custom Docker image for this deployment? [y/n] (n): n
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Deployment 'hello-world/communitypath' successfully created with id '1c929b72-3b3a-4511-8181-fd9bce8a1263'.  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
To execute flow runs from this deployment, start a worker in a separate terminal that pulls work from the
'kubernetes' work pool:

        $ prefect worker start --pool 'kubernetes'

To schedule a run for this deployment, use the following command:

        $ prefect deployment run 'hello-world/communitypath'
that repo isn’t even cloned or local
the pull step only happens in the container at code execution

Ronald Sam

07/11/2023, 6:39 PM
thanks for helping Christopher. I'll provide my steps soon
I'm actually not using containers but deploying to ec2 instances using gitlab
I run this command to deploy... prefect deploy .\AppScripts\PrefectDemo\demo.py:my_favorite_function
this is my prefect.yaml file
# Welcome to your prefect.yaml file! You can you this file for storing and managing
# configuration for deploying your flows. We recommend committing this file to source
# control along with your flow code.

# Generic metadata about this project
name: PrefectDemo
prefect-version: 2.10.20

# build section allows you to manage and build docker images
build: null

# push section allows you to manage if and how this project is uploaded to remote locations
push: null

# pull section allows you to provide instructions for cloning this project in remote locations
pull:
- prefect.deployments.steps.git_clone:
    repository: https://gitlab.com/.../DevOps.git
    branch: PrefectDemo
    credentials: "{{ prefect.blocks.gitlab-credentials.rs-git-credentials }}"

# the deployments section allows you to provide configuration for deploying flows
deployments:
- name: test_2
  version: null
  tags: []
  description: null
  schedule: {}
  flow_name: null
  entrypoint: .\AppScripts\PrefectDemo\demo.py:my_favorite_function
  parameters: {}
  work_pool:
    name: aws-dev-work-pool
    work_queue_name: null
    job_variables: {}
when I run this command to run the deployment, it performs a git clone every time, which fails on subsequent runs because the git directory already exists in my execution environment's working directory: prefect deployment run 'my-favorite-function/test_2'

Christopher Boyd

07/11/2023, 6:42 PM
let me upgrade and make sure we are matched versions, because mine does not run a git clone, so that’s odd

Ronald Sam

07/11/2023, 6:44 PM
it's this pull step that does the git clone
pull:
- prefect.deployments.steps.git_clone:
    repository: https://gitlab.com/.../DevOps.git
    branch: PrefectDemo
    credentials: "{{ prefect.blocks.gitlab-credentials.rs-git-credentials }}"
i think it was a recent update in the latest prefect version

Christopher Boyd

07/11/2023, 6:49 PM
I’m still not seeing that to be the case
That pull step is only run on execution - I don’t even have that repository on my system, I just mimicked the path for the entrypoint
E.g., my entrypoint of ./Prefect/hello_world.py:hello_world - that is where my flow lives within the repository, but I just have a local folder structure like that, no repo
there is no git clone happening until it’s executed

Ronald Sam

07/11/2023, 7:04 PM
interesting, for me every time I run prefect deployment run "my flow" it does a git clone
in my execution environment
what's your process flow if you update your flow code and push to a feature branch? Won't your execution environment have to perform a git pull to get the most recent changes?
I get this error..

Christopher Boyd

07/11/2023, 7:12 PM
so I think these are different things right?
prefect deploy is how you register your flow / deployment
prefect deployment run (either through CLI, or from UI) triggers a flow run, which is execution, and does a git clone. Even if I ran this locally, the local agent creates a temporary folder and does a git clone in the temp folder, so I could re-execute 100 times locally and never have it clash
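Editor's sketch of why a fresh temporary folder per run never clashes while a fixed working directory does. It only simulates the clone by creating the target directory; `simulate_run` and the folder name `DevOps` (taken from the repository URL in this thread) are illustrative, and the real worker's internals are not shown in this conversation.

```python
import tempfile
from pathlib import Path
from typing import Optional


def simulate_run(fixed_dir: Optional[str] = None, repo_dir_name: str = "DevOps") -> Path:
    """Create the directory a clone would create, in a fixed or a fresh temp workdir."""
    if fixed_dir is not None:
        # Preset working directory: every run targets the same parent folder.
        workdir = Path(fixed_dir)
    else:
        # No preset: each run gets its own randomly named folder.
        workdir = Path(tempfile.mkdtemp(suffix="prefect"))
    clone_target = workdir / repo_dir_name
    # Like git clone into an existing non-empty path, this fails if the target exists.
    clone_target.mkdir()
    return clone_target
```

Calling `simulate_run()` repeatedly succeeds each time, whereas a second `simulate_run("some/fixed/dir")` raises `FileExistsError`, which mirrors the failure on the second flow run described above.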

Ronald Sam

07/11/2023, 7:12 PM
ah I see, that's where the gap was, I preset the working directory to D:\Prefect
so it only gets cloned there which is why it clashes

Christopher Boyd

07/11/2023, 7:13 PM
That would definitely do it
I use mac, but when I run a flow run, the agent creates a folder like /var/tmp/123412ioufas0cvs1/prefect
where that string is always random
so 100 local flow runs create 100 random temp folders

Ronald Sam

07/11/2023, 7:14 PM
and does each temp folder have a copy of your repo?

Christopher Boyd

07/11/2023, 7:14 PM
I can test that for you now, this example was using k8s, but I’ll test

Ronald Sam

07/11/2023, 7:14 PM
I tried leaving the working directory blank for prefect to place it in a temp folder, but I got an error in windows
thanks
I just don't want so many copies of my repo
I'll also give you the error message I get when I don't preset the working directory
nvm, not presetting the working directory works fine now
must've been something else that was causing the issue that I fixed already

Christopher Boyd

07/11/2023, 7:19 PM
I was just about to paste in my output, but yea, I just registered a deployment (locally) from a github repo, started my local worker, ran 5 deployments (local), no clone conflicts. I think you do bring up a valid point about whether we can alter the behavior or have a fallback (like git pull vs git clone), so I can raise that
I think that’s a fair idea

Ronald Sam

07/11/2023, 7:20 PM
but interesting, I guess it wasn't made with presetting the working directory and a gitlab clone in mind
that would be great, thanks Christopher
and for all the help!

Christopher Boyd

07/11/2023, 7:20 PM
quick question, how were you setting your workdir?
that’s not a field for the worker is it?

Ronald Sam

07/11/2023, 7:21 PM
I did it from the UI, I went to the worker and edited the working dir
image.png

Christopher Boyd

07/11/2023, 7:22 PM
oh
wow

Ronald Sam

07/11/2023, 7:22 PM
this is the nav

Christopher Boyd

07/11/2023, 7:22 PM
I didn’t really know you could do that, hrmm
Yea, I’ll raise a feature request

Ronald Sam

07/11/2023, 7:23 PM
thank you!
@Christopher Boyd, working with it more, I think I'm okay with execution doing a git clone in a temp dir. However, I would like to specify where these temp folders are created: my C: drive is really full and I would prefer they be created on the D: drive instead, but I'm currently not able to do so because git clone fails after the first run when I specify the working directory
Also, last question, is there a way for me to specify a pull step depending on the work-pool?
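Editor's sketch for the temp-folder location question. It assumes the worker creates its per-run folder with Python's standard tempfile module, which is not confirmed anywhere in this thread. Under that assumption, the folder's parent follows the TMPDIR, TEMP, or TMP environment variable of the worker process. The helper name `temp_root_with` is made up for illustration.

```python
import os
import tempfile

_TEMP_VARS = ("TMPDIR", "TEMP", "TMP")


def temp_root_with(env_dir: str) -> str:
    """Return the temp root Python would choose if the temp env vars pointed at env_dir."""
    saved_env = {name: os.environ.get(name) for name in _TEMP_VARS}
    saved_cache = tempfile.tempdir
    try:
        for name in _TEMP_VARS:
            os.environ[name] = env_dir
        # tempfile caches its choice; clear it so the environment is read again.
        tempfile.tempdir = None
        return tempfile.gettempdir()
    finally:
        # Restore the original environment and cache so the caller is unaffected.
        for name, value in saved_env.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value
        tempfile.tempdir = saved_cache
```

If the assumption holds, setting TEMP and TMP to an existing, writable folder on the D: drive before starting the worker would move the per-run folders there without presetting a working directory. This is worth verifying on a test worker first.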

Christopher Boyd

07/11/2023, 7:38 PM
regarding the first, I’m not sure, I’d have to research. We are predominantly *nix based and I don’t have strong familiarity with Windows, but possibly? Regarding the second - ideally you would have separate deployments configured for each, like: https://docs.prefect.io/latest/concepts/deployments-ux/#reusing-configuration-across-deployments
So you have one pull step for multiple deployments , but not multiple pull steps (within the same prefect.yaml) I don’t think, but I’m not 100% on that particular case
maybe it’s possible, I haven’t tried that

Ronald Sam

07/11/2023, 7:40 PM
yeah I'd like different pull steps depending on the work-pool it's deployed to
for example, I have 3 env 1. dev 2. qa 3. prod

Christopher Boyd

07/11/2023, 7:40 PM
I think they would be separate deployments / prefect.yaml entirely then
you would have a separate prefect.yaml because presumably the prefect.yaml is tied to the repo / branch itself

Ronald Sam

07/11/2023, 7:41 PM
I see, is it possible to have 3 separate prefect yaml files in the root directory of my repo?
oh gotcha

Christopher Boyd

07/11/2023, 7:41 PM
yea, you could use the branch feature too

Ronald Sam

07/11/2023, 7:42 PM
But once I merge the feature branch to main, there would be already a prefect yaml file at the root of the repo

Christopher Boyd

I’d have to think about it - if you could maybe spell it / detail what that use case looks like for you, or the ideal scenario I can see if it’s possible

Ronald Sam

07/11/2023, 7:43 PM
sounds good
Hi @Christopher Boyd, here's the scenario/use case
We have 4 environments: dev, stg, qa, prod, and we’re using gitlab ci/cd to perform deployments to stg, qa, prod. Dev is a playground environment for our developers, so they’ll perform their own deployments to it. Also, our data pipelines use python, nodejs, sql, SSIS, batch scripts, etc., so we’ll continue to need gitlab ci/cd to perform our deployments to stg, qa, prod.
For dev, I’m thinking of using prefect deployment to clone a feature branch to the dev server (execution environment); developers will trigger the deployment and execution from their own computers (dev environment). For the other environments, the deployment will be done by gitlab, so we only need prefect to execute. The deployment step using prefect will just use the local pull step to tell prefect where the script is located when the execution is triggered.
Each of the environments will have a work pool, since each is an execution environment. To do the above, I think I’ll need a pull step per work pool to specify what should happen when the flow is executed in that work pool. For example, in dev, I’d like it to perform a git clone to a temp folder in a specific working directory. For stg, qa, prod, I’d like it to look for the flow at a local folder path in each environment. Is this setup currently possible?

Christopher Boyd

07/11/2023, 8:00 PM
Thanks Ronald, let me review a bit and I’ll get back to you on this

Ronald Sam

07/11/2023, 8:00 PM
thank you!
Hi @Christopher Boyd, by any chance did you have time to review the above use case? I don't think we can go live with our flows until it addresses worker-pool specific pull steps.

Christopher Boyd

07/12/2023, 6:22 PM
Thinking out loud: I didn’t, but I’m going through it now - 4 environments means at a minimum 4 workers, each listening to a specific work pool. You could have one flow tacked to multiple work pools like this; same flow, different work pools:
- name: hello_world_local
  entrypoint: ./Prefect/hello_world.py:hello_world
  work_pool:
    name: local_test
    work_queue_name: null
    job_variables: {}
- name: hello_world_k8s
  entrypoint: ./Prefect/hello_world.py:hello_world
  work_pool:
    name: kubernetes
    work_queue_name: null
    job_variables: {}
but regarding different pull steps, no - you could use a run_shell_script to do git pull behavior if that’s intended, but you can’t tie one pull step to one deployment, and another to another deployment
I think this is a good example repo:

Ronald Sam

07/12/2023, 6:24 PM
yeah I think what we really need is a different pull step depending on the environment we're deploying to
for dev we want to perform a git clone pull step but for all the other environments, since gitlab is already helping us with the deployment to the environment, we just want to use the prefect local pull step to specify where the file is located in the execution environment

Christopher Boyd

07/12/2023, 6:25 PM
that seems like an opportunity for CI/CD then - based on what branch / environment you’re in, you could configure the pull step / clone step you need
I do think maybe there is some opportunity to use a flag for a git pull in lieu of git clone, but that comes with its own challenges (like it would only be for local process types probably) because it wouldn’t make sense on k8s
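Editor's sketch of the CI/CD idea above: choosing the pull section per environment before running the deploy. The environment names come from this thread; the function `pull_steps_for`, its parameters, and the use of the built-in set_working_directory step for the non-dev environments are assumptions, so check the step names against your Prefect version.

```python
from typing import Any, Dict, List

# Fully qualified step names as they would appear in prefect.yaml.
GIT_CLONE = "prefect.deployments.steps.git_clone"
SET_WORKDIR = "prefect.deployments.steps.set_working_directory"  # assumed step name


def pull_steps_for(env: str, repo: str, branch: str, local_path: str) -> List[Dict[str, Any]]:
    """Return the `pull:` section for one environment as plain Python data."""
    if env == "dev":
        # Developers deploy themselves, so the worker clones the feature branch.
        return [{GIT_CLONE: {"repository": repo, "branch": branch}}]
    if env in ("stg", "qa", "prod"):
        # GitLab CI/CD has already placed the code; just point at the local folder.
        return [{SET_WORKDIR: {"directory": local_path}}]
    raise ValueError(f"unknown environment: {env!r}")
```

A CI job could call this with the target environment and write the result into the `pull:` key of that environment's prefect.yaml using any YAML library, then run the deploy command as usual.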

Ronald Sam

07/12/2023, 6:26 PM
I think that's actually another issue in a previous thread I mentioned. I won't need a git pull anymore. Git clone works well
this is a different issue where we'd like to specify the deployment based on the execution environment
because depending on the environment deployment is already handled by gitlab or not
for dev, deployment isn't handled by gitlab, so we want to have git clone as the pull step, while for the other environments, we just want to point prefect to where the flows are located in the execution environment

Christopher Boyd

07/12/2023, 6:36 PM
so I mean for those other deployments, there doesn’t need to be a pull step at all if it’s handled by gitlab
in that case, it’s just a local file path
so dev needs to do a git clone, but for prod, if gitlab handles the code existing there and you don’t need to git clone, then your pull step can just be null, and your path / entrypoint are all you need to specify

Ronald Sam

07/12/2023, 6:38 PM
Yeah, i think that might work
but don't all environments share the same prefect.yaml file?

Trevor Sweeney

09/18/2023, 8:45 PM
+1 for the ability to parameterize prefect.yaml pull git branch by execution environment