MarcoM
09/23/2025, 10:40 AM
RuntimeError: Failed to clone repository 'https://github.com/prometeia-wam/da-data-pipeline-customer.git' with exit code 128.
To work around this, I’ve been looking into adjusting my prefect.yaml.
Here’s the relevant part of the file:
# Generic metadata about this project
name: prefect-xxx
# pull section allows you to provide instructions for cloning this project in remote locations
pull:
- prefect.deployments.steps.git_clone:
    repository: https://github.com/prometeia-wam/da-data-pipeline-customer.git
    branch: "{{ prefect.variables.git_branch }}"
    access_token: "{{ prefect.blocks.secret.github-tech-user-token }}"
deployments:
- name: xxxx-upload
  version:
  tags: ['CUSTOMER','BPER']
  description: Upload outputs for BPER
  schedule:
      cron: "30 22 * * 1-5"
      timezone: "Europe/Rome"
      active: "{{ prefect.variables.is_schedule_active }}"
  entrypoint: flows/assets_uploader.py:assets_uploader
  parameters: {asset_ids: ["xxxx-gpp_benchmark_description"]}
  work_pool:
    name: customer-workpool
    work_queue_name: default
    job_variables: {}
What I’d like to do is extend the pull section: if the git_clone step fails, I’d like to run a prefect.deployments.steps.run_shell_script step that retries the clone.
Do you have any suggestions on the best way to set this up in prefect.yaml?
Marvin
09/23/2025, 10:43 AM
git_clone here: Add retries to pull step (git clone)
Given that, the most reliable approach is to replace the git_clone pull step with a run_shell_script step that performs the clone with retry/backoff itself. On Windows, run_shell_script uses cmd.exe by default, so here’s a drop-in example that keeps your templating and token block but adds retries:
pull:
  - prefect.deployments.steps.run_shell_script:
      # Keep secrets out of the script body; reference them via env
      env:
        GITHUB_TOKEN: "{{ prefect.blocks.secret.github-tech-user-token }}"
        GIT_BRANCH: "{{ prefect.variables.git_branch }}"
      expand_env_vars: true
      # Consider setting to false if you're worried about logs (Prefect will try to redact, but env is safest)
      stream_output: true
      script: |
        @echo off
        setlocal enabledelayedexpansion
        set REPO_OWNER=prometeia-wam
        set REPO_NAME=da-data-pipeline-customer
        set REPO_DIR=%REPO_NAME%
        set RETRIES=5
        set SLEEP=15
        echo Cloning %REPO_OWNER%/%REPO_NAME% branch %GIT_BRANCH% with up to %RETRIES% retries...
        for /L %%i in (1,1,%RETRIES%) do (
          if exist "%REPO_DIR%" rmdir /s /q "%REPO_DIR%"
          rem Use a token-authenticated URL; avoid printing the URL to logs
          git clone --depth 1 --branch "%GIT_BRANCH%" "https://%GITHUB_TOKEN%@github.com/%REPO_OWNER%/%REPO_NAME%.git" "%REPO_DIR%"
          if !errorlevel! EQU 0 goto :success
          echo Clone failed (attempt %%i/%RETRIES%). Retrying in %SLEEP% seconds...
          timeout /t %SLEEP% /nobreak >nul
        )
        echo All retries failed.
        exit /b 1
        :success
        echo Clone succeeded.
Then remove the original prefect.deployments.steps.git_clone entry. Everything else in your prefect.yaml can stay the same.
Notes and options:
- Security: Passing the token via env + expand_env_vars: true keeps it out of the script body. You can also set stream_output: false if you want to be extra cautious about logging.
- Performance: Using --depth 1 reduces data transfer and can make retries succeed faster.
- PowerShell alternative: If you prefer PowerShell, I can share a PowerShell retry loop; we’d invoke it via run_shell_script (either as a PowerShell one-liner or by calling powershell -Command).
- Docs: see the Prefect pull steps docs, and the run_shell_script reference under steps in the Deployment guide.
If you’d rather keep using the built-in git_clone, there isn’t a way to “fallback” on failure today. The alternative pattern to avoid runtime git entirely is to package code with your runtime (e.g., ship a Docker image or pull from object storage), which sidesteps intermittent git network issues — happy to outline that if helpful.
Would you like the PowerShell version of the script, or are you good with the cmd-style loop above?
MarcoM
09/23/2025, 11:06 AM
run_shell_script failed with error code 1: %%i was unexpected at this time.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "C:\PREFECT_GITHUB\venv\Lib\site-packages\prefect\engine.py", line 420, in retrieve_flow_then_begin_flow_run
    else await load_flow_from_flow_run(flow_run, client=client)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\PREFECT_GITHUB\venv\Lib\site-packages\prefect\client\utilities.py", line 78, in with_injected_client
    return await fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\PREFECT_GITHUB\venv\Lib\site-packages\prefect\deployments\deployments.py", line 301, in load_flow_from_flow_run
    output = await run_steps(deployment.pull_steps)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\PREFECT_GITHUB\venv\Lib\site-packages\prefect\deployments\steps\core.py", line 182, in run_steps
    raise StepExecutionError(f"Encountered error while running {fqn}") from exc
prefect.deployments.steps.core.StepExecutionError: Encountered error while running prefect.deployments.steps.run_shell_script
Marvin
09/23/2025, 11:08 AM
cmd.exe, when you run an "inline" script (as run_shell_script does), the for variable must use a single percent sign (%i), not %%i (double percent signs are only needed inside .bat files).
2) expand_env_vars: true expands the %VAR% placeholders before the run, using the Python process environment (not the env: you set). This can turn (%RETRIES%) into an empty string and break the for. Better to disable it.
3) The URL string was "contaminated" by the front end/Slack: I see https://%<mailto:GITHUB_TOKEN%@github.com|GITHUB_TOKEN%@github.com>/... and it should be replaced with https://%GITHUB_TOKEN%@github.com/....
Try this corrected configuration (CMD), with retry/backoff and without up-front expansion:
pull:
  - prefect.deployments.steps.run_shell_script:
      env:
        GIT_BRANCH: "{{ prefect.variables.git_branch }}"
        GITHUB_TOKEN: "{{ prefect.blocks.secret.github-tech-user-token }}"
      expand_env_vars: false
      stream_output: true
      script: |
        @echo off
        setlocal EnableDelayedExpansion
        set REPO_OWNER=prometeia-wam
        set REPO_NAME=da-data-pipeline-customer
        set REPO_DIR=%REPO_NAME%
        set RETRIES=5
        set SLEEP=15
        echo Cloning %REPO_OWNER%/%REPO_NAME% branch %GIT_BRANCH% with up to %RETRIES% retries...
        for /L %i in (1,1,%RETRIES%) do (
          if exist "%REPO_DIR%" rmdir /s /q "%REPO_DIR%"
          rem Token in the user field of the URL (watch the logs)
          git clone --depth 1 --branch "%GIT_BRANCH%" "https://%GITHUB_TOKEN%@github.com/%REPO_OWNER%/%REPO_NAME%.git" "%REPO_DIR%"
          if !errorlevel! EQU 0 goto :success
          echo Clone failed (attempt %i/%RETRIES%). Retrying in %SLEEP% seconds...
          timeout /t %SLEEP% /nobreak >nul
        )
        echo All retries failed.
        exit /b 1
        :success
        echo Clone succeeded.
Important notes:
- Use %i (not %%i) in the for.
- Set expand_env_vars: false to prevent %RETRIES%, %REPO_NAME%, etc. from being expanded before set defines them.
- Fix the clone URL to remove the mailto fragment you see in the front end.
- If you want to minimize the risk of the token appearing in the logs, set stream_output: false. Alternatively, you can keep the token out of the URL and use a header (note: the token will still be visible in the process command line):
  git -c http.extraHeader="Authorization: Bearer %GITHUB_TOKEN%" clone --depth 1 --branch "%GIT_BRANCH%" "https://github.com/%REPO_OWNER%/%REPO_NAME%.git" "%REPO_DIR%"
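On the URL point in the notes above: the angle brackets and mailto fragments are Slack link markup (`<url>` and `<url|label>`), so a snippet copied out of a chat can be cleaned mechanically before it goes into prefect.yaml. The following is a minimal sketch, not part of Prefect; `strip_slack_links` is a hypothetical helper name, and it assumes only http(s) and mailto links need unwrapping.

```python
import re

# Matches Slack link markup: <https://...>, <https://...|label>, <mailto:...|label>
_SLACK_LINK = re.compile(r"<((?:https?://|mailto:)[^|>\s]+)(?:\|([^>]*))?>")


def strip_slack_links(text: str) -> str:
    """Replace Slack link markup with the plain text it stood for."""
    def _plain(match: re.Match) -> str:
        target, label = match.group(1), match.group(2)
        if label:
            # <target|label>: the label is what the author originally typed.
            return label
        # <target>: drop a mailto: scheme that Slack added on its own.
        if target.startswith("mailto:"):
            return target[len("mailto:"):]
        return target

    return _SLACK_LINK.sub(_plain, text)


mangled = '"<https://%GITHUB_TOKEN>%@github.com/%REPO_OWNER%/%REPO_NAME%.git"'
print(strip_slack_links(mangled))
```

Redirections such as `>nul` are left alone, because the pattern only matches a `<` that is immediately followed by `http://`, `https://`, or `mailto:`.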
If you prefer PowerShell (fewer pitfalls with percent signs), I can also provide the PS version with retry. Would you like me to share it? Also, here is the feature request to add retries to the native git_clone pull step: Add retries to pull step (git clone).
MarcoM
09/23/2025, 12:02 PM
run_shell_script failed with error code 128: Cloning into '%REPO_DIR%'...
fatal: unable to access 'https://github.com/%REPO_OWNER%/%REPO_NAME%.git/': The requested URL returned error: 400
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "C:\PREFECT_GITHUB\venv\Lib\site-packages\prefect\engine.py", line 420, in retrieve_flow_then_begin_flow_run
    else await load_flow_from_flow_run(flow_run, client=client)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\PREFECT_GITHUB\venv\Lib\site-packages\prefect\client\utilities.py", line 78, in with_injected_client
    return await fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\PREFECT_GITHUB\venv\Lib\site-packages\prefect\deployments\deployments.py", line 301, in load_flow_from_flow_run
    output = await run_steps(deployment.pull_steps)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\PREFECT_GITHUB\venv\Lib\site-packages\prefect\deployments\steps\core.py", line 182, in run_steps
    raise StepExecutionError(f"Encountered error while running {fqn}") from exc
prefect.deployments.steps.core.StepExecutionError: Encountered error while running prefect.deployments.steps.run_shell_script  --> this is the configuration I see in the front end -->
[
  {
    "prefect.deployments.steps.run_shell_script": {
      "env": {
        "GIT_BRANCH": "{{ prefect.variables.git_branch }}",
        "GITHUB_TOKEN": "{{ prefect.blocks.secret.github-tech-user-token }}"
      },
      "script": "@echo off\nsetlocal EnableDelayedExpansion\n\nset REPO_OWNER=prometeia-wam\nset REPO_NAME=da-data-pipeline-customer\nset REPO_DIR=%REPO_NAME%\nset RETRIES=5\nset SLEEP=15\n\necho Cloning %REPO_OWNER%/%REPO_NAME% branch %GIT_BRANCH% with up to %RETRIES% retries...\n\nfor /L %i in (1,1,%RETRIES%) do (\n  if exist \"%REPO_DIR%\" rmdir /s /q \"%REPO_DIR%\"\n\n  rem Token nel campo utente dell'URL (attenzione ai log)\n  git clone --depth 1 --branch \"%GIT_BRANCH%\" \"<https://%GITHUB_TOKEN>%@github.com/%REPO_OWNER%/%REPO_NAME%.git\" \"%REPO_DIR%\"\n  if !errorlevel! EQU 0 goto :success\n\n  echo Clone failed (attempt %i/%RETRIES%). Retrying in %SLEEP% seconds...\n  timeout /t %SLEEP% /nobreak >nul\n)\n\necho All retries failed.\nexit /b 1\n\n:success\necho Clone succeeded.\n",
      "stream_output": true,
      "expand_env_vars": false
    }
  }
]
Marvin
09/23/2025, 12:04 PM
run_shell_script runs the content with PowerShell. In PowerShell, variables do not use the %VAR% format, so %REPO_DIR%, %REPO_OWNER%, etc. are not expanded and end up in the URL literally.
- Also, in the front end I still see the chevrons "< >" around the URL ("<https://%GITHUB_TOKEN>%@github.com/..."): those come from the rendering and break the URL.
Simplest solution: use a PowerShell version of the retry and pass the token via an HTTP header (this also avoids a URL with credentials). Here is a ready-to-use configuration:
pull:
  - prefect.deployments.steps.run_shell_script:
      env:
        GIT_BRANCH: "{{ prefect.variables.git_branch }}"
        GITHUB_TOKEN: "{{ prefect.blocks.secret.github-tech-user-token }}"
      expand_env_vars: false
      stream_output: true
      script: |
        $ErrorActionPreference = 'Stop'
        $REPO_OWNER = 'prometeia-wam'
        $REPO_NAME  = 'da-data-pipeline-customer'
        $REPO_DIR   = $REPO_NAME
        $RETRIES    = 5
        $SLEEP      = 15
        Write-Host "Cloning $REPO_OWNER/$REPO_NAME branch $env:GIT_BRANCH with up to $RETRIES retries..."
        for ($i = 1; $i -le $RETRIES; $i++) {
          if (Test-Path $REPO_DIR) { Remove-Item -Recurse -Force $REPO_DIR }
          # Header-based authentication: more secure and no URL quoting problems
          git -c http.extraHeader="Authorization: Bearer $env:GITHUB_TOKEN" clone --depth 1 --branch "$env:GIT_BRANCH" "https://github.com/$REPO_OWNER/$REPO_NAME.git" "$REPO_DIR"
          if ($LASTEXITCODE -eq 0) {
            Write-Host "Clone succeeded."
            exit 0
          }
          Write-Host "Clone failed (attempt $i/$RETRIES). Retrying in $SLEEP seconds..."
          Start-Sleep -Seconds $SLEEP
        }
        Write-Error "All retries failed."
        exit 1
Important notes:
- No "< >" around URLs in the script. They must be plain strings.
- Keep `expand_env_vars: false`: this way the substitution of $env:GIT_BRANCH and $env:GITHUB_TOKEN happens in PowerShell (not before).
- If you want to minimize the chance of the token ending up in the logs, you can also set stream_output: false.
Alternative: force CMD. If you prefer to keep the CMD version, call CMD explicitly from PowerShell, but that is more fragile. The PowerShell version above is the cleanest on Windows.
Reference to the feature request for native retries on the `git_clone` pull step: Add retries to pull step (git clone)
Try this configuration and tell me if you still see errors; if so, share the "Clone failed …" output of the first iteration (without the token) so we can work out whether it is a network/proxy problem or a token permission problem on that repo.
MarcoM
09/23/2025, 12:09 PM
run_shell_script failed with error code 1: '$ErrorActionPreference' is not recognized as an internal or external command,
operable program or batch file.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "C:\PREFECT_GITHUB\venv\Lib\site-packages\prefect\engine.py", line 420, in retrieve_flow_then_begin_flow_run
    else await load_flow_from_flow_run(flow_run, client=client)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\PREFECT_GITHUB\venv\Lib\site-packages\prefect\client\utilities.py", line 78, in with_injected_client
    return await fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\PREFECT_GITHUB\venv\Lib\site-packages\prefect\deployments\deployments.py", line 301, in load_flow_from_flow_run
    output = await run_steps(deployment.pull_steps)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\PREFECT_GITHUB\venv\Lib\site-packages\prefect\deployments\steps\core.py", line 182, in run_steps
    raise StepExecutionError(f"Encountered error while running {fqn}") from exc
prefect.deployments.steps.core.StepExecutionError: Encountered error while running prefect.deployments.steps.run_shell_script
Marvin
09/23/2025, 12:10 PM
run_shell_script uses CMD by default. So:
- If you want to use PowerShell, you have to invoke it explicitly from CMD.
- Alternatively, stay on pure CMD (simpler) but fix the URL/variables.
I suggest the simplest solution: a CMD version with retry and header-based authentication (it avoids putting the token in the URL and prevents the 400s caused by encoding/quoting).
Replace your pull with this (note: no "< >" around URLs in the script):
pull:
  - prefect.deployments.steps.run_shell_script:
      env:
        GIT_BRANCH: "{{ prefect.variables.git_branch }}"
        GITHUB_TOKEN: "{{ prefect.blocks.secret.github-tech-user-token }}"
      expand_env_vars: false
      stream_output: true
      script: |
        @echo off
        setlocal EnableDelayedExpansion
        set REPO_OWNER=prometeia-wam
        set REPO_NAME=da-data-pipeline-customer
        set REPO_DIR=%REPO_NAME%
        set RETRIES=5
        set SLEEP=15
        echo Cloning %REPO_OWNER%/%REPO_NAME% branch %GIT_BRANCH% with up to %RETRIES% retries...
        for /L %i in (1,1,%RETRIES%) do (
          if exist "%REPO_DIR%" rmdir /s /q "%REPO_DIR%"
          rem Header-based authentication (no token in the URL)
          git -c http.extraHeader="Authorization: Bearer %GITHUB_TOKEN%" clone --depth 1 --branch "%GIT_BRANCH%" "https://github.com/%REPO_OWNER%/%REPO_NAME%.git" "%REPO_DIR%"
          if !errorlevel! EQU 0 goto :success
          echo Clone failed (attempt %i/%RETRIES%). Retrying in %SLEEP% seconds...
          timeout /t %SLEEP% /nobreak >nul
        )
        echo All retries failed.
        exit /b 1
        :success
        echo Clone succeeded.
Key points:
- Use %i (not %%i) in the for.
- Keep expand_env_vars: false (the %...% variables are expanded by CMD at the right time).
- Avoid chevrons <> entirely in the `script` strings: they come from the editor/rendering and break the URL.
- The token is passed via env and used in the HTTP header (safer and more robust than a token in the URL).
- Check that git is on the worker's PATH.
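The same retry-with-backoff idea can also be sketched in Python, which avoids the cmd/PowerShell quoting differences seen in this thread. This is a sketch under stated assumptions, not a Prefect feature: `clone_with_retries`, its parameters, and the injectable `runner`/`sleeper` arguments are hypothetical names; the header form is copied from the CMD example above and is not verified against GitHub here; and the script would have to exist on the worker already, since it cannot live in the repository being cloned.

```python
import shutil
import subprocess
import time


def clone_with_retries(repo_url, branch, dest, token=None, retries=5,
                       sleep_seconds=15, runner=subprocess.run,
                       sleeper=time.sleep):
    """Run `git clone` up to `retries` times; return the attempt that succeeded."""
    cmd = ["git"]
    if token:
        # Same header form as the CMD example in this thread (an assumption,
        # not verified). The token is still visible in the process command line.
        cmd += ["-c", f"http.extraHeader=Authorization: Bearer {token}"]
    cmd += ["clone", "--depth", "1", "--branch", branch, repo_url, dest]

    for attempt in range(1, retries + 1):
        # Remove leftovers from a failed attempt so git gets an empty target.
        shutil.rmtree(dest, ignore_errors=True)
        result = runner(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < retries:
            print(f"Clone failed (attempt {attempt}/{retries}). "
                  f"Retrying in {sleep_seconds} seconds...")
            sleeper(sleep_seconds)
    raise RuntimeError(f"All {retries} clone attempts failed.")
```

A caller would read GIT_BRANCH and GITHUB_TOKEN from `os.environ` and pass them in; passing the command as a list means no shell is involved, so neither %VAR% nor $env: expansion rules apply.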
If you prefer PowerShell instead, you have to invoke it explicitly from CMD. Here is a variant that creates a temporary .ps1 file and runs it:
pull:
  - prefect.deployments.steps.run_shell_script:
      env:
        GIT_BRANCH: "{{ prefect.variables.git_branch }}"
        GITHUB_TOKEN: "{{ prefect.blocks.secret.github-tech-user-token }}"
      expand_env_vars: false
      stream_output: true
      script: |
        @echo off
        setlocal
        set PS1=%TEMP%\prefect_clone.ps1
        >"%PS1%" (
          echo $ErrorActionPreference = 'Stop'
          echo $REPO_OWNER = 'prometeia-wam'
          echo $REPO_NAME  = 'da-data-pipeline-customer'
          echo $REPO_DIR   = $REPO_NAME
          echo $RETRIES    = 5
          echo $SLEEP      = 15
          echo Write-Host "Cloning $REPO_OWNER/$REPO_NAME branch $env:GIT_BRANCH with up to $RETRIES retries..."
          echo for ($i = 1; $i -le $RETRIES; $i++) {
          echo   if (Test-Path $REPO_DIR) { Remove-Item -Recurse -Force $REPO_DIR }
          echo   git -c http.extraHeader^="Authorization: Bearer $env:GITHUB_TOKEN" clone --depth 1 --branch "$env:GIT_BRANCH" "https://github.com/$REPO_OWNER/$REPO_NAME.git" "$REPO_DIR"
          echo   if ($LASTEXITCODE -eq 0) { Write-Host "Clone succeeded."; exit 0 }
          echo   Write-Host "Clone failed (attempt $i/$RETRIES). Retrying in $SLEEP seconds..."
          echo   Start-Sleep -Seconds $SLEEP
          echo }
          echo Write-Error "All retries failed."
          echo exit 1
        )
        powershell -NoProfile -NonInteractive -ExecutionPolicy Bypass -File "%PS1%"
        set EXITCODE=%ERRORLEVEL%
        del "%PS1%" 2>nul
        exit /b %EXITCODE%
Marvin
09/23/2025, 12:10 PM