<@ULVA73B9P> I have a prefect.yaml file set for de...
# ask-marvin
d
@Marvin I have a prefect.yaml file set up for deployments, but I keep getting a `prefect.deployments.steps.set_working_directory - FileNotFoundError` error. Show me what a proper .yaml would look like with working directory steps set.
m
thought for 142.2 seconds
Sounds like your deployment is setting a working directory that doesn’t exist at runtime. The `set_working_directory` step does not create directories; it just `chdir`s into them. Here are two minimal, correct `prefect.yaml` examples for Prefect 3.x.
Example A: Clone a repo first, then set the working directory using the git step’s output
# prefect.yaml
deployments:
  - name: my-flow-prod
    entrypoint: flows/my_flow.py:my_flow
    work_pool:
      name: my-pool
    pull:
      - prefect.deployments.steps.pull.git_clone:
          id: repo
          repository: https://github.com/acme/prefect-project.git
          branch: main
          # access_token: "{{ prefect.variables.GH_TOKEN }}"  # if needed
      - prefect.deployments.steps.pull.set_working_directory:
          # repo root, so the entrypoint flows/my_flow.py resolves from here
          directory: "{{ repo.directory }}"
    parameters:
      foo: "bar"
    tags: ["prod"]
Example B: Files are baked into the image; just set the working directory to a path that already exists
# prefect.yaml
deployments:
  - name: my-flow-image
    entrypoint: flows/my_flow.py:my_flow
    work_pool:
      name: docker-pool
      job_variables:
        image: "acme/prefect-image:latest"
    pull:
      - prefect.deployments.steps.pull.set_working_directory:
          # parent of flows/, so the entrypoint flows/my_flow.py resolves from here
          directory: "/opt/prefect"
Notes that avoid FileNotFoundError (the usual culprits):
- Order matters: run `git_clone` (or any fetch step) before `set_working_directory`.
- Use the output of steps via an `id`. For `git_clone`, you can reference `{{ repo.directory }}`, which is the checked-out path.
- The directory must exist in the runtime environment (your worker’s container/VM), not just on your local machine.
- If you need to create the directory, add a shell step before `set_working_directory`:
- prefect.deployments.steps.utility.run_shell_script:
    script: "mkdir -p /opt/prefect/flows"
- prefect.deployments.steps.pull.set_working_directory:
    directory: "/opt/prefect/flows"
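To make the "it only changes directory, it never creates one" behaviour concrete, here is a small Python sketch. It is not Prefect's code: `try_chdir` is a hypothetical helper, and the path is built under a temporary directory purely for illustration.

```python
import os
import tempfile
from pathlib import Path


def try_chdir(directory: str) -> str:
    """Mimic a chdir-only step: change directory, never create it."""
    original = os.getcwd()
    try:
        os.chdir(directory)
        return "ok"
    except FileNotFoundError:
        return "FileNotFoundError"
    finally:
        # Always return to where we started so the demo has no side effects.
        os.chdir(original)


with tempfile.TemporaryDirectory() as root:
    target = Path(root) / "opt" / "prefect" / "flows"
    before = try_chdir(str(target))             # directory does not exist yet
    target.mkdir(parents=True, exist_ok=True)   # same effect as `mkdir -p`
    after = try_chdir(str(target))              # now the chdir succeeds

print(before, after)
```

The same ordering applies to pull steps: whatever creates or fetches the directory has to run before the step that changes into it.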
If you paste your current `prefect.yaml` and the full error log line (including the path it’s trying to use), I’ll point out exactly where it’s going wrong.
Docs for reference:
- Deployments overview: Prefect Deployments
- Steps API (signatures): `prefect.deployments.steps.pull.set_working_directory(directory: str)` and `git_clone(repository: str, ...)`
d
@Marvin Here is my current `prefect.yaml` setup...
# Welcome to your prefect.yaml file! You can use this file for storing and managing
# configuration for deploying your flows. We recommend committing this file to source
# control along with your flow code.

# Generic metadata about this project
name: foo
prefect-version: 3.4.11

# build section allows you to manage and build docker images
build:
- prefect_docker.deployments.steps.build_docker_image:
    id: build_image
    requires: prefect-docker>=0.3.1
    image_name: {ecr_repo}/{repo_name}
    tag: latest
    dockerfile: Dockerfile

# push section allows you to manage if and how this project is uploaded to remote locations
push:
- prefect_docker.deployments.steps.push_docker_image:
    requires: prefect-docker>=0.3.1
    image_name: '{{ build_image.image_name }}'
    tag: '{{ build_image.tag }}'

# pull section allows you to provide instructions for cloning this project in remote locations
pull:
  - prefect.deployments.steps.set_working_directory:
      directory: /opt/prefect/working

# the deployments section allows you to provide configuration for deploying flows
deployments:
- name: test
  version: null
  tags: []
  description: null
  schedule: {}
  flow_name: main
  entrypoint: test.py:main
  parameters: {}
  work_pool:
    name: ecs_worker
    work_queue_name: null
    job_variables: {
       image: "{{ build_image.image }}:{{ build_image.tag }}"
    }
  schedules: []
Note that I am using a custom Dockerfile, which looks like...
FROM {private_repo}
WORKDIR /opt/prefect/working
COPY . .
RUN python -m pip install -r requirements.txt
Also note that the Python script/flow I want to pass in is called `test.py`, with the flow name being `main`.
m
thought for 68.9 seconds
Thanks, that’s enough to narrow it down. Two things jump out:
1) Your image reference looks wrong in the deployment. You built and pushed using `build_image.image_name` + `build_image.tag`, but your ECS job uses `{{ build_image.image }}:{{ build_image.tag }}`. In `prefect-docker`, the build step’s `image` output is already the full `image_name:tag` reference, so that template appends the tag twice and does not name the image you pushed. If the flow run ends up in a container that does not have `/opt/prefect/working`, `set_working_directory` fails with FileNotFoundError.
2) You can keep the working directory step, but you don’t actually need it if your image already sets `WORKDIR /opt/prefect/working`.
Here’s a corrected minimal `prefect.yaml` that keeps the step and fixes the image reference:
name: foo
prefect-version: 3.4.11

build:
- prefect_docker.deployments.steps.build_docker_image:
    id: build_image
    requires: prefect-docker>=0.3.1
    image_name: "{ecr_repo}/{repo_name}"  # placeholder: your full registry/repository name
    tag: "latest"
    dockerfile: Dockerfile

push:
- prefect_docker.deployments.steps.push_docker_image:
    requires: prefect-docker>=0.3.1
    image_name: "{{ build_image.image_name }}"
    tag: "{{ build_image.tag }}"

pull:
  - prefect.deployments.steps.pull.set_working_directory:
      directory: "/opt/prefect/working"

deployments:
- name: test
  flow_name: main
  entrypoint: test.py:main
  parameters: {}
  tags: []
  work_pool:
    name: ecs_worker
    job_variables:
      image: "{{ build_image.image_name }}:{{ build_image.tag }}"
  schedules: []
If you want to be extra sure the path exists before `chdir`, add a quick sanity check:
pull:
  - prefect.deployments.steps.utility.run_shell_script:
      script: "test -d /opt/prefect/working && echo 'OK' || (echo 'Missing dir' && exit 1)"
  - prefect.deployments.steps.pull.set_working_directory:
      directory: "/opt/prefect/working"
Or, simplest of all, drop the pull step entirely and rely on your Dockerfile’s `WORKDIR`:
pull: []
Why this fails now:
- Wrong image is used at runtime → your custom directory doesn’t exist in that container → `set_working_directory` raises FileNotFoundError. Fixing the `image` reference to `{{ build_image.image_name }}:{{ build_image.tag }}` is the key.
If this still errors, paste the log line that shows which path it tried to `chdir` into, and we’ll pinpoint the next culprit.
Reference: Prefect Deployments overview: https://docs.prefect.io/latest/deploy/
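To see why the choice of template keys matters, here is a minimal Python sketch of how the two `image` templates would render. It is not Prefect's templating engine: `render` is a hypothetical stand-in, and the step outputs are made-up values. In particular, it assumes the build step exposes an `image` output that already contains the tag, which is what recent `prefect-docker` versions appear to do.

```python
import re


def render(template: str, outputs: dict) -> str:
    """Replace {{ step_id.key }} placeholders with values from earlier step outputs."""
    def lookup(match: re.Match) -> str:
        step_id, key = match.group(1), match.group(2)
        return str(outputs[step_id][key])

    return re.sub(r"\{\{\s*(\w+)\.(\w+)\s*\}\}", lookup, template)


# Hypothetical outputs of a build step with id "build_image" (assumed shape).
outputs = {
    "build_image": {
        "image_name": "registry.example.com/foo",
        "tag": "latest",
        "image": "registry.example.com/foo:latest",
    }
}

doubled = render("{{ build_image.image }}:{{ build_image.tag }}", outputs)
fixed = render("{{ build_image.image_name }}:{{ build_image.tag }}", outputs)

print(doubled)
print(fixed)
```

Under that assumption the first template ends in a doubled tag, while the second names exactly the image that was pushed. If the `image` output were missing instead, the first template would not resolve at all.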
d
@Marvin I can't use your shell_script print, it gives the following error: `RuntimeError: run_shell_script failed with error code 2: test: extra argument ‘&&’`
m
thought for 51.0 seconds
Yep, that error means the step is not running your command through a shell, so `&&` is being passed as a literal argument to `test`. Two easy fixes:
Option A: run through a shell explicitly
pull:
  - prefect.deployments.steps.utility.run_shell_script:
      script: >
        sh -c 'if [ -d /opt/prefect/working ]; then echo OK; else echo "Missing dir"; exit 1; fi'
  - prefect.deployments.steps.pull.set_working_directory:
      directory: "/opt/prefect/working"
Option B: simplest check, no printing — just fail if missing
pull:
  - prefect.deployments.steps.utility.run_shell_script:
      script: "test -d /opt/prefect/working"
  - prefect.deployments.steps.pull.set_working_directory:
      directory: "/opt/prefect/working"
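A short Python sketch of why `&&` ended up as an argument to `test`. It assumes the step tokenizes each script line and executes it directly, without a shell, which is consistent with the error above; `split_like_exec` is a hypothetical helper, not Prefect's function.

```python
import shlex


def split_like_exec(script: str) -> list:
    """Split each non-empty line into an argv list, as a no-shell runner would."""
    return [shlex.split(line) for line in script.splitlines() if line.strip()]


# Without a shell, && is just another argument handed to `test`.
naive = split_like_exec("test -d /opt/prefect/working && echo 'OK'")

# Wrapped in `sh -c`, the whole expression becomes one argument for the shell to interpret.
wrapped = split_like_exec("sh -c 'test -d /opt/prefect/working && echo OK'")

print(naive[0])
print(wrapped[0])
```

The first argv list contains `&&` as a plain string, which is what `test` complained about. The second hands the full expression to `sh`, so shell operators work again.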
Since your Dockerfile sets WORKDIR to `/opt/prefect/working`, you can also drop the working-directory step entirely. Just make sure your deployment uses the right image:
work_pool:
  name: ecs_worker
  job_variables:
    image: "{{ build_image.image_name }}:{{ build_image.tag }}"
If anything still complains, paste the exact path it tries to chdir into and we’ll finish the autopsy.
d
@Marvin with Option A, I get the following output
RuntimeError: `run_shell_script` failed with error code 1:
m
thought for 66.4 seconds
Right, error code 1 from Option A means the directory check failed. In other words: inside the running container, `/opt/prefect/working` doesn’t exist. Your container is politely reporting “no such place”.
Before we go further, please confirm you fixed the image reference and redeployed:
- You should be using `image: "{{ build_image.image_name }}:{{ build_image.tag }}"` in job_variables (not `build_image.image` followed by `:{{ build_image.tag }}`).
- Verify the ECS task definition actually uses that image tag.
If you still want a pre-check, use one of these:
Create the directory (then set it)
pull:
  - prefect.deployments.steps.utility.run_shell_script:
      script: "/bin/sh -lc 'mkdir -p /opt/prefect/working'"
  - prefect.deployments.steps.pull.set_working_directory:
      directory: "/opt/prefect/working"
Or print some diagnostics to see what’s there
pull:
  - prefect.deployments.steps.utility.run_shell_script:
      # each script line is run as its own command, so keep the whole shell call on one line
      script: "/bin/sh -c 'pwd; ls -la /opt/prefect || true; ls -la /opt/prefect/working || true'"
  - prefect.deployments.steps.pull.set_working_directory:
      directory: "/opt/prefect/working"
Also, a note on the step path: `prefect.deployments.steps.pull.set_working_directory` and the shorter `prefect.deployments.steps.set_working_directory` from your file should refer to the same step. Your error message names the short path, which suggests the step was found and ran, so the path itself is unlikely to be the problem.
Given your Dockerfile already sets `WORKDIR /opt/prefect/working`, the simplest fix is to:
- Remove the working-directory pull step entirely, and
- Ensure the deployment runs your custom image
Minimal corrected deployment section
deployments:
- name: test
  flow_name: main
  entrypoint: test.py:main
  work_pool:
    name: ecs_worker
    job_variables:
      image: "{{ build_image.image_name }}:{{ build_image.tag }}"
If you’d like, paste the ECS task definition’s image string and I’ll sanity-check it. If the directory still doesn’t exist after that, we’ll know your custom image isn’t what’s running.
d
@Marvin The image reference matches. Which task-definition exactly should be using the image tag?
m
thought for 132.3 seconds
Short answer: the image tag you set in your deployment is applied to the ECS task that actually runs each flow run. The Prefect ECS worker builds a task definition (or a new revision) for that run and sets the container’s image to your tag. It’s not the worker’s own task, and it’s not any long-lived ECS Service you might have.
How to verify where it’s applied:
- Prefect side
  - Inspect your deployment to confirm the job variables are set:
prefect deployment inspect "main/test"
  - Inspect the work pool to see the base job template (and whether it pins a specific task definition ARN that could override your image):
prefect work-pool inspect "ecs_worker"
If the base job template or job variables include a fixed `taskDefinitionArn`, that task definition’s container image will be used regardless of your `image:` override.
- AWS ECS side
  - Go to ECS > Clusters > your cluster > Tasks. Open the task created for your flow run (not the worker). On the task’s “Containers” tab, check the “Image” field. That should be `{{ build_image.image_name }}:{{ build_image.tag }}` (your ECR repo:tag). Click through to see the exact digest if needed.
  - Open the Task Definition revision that task used; its container definition should list your image tag/digest. That’s the “which task-definition exactly” in question.
If that running task’s image isn’t your custom image, either:
- The work pool’s base job template pins a `taskDefinitionArn` (overriding your image), or
- The deployment isn’t picking up the updated job variables (re-run `prefect deploy`), or
- The worker is pointing at a different pool/cluster than you think.
Once the flow-run task actually uses your image, `/opt/prefect/working` should exist (since your Dockerfile sets WORKDIR and copies your code), and you can drop the `set_working_directory` step entirely.
d
@Marvin Now I'm getting the following error...
prefect_aws.workers.ecs_worker.TaskFailedToStart: CannotPullContainerError: pull image manifest has been retried 1 time(s): failed to resolve ref {image}:{tag}: unexpected status from HEAD request to https://{image}/manifests/{tag}: 403 Forbidden
m
thought for 49.5 seconds
ECS is trying to pull your image and getting a 403 from the registry. That’s almost always one of:
- The image reference isn’t fully qualified (ECR domain missing or wrong region/account)
- The image/tag doesn’t exist
- The ECS task execution role lacks ECR pull permissions
- You’re using a private non-ECR registry without repository credentials
Quick checklist
1) Use a fully qualified image name
- For ECR it must look like:
123456789012.dkr.ecr.us-east-1.amazonaws.com/my-repo:latest
- In your prefect.yaml, set:
build:
  - prefect_docker.deployments.steps.build_docker_image:
      id: build_image
      image_name: "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-repo"
      tag: "latest"
...
deployments:
  - name: test
    work_pool:
      name: ecs_worker
      job_variables:
        image: "{{ build_image.image_name }}:{{ build_image.tag }}"
- Make sure the value you template into `image:` matches exactly what you see in the ECR console.
2) Confirm the image exists
- Check ECR > your repository > “latest” tag is present. If not, your push step didn’t push what you think.
3) Ensure the ECS task execution role has ECR permissions
- The task that runs your flow (not the worker) must use an execution role with at least AmazonECSTaskExecutionRolePolicy attached. That includes:
  - ecr:GetAuthorizationToken
  - ecr:BatchCheckLayerAvailability
  - ecr:GetDownloadUrlForLayer
  - ecr:BatchGetImage
- If you’re using a custom execution role via your work pool template, set/verify it there.
- Docs: ECS task execution role
4) Cross-account/region?
- If the image is in a different AWS account, add a repository policy on the ECR repo that allows the ECS execution role principal to pull.
- If it’s in a different region, the fully qualified domain must include that region; permissions still need to cover it.
5) Using a private non-ECR registry?
- You must provide repository credentials on the container definition (ECS `repositoryCredentials.credentialsParameter` pointing to a Secrets Manager ARN with your auth).
- Add this in your ECS work pool’s base job template (UI) under the container definition. AWS docs: Private registry authentication
Sanity checks
- Inspect the deployment/job vars:
prefect deployment inspect "main/test"
prefect work-pool inspect "ecs_worker"
Ensure the `image` value is the full ECR URL and that your base job template isn’t pinning a `taskDefinitionArn` that overrides it.
- In ECS, open the actual flow-run task → Containers → Image. It should be your ECR URL+tag. If not, the template is overriding it.
If you share:
- The exact `image:` string being set at runtime
- The executionRoleArn used by the flow-run task
I can pinpoint which item above is failing.
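If it helps to check the "fully qualified" point before redeploying, here is a small Python sketch that tests whether a string has the usual ECR shape. The pattern is a simplification I am assuming for illustration: it ignores digest references (`@sha256:...`) and non-standard partitions, and `parse_ecr_image` is a hypothetical helper, not part of Prefect or the AWS SDK.

```python
import re
from typing import Optional

# Simplified shape: <12-digit account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
ECR_IMAGE = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[a-z0-9][a-z0-9._/-]*):(?P<tag>[A-Za-z0-9_][A-Za-z0-9_.-]*)$"
)


def parse_ecr_image(image: str) -> Optional[dict]:
    """Return the parts of a fully qualified ECR image reference, or None if it does not match."""
    match = ECR_IMAGE.match(image)
    return match.groupdict() if match else None


print(parse_ecr_image("123456789012.dkr.ecr.us-east-1.amazonaws.com/my-repo:latest"))
print(parse_ecr_image("my-repo:latest"))  # no registry domain, so not fully qualified
```

A `None` result for the string your deployment renders would point at the first checklist item. A match only says the shape is right, not that the tag exists or that the role can pull it.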
d
@Marvin Which AWS role exactly is pulling the image from ECR. The task role or task execution role?
m
thought for 32.8 seconds
The task execution role. ECS uses the execution role to pull container images and write logs; the task role is only used by your application code at runtime.
What this means for your 403:
- The executionRoleArn on the flow-run task definition must have ECR pull permissions.
- The taskRoleArn is irrelevant for image pulls.
Minimum execution role permissions
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
Cross-account pulls
- If your image is in a different AWS account, add an ECR repository policy that allows the execution role principal to pull.
Private non-ECR registries
- ECS uses the execution role to fetch repository credentials from Secrets Manager (repositoryCredentials.credentialsParameter), so it needs `secretsmanager:GetSecretValue` (and `kms:Decrypt` if applicable).
Verify you’re looking at the right role
- In ECS, open the task for your flow run → Task role vs Task execution role. The latter is the one that must have the ECR permissions.
- In Prefect, ensure your ECS work pool/job template sets the correct executionRoleArn for flow runs.
Docs:
- ECS task execution role
- ECR repository policies
- Private registry authentication
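As a rough way to audit a policy document for the four ECR pull actions listed above, here is a Python sketch. It only looks at `Allow` statements and action names (wildcards included). It deliberately ignores `Deny` statements, `Resource` scoping, and conditions, so treat it as a first-pass check rather than an IAM evaluation; `missing_ecr_pull_actions` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

REQUIRED_ECR_ACTIONS = {
    "ecr:GetAuthorizationToken",
    "ecr:BatchCheckLayerAvailability",
    "ecr:GetDownloadUrlForLayer",
    "ecr:BatchGetImage",
}


def missing_ecr_pull_actions(policy: dict) -> set:
    """Return the required ECR actions that no Allow statement in the policy mentions."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]

    allowed_patterns = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may be given as a string
            actions = [actions]
        allowed_patterns.extend(actions)

    # IAM action names are case-insensitive and may use * and ? wildcards.
    return {
        action
        for action in REQUIRED_ECR_ACTIONS
        if not any(fnmatchcase(action.lower(), pattern.lower()) for pattern in allowed_patterns)
    }


partial_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["ecr:GetAuthorizationToken"], "Resource": "*"}
    ],
}
print(sorted(missing_ecr_pull_actions(partial_policy)))
```

Run against the JSON of the execution role's attached policies, an empty result means the action names are covered. A non-empty result lists what to add.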