
Emma Rizzi

04/17/2023, 1:08 PM
Hi, still testing the new projects feature! What pull step is needed when building a Docker image with `push: True` to a private registry? Right now my flow is built and pushed, `flow.py` is inside the Docker image, and I tried to configure the pull step with `prefect.projects.steps.set_working_directory`, but I get:
FileNotFoundError: [Errno 2] No such file or directory: '/opt/prefect/flow'
It seems that the pull step is executed outside of the Docker container, so what should the pull step be for a project that builds a Docker image containing all the files it needs?

alex

04/17/2023, 1:35 PM
Which image to use for a deployment is defined in the `job_variables` section of a `deployment.yaml` file. You can template the output of the `build_docker_image` step into the `job_variables` section like so:
work_pool:
  name: my-work-pool
  job_variables:
    image: {{ image_name }}
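Note on quoting: in YAML, an unquoted `{{ image_name }}` reads as a flow mapping nested inside another flow mapping, that is, a mapping whose key is itself a mapping. A loader that builds Python dicts cannot hash such a key, which is consistent with a `found unhashable key` error. Below is a minimal sketch of the same situation using plain dicts instead of a YAML parser; the variable names are illustrative only and not taken from Prefect.

```python
# Roughly what an unquoted `{{ image_name }}` asks a YAML loader to build:
# an outer mapping whose single key is the inner mapping {image_name: null}.
inner_key = {"image_name": None}

try:
    outer = {inner_key: None}  # dict keys must be hashable; a dict is not
    error = None
except TypeError as exc:
    error = str(exc)

print(error)

# With quotes, the value stays a plain string, so the placeholder
# survives loading and can be filled in later by templating.
job_variables = {"image": "{{ image_name }}"}
print(job_variables["image"])
```

Writing the value as `image: "{{ image_name }}"` in the YAML file keeps it a string in the same way.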

Emma Rizzi

04/17/2023, 1:51 PM
Hi @alex, I added that parameter (with quotes; I got a `found unhashable key` error without them) and now I get this on flow run:
Flow could not be retrieved from deployment.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 274, in retrieve_flow_then_begin_flow_run
    flow = await load_flow_from_flow_run(flow_run, client=client)
  File "/usr/local/lib/python3.10/site-packages/prefect/client/utilities.py", line 47, in with_injected_client
    return await fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/prefect/deployments.py", line 169, in load_flow_from_flow_run
    basepath = deployment.path or Path(deployment.manifest_path).parent
  File "/usr/local/lib/python3.10/pathlib.py", line 960, in __new__
    self = cls._from_parts(args)
  File "/usr/local/lib/python3.10/pathlib.py", line 594, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/usr/local/lib/python3.10/pathlib.py", line 578, in _parse_args
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
I didn't change anything else in my files posted above. Any idea?
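Note on the traceback: the failing line is `deployment.path or Path(deployment.manifest_path).parent`. If both `path` and `manifest_path` are `None`, the expression falls through to `Path(None)`, which raises a `TypeError`. Here is a minimal sketch reproducing only that expression, assuming a stand-in `FakeDeployment` class (hypothetical, not a Prefect type):

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class FakeDeployment:
    """Hypothetical stand-in holding only the two fields the traceback touches."""

    path: Optional[str] = None
    manifest_path: Optional[str] = None


def resolve_basepath(deployment: FakeDeployment):
    # Same expression as the line shown in the traceback.
    return deployment.path or Path(deployment.manifest_path).parent


try:
    resolve_basepath(FakeDeployment())
    outcome = "ok"
except TypeError as exc:
    outcome = f"TypeError: {exc}"

print(outcome)

# With `path` set, the `or` short-circuits and Path(None) is never evaluated.
print(resolve_basepath(FakeDeployment(path="/opt/prefect/flow")))
```

This only shows why the expression fails when both fields are empty; it does not say why the engine took this code path in the first place.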

alex

04/17/2023, 2:01 PM
Do you have a pull step configured for your project?

Emma Rizzi

04/17/2023, 2:01 PM
Only this: (based on the dockerized tutorial)
pull:
- prefect.projects.steps.set_working_directory:
    directory: /opt/prefect/flow

alex

04/17/2023, 2:02 PM
OK, that’s good. What version of `prefect` are you running?

Emma Rizzi

04/17/2023, 2:04 PM
My CLI is on 2.10.4. I'm not sure what's inside my base Docker image, perhaps 2.9; I don't recall if I rebuilt it after the 2.10 release. I'll test this!
👍 1
Still the same error after changing my Dockerfile to:
FROM prefecthq/prefect:2-python3.10


RUN python -m pip install --upgrade pip
RUN python -m pip install --upgrade prefect 

COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
COPY ./* /opt/prefect/flow/
I confirm from the build logs that everything is on 2.10.4 now.
Oh, found it!
In deployment.yaml I had `path: null`; I changed it to:
entrypoint: flow.py:check
path: /opt/prefect/flow/
and it worked. Perhaps the workdir parameter didn't work as I expected 🤔

alex

04/17/2023, 2:29 PM
I’m glad that it’s working, but I don’t think you should need to define the `path` for a deployment if you’re using a project. My hunch is that your `pull` steps aren’t getting saved correctly for some reason. Unfortunately, that’s tricky to verify. Let me see if I can write a script that would check that.
OK, if you create a file called `deployment_check.py` with this script as the contents:
import asyncio
from sys import argv

from prefect import get_client


async def print_deployment_pull_steps(deployment_name: str):
    async with get_client() as client:
        deployment = await client.read_deployment_by_name(name=deployment_name)
        print(deployment.pull_steps)


if __name__ == "__main__":
    deployment_name = argv[1]
    asyncio.run(print_deployment_pull_steps(deployment_name))
You can print your deployment pull steps by running `python deployment_check.py <deployment_name>`. What’s printed should match what’s in your deployment.yaml file.

Emma Rizzi

04/17/2023, 2:45 PM
I got an interesting error:
Traceback (most recent call last):
  File "/mnt/c/Users/Emma/miniconda3/lib/python3.9/site-packages/prefect/client/orchestration.py", line 1504, in read_deployment_by_name
    response = await self._client.get(f"/deployments/name/{name}")
  File "/mnt/c/Users/Emma/miniconda3/lib/python3.9/site-packages/httpx/_client.py", line 1754, in get
    return await self.request(
  File "/mnt/c/Users/Emma/miniconda3/lib/python3.9/site-packages/httpx/_client.py", line 1530, in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "/mnt/c/Users/Emma/miniconda3/lib/python3.9/site-packages/prefect/client/base.py", line 275, in send
    response.raise_for_status()
  File "/mnt/c/Users/Emma/miniconda3/lib/python3.9/site-packages/prefect/client/base.py", line 135, in raise_for_status
    raise PrefectHTTPStatusError.from_httpx_error(exc) from exc.__cause__
prefect.exceptions.PrefectHTTPStatusError: Client error '404 Not Found' for url 'https://api.prefect.cloud/api/accounts/e4cfe598-386a-4f3a-9909-9bc15f3edc57/workspaces/b1963c04-a25f-484d-b8b0-8b94ba364d49/deployments/name/hello-world-deploy'
Response: {'detail': 'Not Found'}
For more information check: https://httpstatuses.com/404

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/c/Users/Emma/Documents/murmuration/administration/orion/flow-docker/deployment_check.py", line 15, in <module>
    asyncio.run(print_deployment_pull_steps(deployment_name))
  File "/mnt/c/Users/Emma/miniconda3/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/mnt/c/Users/Emma/miniconda3/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/mnt/c/Users/Emma/Documents/orion/flow-docker/deployment_check.py", line 9, in print_deployment_pull_steps
    deployment = await client.read_deployment_by_name(name=deployment_name)
  File "/mnt/c/Users/Emma/miniconda3/lib/python3.9/site-packages/prefect/client/orchestration.py", line 1507, in read_deployment_by_name
    raise prefect.exceptions.ObjectNotFound(http_exc=e) from e
prefect.exceptions.ObjectNotFound
For comparison, the URL of my deployment when accessed through the UI:
https://app.prefect.cloud/account/e4cfe598-386a-4f3a-9909-9bc15f3edc57/workspace/b1963c04-a25f-484d-b8b0-8b94ba364d49/deployments/deployment/897a25f8-ce10-48f4-8705-e22db2f09f53

alex

04/17/2023, 2:48 PM
Ah, you’ll need to provide the name in the format `flow_name/deployment_name`. You can copy the name from the output of `prefect deployments ls`.
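Note: a script that takes the name from the command line could check for the `flow_name/deployment_name` shape before calling the API, so a bare deployment name fails with a clear message instead of a 404. This is a small sketch assuming a hypothetical helper named `split_deployment_name`, which is not part of Prefect:

```python
from typing import Tuple


def split_deployment_name(full_name: str) -> Tuple[str, str]:
    """Split 'flow_name/deployment_name' into its two parts (hypothetical helper)."""
    flow_name, sep, deployment_name = full_name.partition("/")
    if not sep or not flow_name or not deployment_name:
        raise ValueError(
            f"expected 'flow_name/deployment_name', got {full_name!r}"
        )
    return flow_name, deployment_name


print(split_deployment_name("check/hello-world-deploy"))
```

Calling it with only `hello-world-deploy` raises a `ValueError` that names the expected format.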

Emma Rizzi

04/17/2023, 2:49 PM
Ah yes, makes sense! I got this output:
python deployment_check.py check/hello-world-deploy
[{'prefect.projects.steps.set_working_directory': {'directory': '/opt/prefect/flow'}}]

alex

04/17/2023, 2:51 PM
It is saving the pull steps correctly then. It’s perplexing that you would need to set the `path` then. What work pool type are you using?

Emma Rizzi

04/17/2023, 2:53 PM
It's a Kubernetes work pool with a custom job template to add volumes and access files outside the pods (on an NFS), if relevant:
{
  "variables": {
    "type": "object",
    "properties": {
      "env": {
        "type": "object",
        "title": "Environment Variables",
        "description": "Environment variables to set when starting a flow run.",
        "additionalProperties": {
          "type": "string"
        }
      },
      "name": {
        "type": "string",
        "title": "Name",
        "description": "Name given to infrastructure created by a worker."
      },
      "image": {
        "type": "string",
        "title": "Image",
        "example": "docker.io/prefecthq/prefect:2-latest",
        "description": "The image reference of a container image to use for created jobs. If not set, the latest Prefect image will be used."
      },
      "labels": {
        "type": "object",
        "title": "Labels",
        "description": "Labels applied to infrastructure created by a worker.",
        "additionalProperties": {
          "type": "string"
        }
      },
      "command": {
        "type": "string",
        "title": "Command",
        "description": "The command to use when starting a flow run. In most cases, this should be left blank and the command will be automatically generated by the worker."
      },
      "namespace": {
        "type": "string",
        "title": "Namespace",
        "default": "prefect",
        "description": "The Kubernetes namespace to create jobs within."
      },
      "stream_output": {
        "type": "boolean",
        "title": "Stream Output",
        "default": true,
        "description": "If set, output will be streamed from the job to local standard output."
      },
      "cluster_config": {
        "allOf": [
          {
            "$ref": "#/definitions/KubernetesClusterConfig"
          }
        ],
        "title": "Cluster Config",
        "description": "The Kubernetes cluster config to use for job creation."
      },
      "finished_job_ttl": {
        "type": "integer",
        "title": "Finished Job TTL",
        "default": 3601,
        "description": "The number of seconds to retain jobs after completion. If set, finished jobs will be cleaned up by Kubernetes after the given delay. If not set, jobs will be retained indefinitely."
      },
      "image_pull_policy": {
        "enum": [
          "IfNotPresent",
          "Always",
          "Never"
        ],
        "type": "string",
        "title": "Image Pull Policy",
        "default": "IfNotPresent",
        "description": "The Kubernetes image pull policy to use for job containers."
      },
      "service_account_name": {
        "type": "string",
        "title": "Service Account Name",
        "description": "The Kubernetes service account to use for job creation."
      },
      "job_watch_timeout_seconds": {
        "type": "integer",
        "title": "Job Watch Timeout Seconds",
        "description": "Number of seconds to wait for each event emitted by a job before timing out. If not set, the worker will wait for each event indefinitely."
      },
      "pod_watch_timeout_seconds": {
        "type": "integer",
        "title": "Pod Watch Timeout Seconds",
        "default": 60,
        "description": "Number of seconds to watch for pod creation before timing out."
      }
    },
    "definitions": {
      "KubernetesClusterConfig": {
        "type": "object",
        "title": "KubernetesClusterConfig",
        "required": [
          "config",
          "context_name"
        ],
        "properties": {
          "config": {
            "type": "object",
            "title": "Config",
            "description": "The entire contents of a kubectl config file."
          },
          "context_name": {
            "type": "string",
            "title": "Context Name",
            "description": "The name of the kubectl context to use."
          }
        },
        "description": "Stores configuration for interaction with Kubernetes clusters.\n\nSee `from_file` for creation.",
        "secret_fields": [],
        "block_type_slug": "kubernetes-cluster-config",
        "block_schema_references": {}
      }
    },
    "description": "Default variables for the Kubernetes worker.\n\nThe schema for this class is used to populate the `variables` section of the default\nbase job template."
  },
  "job_configuration": {
    "env": "{{ env }}",
    "name": "{{ name }}",
    "labels": "{{ labels }}",
    "command": "{{ command }}",
    "namespace": "{{ namespace }}",
    "job_manifest": {
      "kind": "Job",
      "spec": {
        "template": {
          "spec": {
            "volumes": [
              {
                "name": "dev",
                "hostPath": {
                  "path": "/mnt/sti-storage/dev",
                  "type": "Directory"
                }
              },
              {
                "name": "eodata",
                "hostPath": {
                  "path": "/eodata",
                  "type": "Directory"
                }
              },
              {
                "name": "prod",
                "hostPath": {
                  "path": "/mnt/sti-storage/prod",
                  "type": "Directory"
                }
              },
              {
                "name": "workspace",
                "hostPath": {
                  "path": "/mnt/workspace",
                  "type": "Directory"
                }
              }
            ],
            "containers": [
              {
                "env": "{{ env }}",
                "args": "{{ command }}",
                "name": "prefect-job",
                "image": "{{ image }}",
                "volumeMounts": [
                  {
                    "name": "dev",
                    "mountPath": "/mnt/sti-storage/dev"
                  },
                  {
                    "name": "prod",
                    "mountPath": "/mnt/sti-storage/prod"
                  },
                  {
                    "name": "workspace",
                    "mountPath": "/mnt/workspace"
                  },
                  {
                    "name": "eodata",
                    "mountPath": "/eodata"
                  }
                ],
                "imagePullPolicy": "{{ image_pull_policy }}"
              }
            ],
            "completions": 1,
            "parallelism": 1,
            "restartPolicy": "Never",
            "serviceAccountName": "{{ service_account_name }}"
          }
        },
        "ttlSecondsAfterFinished": "{{ finished_job_ttl }}"
      },
      "metadata": {
        "labels": "{{ labels }}",
        "namespace": "{{ namespace }}",
        "generateName": "{{ name }}-"
      },
      "apiVersion": "batch/v1"
    },
    "stream_output": "{{ stream_output }}",
    "cluster_config": "{{ cluster_config }}",
    "job_watch_timeout_seconds": "{{ job_watch_timeout_seconds }}",
    "pod_watch_timeout_seconds": "{{ pod_watch_timeout_seconds }}"
  }
}

alex

04/17/2023, 2:57 PM
Neat! I see that your `ImagePullPolicy` is set to `IfNotPresent`. Is it possible that your cluster isn’t pulling the image that was built with prefect 2.10.4?
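Note on pull policies: the three `imagePullPolicy` values listed in the job template differ in whether the node contacts the registry at all. With `IfNotPresent`, a node that already holds an image under a given tag keeps using that cached copy, even if a newer image has since been pushed under the same tag. Below is a simplified sketch of that decision; the function `contacts_registry` is a hypothetical illustration, not Kubernetes code.

```python
def contacts_registry(policy: str, image_cached_on_node: bool) -> bool:
    """Return True if the node would ask the registry about the image."""
    if policy == "Always":
        # The tag is resolved against the registry on every container start.
        return True
    if policy == "IfNotPresent":
        # Only pulled when no local copy exists under that name and tag.
        return not image_cached_on_node
    if policy == "Never":
        # The local copy is used or the container fails to start.
        return False
    raise ValueError(f"unknown imagePullPolicy: {policy!r}")


for policy in ("IfNotPresent", "Always", "Never"):
    print(policy, contacts_registry(policy, image_cached_on_node=True))
```

Under this model, a rebuilt image that reuses an existing tag is only picked up with `Always`, or by pushing it under a tag the node has not seen before.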

Emma Rizzi

04/17/2023, 2:58 PM
Ah yes, I didn't change the default here, I'll test it! And with fresh tags, just in case.
I confirm it worked! I love the new Prefect, but it takes way more time to migrate than I expected. Thanks for your patience! 🙏
🙌 1