# prefect-cloud
l
Hello, something pretty weird is happening with Cloud Run Push work pools. These are 7 consecutive flow run executions of the same deployment, without changing anything. As you can see, some runs fail and some do not, and when a run fails the error is the following:
Flow could not be retrieved from deployment.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/prefect/engine.py", line 310, in retrieve_flow_then_begin_flow_run
    flow = await load_flow_from_flow_run(flow_run, client=client)
  File "/usr/local/lib/python3.8/site-packages/prefect/client/utilities.py", line 40, in with_injected_client
    return await fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/prefect/deployments.py", line 204, in load_flow_from_flow_run
    output.update(await run_step(step))
  File "/usr/local/lib/python3.8/site-packages/prefect/projects/steps/core.py", line 77, in run_step
    step_func = _get_function_for_step(fqn, requires=keywords.get("requires"))
  File "/usr/local/lib/python3.8/site-packages/prefect/projects/steps/core.py", line 29, in _get_function_for_step
    step_func = import_object(fully_qualified_name)
  File "/usr/local/lib/python3.8/site-packages/prefect/utilities/importtools.py", line 212, in import_object
    module = load_module(module_name)
  File "/usr/local/lib/python3.8/site-packages/prefect/utilities/importtools.py", line 183, in load_module
    return importlib.import_module(module_name)
  File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 970, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'prefect.deployments.steps'; 'prefect.deployments' is not a package
What is happening here?
I'm deploying the flow from a `prefect.yaml`: `prefect deploy --name "datalake-export-run-exports"`
name: export-datalake
prefect-version: 2.20.3

# build section allows you to manage and build docker images
build:
- prefect_docker.deployments.steps.build_docker_image:
    id: build_image
    requires: prefect-docker>=0.5.4
    image_name: europe-docker.pkg.dev/project-id/image-name
    tag: latest
    dockerfile: ../../../docker/Dockerfile.extended

# push section allows you to manage if and how this project is uploaded to remote locations
push:
- prefect_docker.deployments.steps.push_docker_image:
    requires: prefect-docker>=0.5.4
    image_name: '{{ build_image.image_name }}'
    tag: '{{ build_image.tag }}'

pull:
- prefect.deployments.steps.set_working_directory:
    directory: /opt/prefect

definitions:
  work_pools:
    gcp_push: &gcp_push
      name: gcp-cloud-run-push
      work_queue_name:
      job_variables:
        image: '{{ build_image.image }}'

deployments:
- name: datalake-export-run-exports
  entrypoint: flow.py:run_export_jobs
  work_pool: *gcp_push
  version:
  tags: []
  description: Runs a job export for every export configured
  parameters: {}
  schedules:
  - cron: 0 7 * * *
    timezone: UTC
    day_or: true
    active: true
The code is baked into the Docker image, and the Dockerfile pretty much just installs a requirements.txt file and copies the flow code:
# This is a prefect image with some utilities installed
FROM europe-docker.pkg.dev/data-genially/prefect/prefect-base:2.20.3-utils0.2.3-python3.12

COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy python files
COPY . /opt/prefect
As you can see, I'm running prefect 2.20.3 on python 3.12. The thing is that the stack trace shows it is running python 3.8 for some reason. :S
I don't have any issues locally running the job in a docker work pool.
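The mismatch described above (an image built on python 3.12, a traceback from python 3.8) can be spotted mechanically. This is a minimal sketch, assuming the traceback text is available as a string; `interpreter_versions` and the `expected` value are hypothetical names for illustration, and the sample lines are copied from the traceback in this thread.

```python
import re
from typing import Set, Tuple

# Matches ".../pythonX.Y/..." inside the quoted path of a traceback "File" line.
_PY_PATH = re.compile(r'File "[^"]*/python(\d+)\.(\d+)/')


def interpreter_versions(traceback_text: str) -> Set[Tuple[int, int]]:
    """Return every (major, minor) Python version found in traceback file paths."""
    return {(int(major), int(minor)) for major, minor in _PY_PATH.findall(traceback_text)}


sample = '''\
  File "/usr/local/lib/python3.8/site-packages/prefect/engine.py", line 310, in retrieve_flow_then_begin_flow_run
  File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
'''

found = interpreter_versions(sample)
expected = (3, 12)  # assumption: the version the deployment image is built with
if expected not in found:
    print(f"traceback came from {sorted(found)}, not the expected {expected}")
```

A version in the paths that the image does not even contain suggests the run was executed somewhere other than that image.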
n
can you run a `prefect version` command inside the container?
europe-docker.pkg.dev/data-genially/prefect/prefect-base:2.20.3-utils0.2.3-python3.12
I'm not sure what you have in there, but this very much seems like a version issue to me
ModuleNotFoundError: No module named 'prefect.deployments.steps'; 'prefect.deployments' is not a package
unless somehow there's some namespace overlap thing
l
Running
docker run -it europe-docker.pkg.dev/project-id/image-name:latest bash
root@23033f72e8fb:/opt/prefect# prefect version
Version:             2.20.3
API version:         0.8.4
Python version:      3.12.5
Git commit:          b8c27aa0
Built:               Thu, Aug 22, 2024 3:13 PM
OS/Arch:             linux/x86_64
Profile:             default
Server type:         ephemeral
Server:
  Database:          sqlite
  SQLite version:    3.40.1
>> I'm not sure what you have in there, but this very much seems like a version issue to me
It happens sometimes and sometimes not, without redeploying or rebuilding the image. That's the weird part.
>> europe-docker.pkg.dev/data-genially/prefect/prefect-base:2.20.3-utils0.2.3-python3.12
If I print the same thing in this image, it returns the same output as above.
n
File "/usr/local/lib/python3.8/site-packages/prefect/projects/steps/core.py", line 77, in run_step
step_func = _get_function_for_step(fqn, requires=keywords.get("requires"))
File "/usr/local/lib/python3.8/site-packages/prefect/projects/steps/core.py", line 29, in _get_function_for_step
hmm interesting so looking closer actually this has to be a version thing
`prefect/projects` does not exist in prefect 2.20.3
so there is no possible way that the code you're running there is actually 2.20.3 (at least when it fails)
i would be suspicious of however this image gets built
europe-docker.pkg.dev/data-genially/prefect/prefect-base:2.20.3-utils0.2.3-python3.12
or maybe the requirements are overwriting / downgrading the prefect installation?
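One way to test the "which prefect is really installed" question from inside a given environment is to ask Python whether the modules named in the traceback can be located. This is a minimal sketch using the standard library's `importlib.util.find_spec`; `has_module` is a hypothetical helper name. Note that `find_spec` imports parent packages as a side effect.

```python
import importlib.util


def has_module(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be located in the current environment."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # Raised when a parent package is missing, or when the parent is a plain
        # module rather than a package (the "'prefect.deployments' is not a
        # package" case from the traceback).
        return False


for name in ("prefect.projects", "prefect.deployments.steps"):
    print(name, has_module(name))
```

Going by this thread, an environment that really runs 2.20.3 should not find `prefect.projects` and should find `prefect.deployments.steps`; the failing environment shows the opposite pattern.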
l
>> i would be suspicious of however this image gets built
That image is built as follows:
ARG PREFECT_VERSION
ARG PYTHON_VERSION

FROM prefecthq/prefect:${PREFECT_VERSION}-python${PYTHON_VERSION}

# Install dependencies to be able to install private packages
RUN apt-get -yq update && apt-get -yqq install openssh-client
RUN mkdir -p -m 0700 ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
RUN pip install --upgrade pip

# Install dependencies
RUN --mount=type=ssh,id=githubkey pip install git+ssh://git@github.com/some-private-repo.git@v0.2.3
And the prefect version / python version is what the name of the image has.
n
one thing that may be helpful in debugging is to add a `pull` step action that uses `run_shell_script` to echo `pip freeze | grep prefect` or something
so at least when it fails you could see what you really had
errrrr actually lol the issue was running the deployment steps in the first place, nevermind 🙂
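For environments where code does get to run (for example, a shell inside the container), a Python counterpart of `pip freeze | grep prefect` could look like the sketch below. It uses only the standard library; `installed_version` and `runtime_report` are hypothetical helper names. As noted above, it cannot help when the failure happens before any deployment step executes.

```python
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def runtime_report(dist_names=("prefect",)):
    """Describe the running interpreter and the versions of the named distributions."""
    v = sys.version_info
    lines = [f"python {v.major}.{v.minor}.{v.micro} ({sys.executable})"]
    for dist_name in dist_names:
        lines.append(f"{dist_name} {installed_version(dist_name) or 'not installed'}")
    return "\n".join(lines)


print(runtime_report())
```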
l
I don't know how push work pools work under the hood, but if prefect uses its own workers to submit the work to cloud run, could it be that there is some worker with an older version of prefect sending the work to cloud run, and that is why it fails sometimes?
Another thing that does not add up is that the image does not have python 3.8 installed in the path that it says in the error:
docker run -it europe-docker.pkg.dev/project-id/image-name:latest bash

root@0573ea7b877f:/opt/prefect# cd /usr/local/lib/
root@0573ea7b877f:/usr/local/lib# ls
libpython3.12.so  libpython3.12.so.1.0  libpython3.so  pkgconfig  python3.12
I gotta go, but in case you need more info:
• Workspace ID: 2a6bc78f-9c0f-4ac8-8399-2e571dcc5c05
• Deployment ID: 47db1531-ac1c-438e-b298-7b022bd274ce
n
thank you! I'm not sure yet, but I'll ask the domain expert on this and let you know if I figure anything out
l
Alright, really appreciate it. Thank you very much!
k
@Luis Cebrián do you by any chance also have an agent type work pool with an agent running?
l
Yes, but it is not polling that work pool
Alright, @Kevin Grismore was on the right track. Mystery solved:
• I had an old agent polling work from the "default" queue.
• When I created the cloud run push work pool, another "default" queue was created.
• The agent was in fact picking up work from the cloud run push work pool, and because it had prefect 2.10 installed, it did not recognize the new way of deploying flows (through `prefect.yaml`).
• Solved it by telling the agent to pick up work from the "default-agent-pool" work pool.
Thanks @Nate and @Kevin Grismore for your help. I really appreciate it.
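The queue-name collision described in this resolution can be illustrated with a toy model. This is only a sketch of the situation as reported in the thread, not Prefect's actual scheduling code; `Queue` and `polled_queues` are hypothetical names, while the pool and queue names come from the messages above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Queue:
    pool: str
    name: str


# Two work pools, each with its own queue named "default".
QUEUES = [
    Queue(pool="default-agent-pool", name="default"),
    Queue(pool="gcp-cloud-run-push", name="default"),
]


def polled_queues(queues, queue_name, pool=None):
    """Queues a poller would match: by name only, or by name within one pool."""
    return [q for q in queues if q.name == queue_name and (pool is None or q.pool == pool)]


# Matching on the queue name alone also picks up the push pool's queue.
print([q.pool for q in polled_queues(QUEUES, "default")])

# Scoping the poller to one pool leaves only the agent's own queue.
print([q.pool for q in polled_queues(QUEUES, "default", pool="default-agent-pool")])
```

In this model, the intermittent failures correspond to runs that the old agent happened to claim, which would explain why the same deployment sometimes succeeded without any change.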