# ask-community
t
Hi everyone. I am currently debugging an issue where the healthcheck is failing. We are using a self-hosted Prefect Server and the Docker storage backend to deploy flows to it. The healthcheck fails with the following error:
```
ModuleNotFoundError: No module named 'pipelines'
```
The error seems quite clear, however there is no reference to any module called `pipelines`, but the directory that holds my flows has this name. Overall, my project structure looks like this:
```
my_sdk
|- __init__.py
|- my_module.py
|- cli
   |- __init__.py
   |- __main__.py
pipelines
|- __init__.py
|- first_flow
   |- __init__.py
   |- flow.py
|- second_flow
   |- __init__.py
   |- flow.py
```
We have created a CLI tool that handles deployment to our infrastructure. It works by dynamically importing the flow using `importlib`, then setting up the `Docker` storage before calling `flow.register`.
```python
import importlib
import os
import sys

from prefect import Flow

# Dynamically import the flow module from the pipeline directory
sys.path.append(os.path.abspath(pipeline_path))
flow_module = importlib.import_module(os.path.join(pipeline_path, "flow").replace("/", "."))
flow: Flow = getattr(flow_module, "flow")
```
This worked really well until we started seeing the error above. Previously, we used to have a separate `deploy.py` script for each flow that would build and register the flow. We would occasionally see a similar error saying that it could not find a module called `flow`. Simply copying the `flow.py` file into the built image by adding it to the `Docker` storage init solved this:
```python
flow.storage = Docker(
    # ....
    files={
        path.join(FLOW_DIR, "flow.py"): "/flow.py"
    },
)
```
The strange part is that this error was not deterministic, and would only happen for one or two of our many flows. Since we only had to add the lines above to the deployment file, we considered the problem "solved for now". While working on the CLI, we once again encountered the same problem. We eventually solved it the same way as before, but this time had to copy `flow.py` to `/pipelines/flow_name/flow.py` to make it work. Once again, only a few flows were affected, not all of them.

When debugging this issue, we managed to reduce the flow to a state where the problem simply disappeared. We isolated a function that caused the `ModuleNotFoundError`, but it was not clear what part of the function caused the error. Even after reducing the function to a stub consisting of a single `pass` statement, the error persisted. Removing the call to the offending function magically solved the problem, even though it was just a stub. Our conclusion is that the `flow.py` file is somehow pickled when the Docker image is built, but only in certain situations, and we do not know why.

Post-ramble summary: we observed that the file containing our flow definition would be pickled together with the flow itself when building the Docker image, but only in certain situations. Does anyone know what might be causing this behavior? We are happy to keep copying `flow.py` into the final Docker image, but would like to understand what is going on here.
k
The Docker storage will pickle the Flow file by default. If you are already putting the file inside the image, you can use script-based storage like this:
```python
flow.storage = Docker(
    path="/location/in/image/my_flow.py",
    stored_as_script=True
)
```
You can also just supply the image directly to Docker storage if that is easier for you.
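For reference, a minimal sketch of that approach (assuming Prefect 1.x Docker storage; the registry and image names are placeholders, and the flow file must already exist at `path` inside the image):
```python
from prefect.storage import Docker

# Reference a pre-built image instead of letting Prefect build one.
# registry_url / image_name / image_tag are placeholders for your own image.
flow.storage = Docker(
    registry_url="my-registry.example.com",
    image_name="my-flows",
    image_tag="latest",
    path="/pipelines/first_flow/flow.py",  # where the flow script lives in the image
    stored_as_script=True,
)

# Skip the build step since the image already exists
flow.register(project_name="my-project", build=False)
```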
t
wow. Wish I noticed that earlier. Thanks anyway 🤡
k
I’m not 100% sure it will fix your problem though
t
I tried setting `path` and `stored_as_script` like you suggested, but unfortunately got the same error
Any clue why a single stubbed function can cause this behavior? Could it be a cloudpickle-thing?
k
Not sure if it is the only/best approach; however, I achieved something similar for modules by creating a custom base Docker image for flows that installs a Python package containing the module.
k
It might be. Kevin’s suggestion is good. You can add a `setup.py` to your directory and install it as a Python package in the container to make sure everything is importable.
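For instance, a minimal `setup.py` sketch (assuming a standard setuptools layout; the package name simply mirrors the `pipelines` directory from the project structure above):
```python
# setup.py -- minimal sketch; metadata values are placeholders
from setuptools import find_packages, setup

setup(
    name="pipelines",
    version="0.1.0",
    packages=find_packages(),  # picks up pipelines/ and my_sdk/ via their __init__.py files
)
```
Installing this package in the image (e.g. `RUN pip install .` in a custom base image) makes `import pipelines` resolve inside the container regardless of how the flow itself is stored.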
k
For your storage setup, it might look something like this:
```python
import os

container_registry = os.environ.get("PREFECT_CONTAINER_REGISTRY", "localhost:5000")
container_repository = os.environ.get(
    "PREFECT_CONTAINER_REPOSITORY", "fb-prefect-sandbox"
)
container_tag = os.environ.get("PREFECT_CONTAINER_TAG", "latest")
base_image = f"{container_registry}/{container_repository}:{container_tag}"

flow.storage = Docker(registry_url=container_registry, base_image=base_image)
```
k
Maybe this will also help
t
Perfect, thanks! I will give this a try