Thread
#prefect-community
    Thomas Fredriksen

    8 months ago
    Hi everyone. I am currently debugging an issue where the healthcheck is failing. We are using a self-hosted Prefect Server and the Docker storage backend to deploy flows to it. The healthcheck fails with the following error:
    ModuleNotFoundError: No module named 'pipelines'
    The error seems quite clear. However, there is no reference to any module called pipelines, although the directory that holds my flows has this name. Overall, my project structure looks like this:
    my_sdk
    |- __init__.py
    |- my_module.py
    |- cli
       |- __init__.py
       |- __main__.py
    pipelines
    |- __init__.py
    |- first_flow
       |- __init__.py
       |- flow.py
    |- second_flow
       |- __init__.py
       |- flow.py
    We have created a CLI tool that handles deployment to our infrastructure. It works by dynamically importing the flow using importlib, then setting up the Docker storage before calling flow.register.
    import importlib
    import os
    import sys

    from prefect import Flow

    # Import flow
    sys.path.append(os.path.abspath(pipeline_path))
    flow_module = importlib.import_module(os.path.join(pipeline_path, "flow").replace("/", "."))
    flow: Flow = getattr(flow_module, "flow")
    This worked really well until we started seeing the error above. Previously, we had a separate deploy.py script for each flow that would build and register it. We would occasionally see a similar error saying that a module called flow could not be found. Copying the flow.py file into the built image by adding it to the Docker storage init solved this:
    flow.storage = Docker(
        # ....
        files={
            path.join(FLOW_DIR, "flow.py"): "/flow.py",
        },
    )
    The strange part is that this error was not deterministic and would only happen for one or two of our many flows. Since we only had to add the lines above to the deployment file, we considered the problem "solved for now".
    While working on the CLI, we encountered the same problem again. We eventually solved it the same way as before, but this time we had to copy flow.py to /pipelines/flow_name/flow.py to make it work. Once again, only a few flows were affected, not all of them.
    When debugging this issue, we managed to reduce the flow to a state where the problem simply disappeared. We isolated a function that caused the ModuleNotFoundError, but it was not clear which part of the function caused the error. Even after we reduced the function to a stub, the error persisted: the function consisted of a single line, pass, but was still causing the error. Removing the call to the offending function magically solved the problem, even though it was just a stub.
    Our conclusion is that the flow.py file is somehow pickled when the Docker image is built, but only in certain situations. We do not know why.
    Post-ramble summary: we observed that the file containing our flow definition would be pickled with the flow itself when building the Docker image, but only in certain situations. Does anyone know what might be causing this behavior? We are happy to copy flow.py to the final Docker image, but would like to understand what is going on here.
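The dynamic import described in the message above could be sketched a little more defensively as follows. This is only an illustration, not the poster's CLI: the helper names `dotted_module_name` and `import_flow_module` are hypothetical, and the sketch assumes the flow directory is given relative to the project root. One difference worth noting is that the original snippet appends the flow directory itself to `sys.path`, whereas the dotted name `pipelines.first_flow.flow` is resolved relative to the project root, so this sketch puts the project root on `sys.path` instead.

```python
import importlib
import sys
from pathlib import Path
from types import ModuleType


def dotted_module_name(flow_dir: str, module: str = "flow") -> str:
    """Turn a relative directory such as 'pipelines/first_flow' into a dotted module name."""
    # Path.parts splits on the platform's separators, so no manual replace("/", ".") is needed.
    return ".".join(Path(flow_dir).parts + (module,))


def import_flow_module(project_root: str, flow_dir: str) -> ModuleType:
    """Import <flow_dir>/flow.py as a package module, relative to project_root."""
    root = str(Path(project_root).resolve())
    if root not in sys.path:
        # The project root (not the flow directory) must be importable
        # for the dotted package name to resolve.
        sys.path.insert(0, root)
    importlib.invalidate_caches()
    return importlib.import_module(dotted_module_name(flow_dir))


print(dotted_module_name("pipelines/first_flow"))
```

With this layout the flow object would be fetched as `getattr(import_flow_module(root, "pipelines/first_flow"), "flow")`. The module is then registered under its full package name, which is also the name any pickled reference to it would carry.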
    Kevin Kho

    8 months ago
    The Docker storage will pickle the Flow by default. If you are already putting the file inside the image, you can use script-based storage like this:
    flow.storage = Docker(
        path="/location/in/image/my_flow.py",
        stored_as_script=True
    )
    You can also just supply the image directly to Docker storage if that is easier for you.
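Combining this suggestion with the `files` mapping from the original question might look roughly like the sketch below. It only builds the keyword arguments, so it runs without Prefect installed; the helper name `script_storage_kwargs` and the example paths are hypothetical. The `files`, `path` and `stored_as_script` names are the ones already used in this thread, and the sketch assumes the source side of `files` should be an absolute path.

```python
import os


def script_storage_kwargs(local_flow_file: str, path_in_image: str) -> dict:
    """Build keyword arguments for script-based Docker storage (hypothetical helper)."""
    return {
        # Copy the flow file into the image; the local source path is made absolute.
        "files": {os.path.abspath(local_flow_file): path_in_image},
        # Point the storage at the copied script instead of a pickled flow.
        "path": path_in_image,
        "stored_as_script": True,
    }


kwargs = script_storage_kwargs(
    "pipelines/first_flow/flow.py",    # example local path
    "/pipelines/first_flow/flow.py",   # example location inside the image
)
# With Prefect installed, this would be passed on as:
#   flow.storage = Docker(**kwargs)
print(kwargs["path"], kwargs["stored_as_script"])
```

Keeping `path` and the destination side of `files` as a single value avoids the two drifting apart, which would otherwise leave the storage pointing at a file that was never copied.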
    Thomas Fredriksen

    8 months ago
    wow. Wish I noticed that earlier. Thanks anyway 🤡
    Kevin Kho

    8 months ago
    I’m not 100% sure it will fix your problem though
    Thomas Fredriksen

    8 months ago
    I tried setting path and stored_as_script like you suggested, but unfortunately got the same error.
    Any clue why a single stubbed function can cause this behavior? Could it be a cloudpickle-thing?
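One possible explanation, offered as an assumption rather than a confirmed diagnosis, is pickling by reference. Standard `pickle` stores a module-level function as just its module name and function name, and cloudpickle does the same for functions that live in an importable module. If a flow references even a stub function from `pipelines.first_flow.flow`, the pickled flow then needs that module to be importable wherever it is loaded. The sketch below reproduces the effect with the standard library only; the temporary package layout mirrors the thread but is otherwise made up, and it does not use Prefect or cloudpickle.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path


def demo_pickle_by_reference() -> str:
    """Pickle a stub from a temporary package, then unpickle it without the package."""
    with tempfile.TemporaryDirectory() as tmp:
        # Illustrative layout: pipelines/first_flow/flow.py containing a stub function.
        pkg = Path(tmp) / "pipelines" / "first_flow"
        pkg.mkdir(parents=True)
        (pkg.parent / "__init__.py").write_text("")
        (pkg / "__init__.py").write_text("")
        (pkg / "flow.py").write_text("def stub():\n    pass\n")

        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("pipelines.first_flow.flow")
            # The payload holds only the names of the module and the function,
            # not the function's code.
            payload = pickle.dumps(module.stub)
        finally:
            sys.path.remove(tmp)
            # Forget the package, like a fresh container without the source files.
            stale = [n for n in sys.modules if n == "pipelines" or n.startswith("pipelines.")]
            for name in stale:
                del sys.modules[name]

    try:
        pickle.loads(payload)
    except ModuleNotFoundError as exc:
        return str(exc)
    return "loaded"


print(demo_pickle_by_reference())
```

Loading fails because the top-level `pipelines` package cannot be imported, even though the function body is a single `pass`. That would be consistent with the stub alone triggering the error, and with the error going away once the source file or an installed package makes the module importable inside the image.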
    Kevin Mullins

    8 months ago
    Not sure if it is the only/best approach; however, I achieved something similar for modules by creating a custom base Docker image for flows that installs a Python package containing the module.
    Kevin Kho

    8 months ago
    It might be. Kevin’s suggestion is good. You can add a setup.py to your directory and install it as a Python package in the container to make sure everything is importable.
    Kevin Mullins

    8 months ago
    For your storage setup, it might look something like this:
    import os

    from prefect.storage import Docker

    # Registry, repository and tag of the custom base image, overridable via environment variables
    container_registry = os.environ.get("PREFECT_CONTAINER_REGISTRY", "localhost:5000")
    container_repository = os.environ.get(
        "PREFECT_CONTAINER_REPOSITORY", "fb-prefect-sandbox"
    )
    container_tag = os.environ.get("PREFECT_CONTAINER_TAG", "latest")
    base_image = f"{container_registry}/{container_repository}:{container_tag}"

    flow.storage = Docker(registry_url=container_registry, base_image=base_image)
    Kevin Kho

    8 months ago
    Maybe this will also help
    Thomas Fredriksen

    8 months ago
    Perfect, thanks! I will give this a try