Thomas Fredriksen
01/05/2022, 7:55 PM
ModuleNotFoundError: No module named 'pipelines'
The error seems clear enough. However, nothing in our code refers to a module called pipelines, except that the directory holding my flows has this name. Overall, my project structure looks like this:
my_sdk
|- __init__.py
|- my_module.py
|- cli
   |- __init__.py
   |- __main__.py
pipelines
|- __init__.py
|- first_flow
|  |- __init__.py
|  |- flow.py
|- second_flow
   |- __init__.py
   |- flow.py
We have created a CLI tool that handles deployment to our infrastructure. It works by dynamically importing the flow with importlib, then setting up the Docker storage before calling flow.register.
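import importlib, os, sys
from prefect import Flow

# Import flow, e.g. pipeline_path "pipelines/first_flow" -> module "pipelines.first_flow.flow"
sys.path.append(os.path.abspath(pipeline_path))
flow_module = importlib.import_module(os.path.join(pipeline_path, "flow").replace("/", "."))
flow: Flow = getattr(flow_module, "flow")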
This worked really well until we started seeing the error above.
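For comparison, here is a minimal sketch of a more defensive version of that import, assuming the CLI knows the project root (the directory containing my_sdk and pipelines). The helper name import_flow_module is hypothetical. It derives the dotted module name from the file location instead of replacing "/" with ".", and it puts the project root, rather than the flow directory, on sys.path. The resulting name, e.g. pipelines.first_flow.flow, is what Python records as the __module__ of every function and class defined in flow.py, so the same package has to be importable wherever those objects are later loaded.

```python
import importlib
import sys
from pathlib import Path
from types import ModuleType


def import_flow_module(project_root: str, pipeline_path: str) -> ModuleType:
    """Import <pipeline_path>/flow.py as a dotted module, e.g. pipelines.first_flow.flow.

    Hypothetical helper: assumes pipeline_path lives under project_root and that
    every directory on the way has an __init__.py.
    """
    root = Path(project_root).resolve()
    flow_file = (root / pipeline_path / "flow.py").resolve()
    # pipelines/first_flow/flow.py -> ("pipelines", "first_flow", "flow")
    module_name = ".".join(flow_file.relative_to(root).with_suffix("").parts)
    # Put the project root (not the flow directory) on sys.path so that the
    # top-level package, here "pipelines", can be imported.
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return importlib.import_module(module_name)
```

Run from the project root, this would be used as flow = import_flow_module(".", "pipelines/first_flow").flow.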
Previously, we had a separate deploy.py script for each flow that would build and register it. We would occasionally see a similar error saying that it could not find a module called flow. Simply copying the flow.py file into the built image, by adding it to the Docker storage initialization, solved this:
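from os import path
from prefect.storage import Docker

# Copy the local flow.py to /flow.py inside the image
flow.storage = Docker(
    # ....
    files={
        path.join(FLOW_DIR, "flow.py"): "/flow.py"
    }
)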
The strange part is that this error was not deterministic and would only happen for one or two of our many flows. Since we only had to add the lines above to the deployment file, we considered the problem "solved for now".
While working on the CLI, we once again encountered the same problem. We eventually solved it the same way as before, but this time we had to copy flow.py to /pipelines/flow_name/flow.py to make it work. Once again, only a few flows were affected, not all of them.
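A sketch of how the files mapping could cover the whole pipelines package instead of individual flow.py files, assuming the Docker storage files argument copies each key (a local path) to its value (a path inside the image), as in the snippet above. The helper build_files_mapping is hypothetical. Copying the __init__.py files along with each flow.py keeps pipelines.<flow_name>.flow importable, provided that / is on the container's PYTHONPATH (for instance set through the storage's env_vars argument; this is an assumption to verify against your Prefect version).

```python
from pathlib import Path, PurePosixPath
from typing import Dict


def build_files_mapping(local_pkg_dir: str, image_root: str = "/pipelines") -> Dict[str, str]:
    """Map every .py file under local_pkg_dir to the same relative path under image_root.

    Hypothetical helper; the result is intended for the ``files`` argument of
    Docker storage (local path -> path inside the image).
    """
    pkg = Path(local_pkg_dir).resolve()
    mapping = {}
    for src in sorted(pkg.rglob("*.py")):
        rel = src.relative_to(pkg)
        # Always use POSIX separators for the in-image path, whatever the host OS.
        mapping[str(src)] = str(PurePosixPath(image_root, *rel.parts))
    return mapping
```

Run from the project root, the storage call could then pass files=build_files_mapping("pipelines"), so every flow ships with the full package rather than a single copied file.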
When debugging this issue, we managed to reduce the flow to a state where the problem simply disappeared. We isolated a function that caused the ModuleNotFoundError, but it was not clear which part of the function caused the error. Even after reducing the function to a stub consisting of a single pass statement, the error persisted. Removing the call to the offending function magically solved the problem, even though it was just a stub.
Our conclusion is that the flow.py file is somehow pickled when the Docker image is built, but only in certain situations. We do not know why.
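One plausible mechanism worth checking, sketched with the standard library only: pickle stores a function by reference, writing just its module name and qualified name, and unpickling re-imports that module. If the Docker storage pickles the flow with cloudpickle (as far as we know, its behavior when stored_as_script is not set), a function defined in flow.py and referenced by the flow may be recorded as belonging to pipelines.<flow_name>.flow, since cloudpickle also pickles functions by reference when their module is importable at build time. The throwaway package and stub below are hypothetical stand-ins for the real project.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Build a throwaway package mirroring pipelines/first_flow/flow.py,
# containing only a stub function (hypothetical contents).
root = Path(tempfile.mkdtemp())
flow_dir = root / "pipelines" / "first_flow"
flow_dir.mkdir(parents=True)
(root / "pipelines" / "__init__.py").write_text("")
(flow_dir / "__init__.py").write_text("")
(flow_dir / "flow.py").write_text("def stub():\n    pass\n")

# "Build" side: the package is importable. pickle writes only the function's
# module name ("pipelines.first_flow.flow") and name ("stub"), not its code.
sys.path.insert(0, str(root))
stub = importlib.import_module("pipelines.first_flow.flow").stub
payload = pickle.dumps(stub)

# "Run" side: simulate a container where the package is not on sys.path.
sys.path.remove(str(root))
for name in ["pipelines.first_flow.flow", "pipelines.first_flow", "pipelines"]:
    sys.modules.pop(name, None)

error = None
try:
    pickle.loads(payload)
except ModuleNotFoundError as exc:
    error = str(exc)
print(error)
```

Running this prints No module named 'pipelines', the same message as the original error, even though the pickled function is only a stub.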
Post-ramble summary: we observed that the file containing our flow definition would be pickled with the flow itself when building the Docker image, but only in certain situations. Does anyone know what might be causing this behavior? We are happy to copy flow.py into the final Docker image, but would like to understand what is going on here.
Kevin Kho
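from prefect.storage import Docker

# Store the flow as a script at this path inside the image instead of pickling it
flow.storage = Docker(
    path="/location/in/image/my_flow.py",
    stored_as_script=True
)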
Kevin Kho
Thomas Fredriksen
01/05/2022, 8:04 PM
Kevin Kho
Thomas Fredriksen
01/05/2022, 8:30 PM
I set path and stored_as_script like you suggested, but unfortunately got the same error.
Thomas Fredriksen
01/05/2022, 8:32 PM
Kevin Mullins
01/05/2022, 8:34 PM
Kevin Kho
You can also add a setup.py to your directory and install it as a Python package in the container to make sure everything is importable.
Kevin Mullins
01/05/2022, 8:36 PM
import os

from prefect.storage import Docker

# Build the base image name from environment variables, with local defaults
container_registry = os.environ.get("PREFECT_CONTAINER_REGISTRY", "localhost:5000")
container_repository = os.environ.get(
    "PREFECT_CONTAINER_REPOSITORY", "fb-prefect-sandbox"
)
container_tag = os.environ.get("PREFECT_CONTAINER_TAG", "latest")
base_image = f"{container_registry}/{container_repository}:{container_tag}"
flow.storage = Docker(registry_url=container_registry, base_image=base_image)
Kevin Kho
Thomas Fredriksen
01/06/2022, 7:26 AM