# prefect-community
j
Hi folks. In my flow file, I have some local lib imports like this:
```python
from lib.package1 import fn1
from lib.package2 import fn2
```
These correspond to files alongside my flow like `lib/package1.py` and `lib/package2.py`. When calling `flow.register()`, it looks like my Docker storage healthcheck fails because it cannot find the `lib` module...
```
Traceback (most recent call last):
  File "/opt/prefect/healthcheck.py", line 135, in <module>
    flows = cloudpickle_deserialization_check(flow_file_path)
  File "/opt/prefect/healthcheck.py", line 40, in cloudpickle_deserialization_check
    flows.append(cloudpickle.load(f))
ModuleNotFoundError: No module named 'lib'
```
Any tips on how to make that work?
c
Hi jars - in short, you'll need to add these files / packages into an importable location within your Docker image (one rule of thumb is that you should be able to run your flow definition file from within the container). A fast but hacky way of doing this is to add the files to your Docker image and then add their location to your `$PATH`.
a
This is how I tackled the same issue: First, you need to copy these files over to the Docker image using the `files` argument. Pass something like `files_to_be_copied = {i: f"/root/lib/extra_python_files/{i.name}" for i in extra_python_files}`, where `extra_python_files` includes the local paths to the files you need copied over. Then, use the `env_vars` argument of `Docker` to pass something like `{"PYTHONPATH": "/root/lib/extra_python_files"}`. Inside the flow, you may have to adjust the `from ... import ...` lines: relative imports may work during local debugging, but they will fail when the flow runs.
A problem arises when the flow depends on some non-Python files, because you can't simply add them to `$PATH`... The only solution, it seems, is to hardcode paths to these files inside the flow definition and replicate the folder structure exactly in `Docker(..., files=...)`. If anyone has better ideas, I'd love to hear them.
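(The file-mapping dict above can be sketched concretely; this is a hypothetical illustration, using the `lib/package1.py` and `lib/package2.py` files from the original question as the local paths.)
```python
from pathlib import Path

# Local helper modules to copy into the image (paths from the question above)
extra_python_files = [Path("lib/package1.py"), Path("lib/package2.py")]

# Map each local file to its destination inside the container; this dict
# would then be passed as Docker(..., files=files_to_be_copied)
files_to_be_copied = {
    str(i): f"/root/lib/extra_python_files/{i.name}" for i in extra_python_files
}

print(files_to_be_copied["lib/package1.py"])  # /root/lib/extra_python_files/package1.py
```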
c
The best practice here would be to convert your external files into a Python package that can be installed. Then you could create a `Dockerfile` with:
```dockerfile
COPY ./my-package-contents /place-in-docker
RUN pip install /place-in-docker
```
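(For that `pip install` to make `from lib.package1 import fn1` work unchanged, `my-package-contents` needs a minimal packaging setup. A hypothetical layout, with names purely illustrative:)
```
my-package-contents/
├── setup.py          # e.g. setup(name="lib", packages=["lib"])
└── lib/
    ├── __init__.py
    ├── package1.py
    └── package2.py
```
Once installed into the image's site-packages, the healthcheck can import `lib` without any PYTHONPATH tweaks.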
j
Thanks @Chris White and @Arsenii for the ideas. I have got past the issue by populating my PYTHONPATH with the location of my lib directory. I think the local pip package sounds cleaner and I will probably opt for that in the near future. However, a similar error has come up with my health check, this time unable to find my flow's file. It is exactly the same error except: `No module named 'flow'`
It doesn't seem like I can fix it the same way. I noticed something peculiar, though, while looking at these docs while investigating: https://docs.prefect.io/core/advanced_tutorials/local-debugging.html#locally-check-your-flow-s-docker-storage When running `print(built_storage.flows)`, it returns `{}`. My code is, at its barest, something like this. Is this the right idea?
```python
# my real flow lives in here...
from flow import flow
from prefect.environments.storage.docker import Docker

flow.storage = Docker(
    registry_url=registry_url,
    dockerfile='Dockerfile',
    image_name='my_image',
    image_tag='0.0.0',
    env_vars={}
)

built_storage = flow.storage.build(push=False)

# returns {} ?
print(built_storage.flows)
```
c
Your Docker image assumes it can import your flow from a module named `flow` that isn't present within the image.
j
What about pulling the files from github? Would that work?
c
Hmm what do you mean?
j
@Chris White, you nailed it, I got it. I confused myself by bashing into the flow's container, opening a Python REPL, and proving to myself (erroneously) that `from flow import flow` worked... well, it obviously worked because flow.py was inside my local directory. I understand... the health check needed access to it as well. I added its location to PYTHONPATH and all is well.
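(For reference, the mechanism behind that fix can be shown in a self-contained sketch: a PYTHONPATH entry, like the one set via `env_vars` in the image, is just an extra directory on Python's module search path, so a `flow.py` placed there becomes importable everywhere, including inside the healthcheck. The directory and module contents below are purely illustrative.)
```python
import os
import sys
import tempfile

# Create a throwaway directory containing a module named "flow",
# standing in for the flow definition file the healthcheck imports.
src_dir = tempfile.mkdtemp()
with open(os.path.join(src_dir, "flow.py"), "w") as f:
    f.write("flow = 'my flow object'\n")

# Equivalent effect of env_vars={"PYTHONPATH": src_dir} at container startup:
# the directory joins the interpreter's module search path.
sys.path.insert(0, src_dir)

from flow import flow  # now resolves
print(flow)  # my flow object
```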
c
ah gotcha - awesome!
@Marvin archive “ModuleNotFoundError when registering my Flow with Docker storage”
j
I've been told what I said before was not very clear. What I'm wondering is: instead of bundling the helper files with the flow itself, would it make sense to store them on GitHub and pull them from a separate repo as part of the flow?
m
unrelated, any plans to open-source Marvin 🙂
c
Hey @Jacob Blanco - yea, that could work as well; the biggest hurdle is typically getting those files into an importable location. We actually have some changes in the works that might make integrating flows with GitHub much, much easier, including making sure all the files are transparently importable 😉
haha @miko no current plans but it’s not off the table 😄
m
Marvin is really cool actually