# ask-community
c
Hey folks, I've been running into this `ModuleNotFoundError` for some time now and cannot seem to get this working, despite this probably being a basic issue. I had a conversation with marvin about it here: https://prefect-community.slack.com/archives/C04DZJC94DC/p1697506986741959 . In a nutshell, my code is stored via a GitHub storage block and executed in a Google Cloud Run job, so the code is pulled into an image I already created, which has the external dependencies baked in. However, my custom internal dependencies are where I'm having issues. The entrypoint is defined as `src/flows/flow_1/flow.py`. Moreover, `src/flows` is a directory where I store many flows in this manner: `src/flows/flow_2/flow.py`, etc. However, these `flow.py`'s reference code in other subdirectories, for example `src/flows/tools/tool1.py`, `src/nlp/ner/ner.py`, etc. What would be the best approach here? I've attempted relative imports but ran into this error:
```
ImportError: attempted relative import beyond top-level package
```
I've also attempted imports such as `from src.flows.tools`, but receive a `ModuleNotFoundError` on this as well, for the `src` module. Do I need to add a `PYTHONPATH` environment variable in the Google Cloud Run pool config that lists every single subdirectory I use, or is there a better approach? (I also have `__init__.py`'s in every folder.) Thanks!!!
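(Aside, not from the thread: a rough sketch of the `PYTHONPATH` option being asked about. With absolute imports like `from src.flows.tools import tool1` and `__init__.py` files in each folder, only the single repo root usually needs to be on `PYTHONPATH`, not every subdirectory; the clone path below is hypothetical.)
```
# Hypothetical clone location of the repo inside the container
export PYTHONPATH=/opt/prefect/Pipelines

# With the repo root on sys.path, the absolute imports from the message above resolve
python -c "from src.flows.tools import tool1"
python -c "from src.nlp.ner import ner"
```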
n
hmm, as a potential workaround, can you just install the deps from where they live? e.g.
```
pip install git+https://github.com/PrefectHQ/some-repo.git@some-branch[extra1,extra2]
```
I'm not sure I totally understand your setup with custom modules - does the code live somewhere public by chance?
c
Hey @Nate, the code is in a private repository. In terms of the deployment, a fine-grained access token is being used to pull it in from the repo.
n
okay, well if your modules can be installed as a package then you could still do
```
pip install git+https://${MY_TOKEN}@github.com/PrefectHQ/some-repo.git@some-branch[extra1,extra2]
```
in a pull step, I think. but anyway, would it be helpful to try things in Docker locally to make sure the image you're creating has stuff where you intend?
c
project structure looks like this:
```
Pipelines/
|-- src/
|   |-- flows/
|   |   |-- flow_1/
|   |   |   |-- flow_1.py
|   |   |-- flow_2/
|   |   |   |-- flow_2.py
|   |   |-- ...
|   |-- helpers/
|   |   |-- helper.py ...
|   |-- nlp/
|   |   |-- named_entity_recognition.py
|-- requirements
|-- README
```
entrypoint defined in `prefect.yaml`: `src/flows/flow_1/flow_1.py`

ahh okay, I see - yeah, it would probably be helpful to try it in Docker locally. In terms of "has stuff where you intend", is this referring to the code itself and its dependencies?
n
> has stuff where you intend
I just mean that your code is being copied into the image in the way your code expects, like your dir structure above. hmm, what are you doing to create a package (that you could expect to install / import)? If you're not, I would suggest it might be helpful to break out your helpers and make them installable
c
nothing at the moment - when I was testing locally, I had a `setup.py` file I would run:
```
from setuptools import setup, find_namespace_packages

setup(
    name='bettercart-flows',
    version='1.0.0',
    packages=find_namespace_packages(include=['nlp', 'nlp.*', 'flows', 'flows.*', 'src', 'src.*']),
    package_data={'nlp': ['patterns/const/*', 'assets/const/*', 'assets/models/*']}
)
```
no import errors would occur with this. But I wasn't sure how to use this when using an image that pulls in the repo and then gets run on Google Cloud Run.
n
sorry, I just realized you're using infra blocks / storage blocks (as opposed to workers and prefect.yaml), so you don't have a pull step. If you did, I'd think you could just include that `setup.py` in what you clone down and then do `pip install .` to use that `setup.py` and install your stuff into your runtime.

one suggestion that might work is to set `EXTRA_PIP_PACKAGES="."` in the `env` of your cloud run job infra block (which means we'll run `pip install .` on your behalf), but that feels sort of hacky.

i think the best solution here is to write a Dockerfile that installs your helpers at `docker build` time, so that your storage block only needs to clone your flow code itself, whose deps are already baked into the image and do not need to be cloned
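(A minimal sketch of that kind of Dockerfile, not from the thread - it assumes a Prefect 2 base image and reuses the `setup.py` shared above; the image tag, file names, and copied paths are placeholders.)
```
# Sketch only: base image tag and paths are assumptions
FROM prefecthq/prefect:2-python3.10

WORKDIR /opt/prefect

# Bake the external dependencies into the image
COPY requirements.txt .
RUN pip install -r requirements.txt

# Install the shared internal packages (helpers, nlp, ...) at build time,
# so the storage block only has to clone the flow code at runtime
COPY setup.py .
COPY src/ ./src/
RUN pip install .
```
Then something like `docker build -t <your-image> .` and point the Cloud Run job infra block at that image.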
c
ahh I see - okay, I think I understand what you mean. Just wondering: doesn't the whole repository get cloned into the image anyways when using a github storage block? Based on the `prefect.yaml` file:
```
...
  pull:
  - prefect.deployments.steps.git_clone:
      repository: https://github.com/<org>/<repo>.git
      branch: main
      access_token: '{{ prefect.blocks.secret.github_token }}'
...
```
n
yes - to be pedantic, the github storage block clones the repo at runtime onto the container created from the `image` on your cloud run job.

what I suggested would probably make more sense if you had your helpers in a different repo that was independently package-able from your flow code, which may not be what you want to do. If you just want to clone everything for now, I'd just suggest running a container locally from the same image you're trying on cloud run, opening python, and trying to figure out what's wrong with the namespacing - that's what I'd do at least 🤷 🙂
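(Roughly the kind of local check being described - the image name and clone location below are hypothetical.)
```
# Run the same image locally that the Cloud Run job uses (name is a placeholder)
docker run -it --rm <your-image> /bin/bash

# Inside the container: put the code where the storage block would clone it,
# then see which imports resolve and which don't
git clone https://github.com/<org>/<repo>.git && cd <repo>
python -c "from src.flows.tools import tool1"
```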
c
gotchya! I'll go ahead and do that - thanks @Nate! One last question here if you don't mind! Just to be sure… I'm assuming the manner in which I trigger the flow when testing this will matter. The entrypoint I defined in the deployment is `src/flows/flow_1/flow1.py`. From Prefect's perspective, does the command to run the flow then translate to `python src/flows/flow_1/flow1.py`? Or is it `python -m src.flows.flow_1.flow1`, or a different way?
n
with infra block stuff, I would say do not change the `command` field on your infra block unless you know exactly why you want to do that. It's actually running the `engine` module, which takes care of running your flow - it doesn't quite just run the flow.

the whole entrypoint / path ambiguity with infra blocks is part of the reason workers exist now, but in general I think there are a couple useful guidelines (which may not always be the best advice, but often are):
• specify `entrypoint` for every deployment, leave `path` unset (there may be cases where you have to set path, but hopefully not 🙂)
• define `entrypoint` to be relative to the root of your storage block, so `src/flows/flow_1/flow1.py`
👍
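(For illustration only - what those two guidelines might look like in a `prefect.yaml` deployment section; the deployment name and the `:flow_1` function suffix are assumptions, not from the thread.)
```
deployments:
- name: flow-1  # hypothetical name
  # entrypoint is relative to the root of the storage block / cloned repo,
  # and includes the flow function after the colon (assumed here to be `flow_1`)
  entrypoint: src/flows/flow_1/flow1.py:flow_1
  # path: left unset, per the guideline above
```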
c
okay- awesome. Thanks @Nate!
👍 1