# ask-community
c
Hey folks, I've been running into this `ModuleNotFoundError` for some time now and cannot seem to get this working, despite this probably being a basic issue. I had a conversation with marvin about it here: https://prefect-community.slack.com/archives/C04DZJC94DC/p1697506986741959 . In a nutshell, my code is stored via a GitHub storage block and executed in a Google Cloud Run job, so the code is pulled into an image I already created, which has the external dependencies baked in. However, my custom internal dependencies are where I'm having issues. The entrypoint is defined as `src/flows/flow_1/flow.py`. Moreover, `src/flows` is a directory where I store many flows in this manner: `src/flows/flow_2/flow.py`, etc. However, these `flow.py`'s reference code in other subdirectories, for example `src/flows/tools/tool1.py`, `src/nlp/ner/ner.py`, etc. What would be the best approach here? I've attempted relative imports but ran into this error:
```
ImportError: attempted relative import beyond top-level package
```
I've also attempted imports such as `from src.flows.tools`, but receive a `ModuleNotFoundError` on this as well, for the `src` module. Do I need to add a `PYTHONPATH` environment variable in the Google Cloud Run pool config that lists every single subdirectory I use, or is there a better approach? (I also have `__init__.py`'s in every folder.) Thanks!!!
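(Aside, not from the thread: a rough sketch of the `PYTHONPATH` option being asked about. With absolute imports like `from src.flows.tools import tool1` and `__init__.py` files in each folder, only the single repo root usually needs to be on `PYTHONPATH`, not every subdirectory; the clone path below is hypothetical.)
```
# Hypothetical clone location of the repo inside the container
export PYTHONPATH=/opt/prefect/Pipelines

# With the repo root on sys.path, the absolute imports from the message above resolve
python -c "from src.flows.tools import tool1"
python -c "from src.nlp.ner import ner"
```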
n
hmm, as a potential workaround, can you just install the deps from where they live? e.g.
```
pip install git+https://github.com/PrefectHQ/some-repo.git@some-branch[extra1,extra2]
```
I'm not sure I totally understand your setup with custom modules - does the code live somewhere public by chance?
c
Hey @Nate, the code is in a private repository. In terms of the deployment, a fine-grained access token is being used to pull it in from the repo.
n
okay, well if your modules can be installed as a package then you could still do
```
pip install git+https://${MY_TOKEN}@github.com/PrefectHQ/some-repo.git@some-branch[extra1,extra2]
```
in a pull step, I think. but anyway, would it be helpful to try things in Docker locally to make sure the image you're creating has stuff where you intend?
c
project structure looks like this:
```
Pipelines/
|-- src/
|   |-- flows/
|   |   |-- flow_1/
|   |   |   |-- flow_1.py
|   |   |-- flow_2/
|   |   |   |-- flow_2.py
|   |   |-- ...
|   |-- helpers/
|   |   |-- helper.py ...
|   |-- nlp/
|   |   |-- named_entity_recognition.py
|-- requirements
|-- README
```
entrypoint defined in `prefect.yaml`: `src/flows/flow_1/flow_1.py`

ahh okay, I see - yeah, it would probably be helpful to try it in Docker locally. In terms of "has stuff where you intend", is this referring to the code itself and its dependencies?
n
> has stuff where you intend
I just mean that your code is being copied into the image in the way your code expects, like your dir structure above. hmm, what are you doing to create a package (that you could expect to install / import)? If you're not, I would suggest it might be helpful to break out your helpers and make them installable
c
nothing at the moment - when I was testing locally, I had a `setup.py` file I would run:
```
from setuptools import setup, find_namespace_packages

setup(
    name='bettercart-flows',
    version='1.0.0',
    packages=find_namespace_packages(include=['nlp', 'nlp.*', 'flows', 'flows.*', 'src', 'src.*']),
    package_data={'nlp': ['patterns/const/*', 'assets/const/*', 'assets/models/*']}
)
```
no import errors would occur with this. But I wasn't sure how to use this when using an image that pulls in the repo and then gets run on Google Cloud Run.
n
sorry, I just realized you're using infra blocks / storage blocks (as opposed to workers and prefect.yaml), so you don't have a pull step. If you did, I'd think you could just include that `setup.py` in what you clone down and then do `pip install .` to use that `setup.py` and install your stuff into your runtime.

one suggestion that might work is to set `EXTRA_PIP_PACKAGES="."` in the `env` of your cloud run job infra block (which means we'll run `pip install .` on your behalf), but that feels sort of hacky.

i think the best solution here is to write a Dockerfile that installs your helpers at `docker build` time, so that your storage block only needs to clone your flow code itself, whose deps are already baked into the image and do not need to be cloned
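(A minimal sketch of that kind of Dockerfile, not from the thread - it assumes a Prefect 2 base image and reuses the `setup.py` shared above; the image tag, file names, and copied paths are placeholders.)
```
# Sketch only: base image tag and paths are assumptions
FROM prefecthq/prefect:2-python3.10

WORKDIR /opt/prefect

# Bake the external dependencies into the image
COPY requirements.txt .
RUN pip install -r requirements.txt

# Install the shared internal packages (helpers, nlp, ...) at build time,
# so the storage block only has to clone the flow code at runtime
COPY setup.py .
COPY src/ ./src/
RUN pip install .
```
Then something like `docker build -t <your-image> .` and point the Cloud Run job infra block at that image.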
c
ahh I see - okay, I think I understand what you mean. Just wondering: doesn't the whole repository get cloned into the image anyways when using a github storage block? Based on the `prefect.yaml` file:
```
...
  pull:
  - prefect.deployments.steps.git_clone:
      repository: https://github.com/<org>/<repo>.git
      branch: main
      access_token: '{{ prefect.blocks.secret.github_token }}'
...
```
n
yes - to be pedantic, the github storage block clones the repo at runtime onto the container created from the `image` on your cloud run job.

what I suggested would probably make more sense if you had your helpers in a different repo that was independently package-able from your flow code, which may not be what you want to do. If you just want to clone everything for now, I'd just suggest running a container locally from the same image you're trying on cloud run, opening python, and trying to figure out what's wrong with the namespacing - that's what I'd do at least 🤷 🙂
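(Roughly the kind of local check being described - the image name and clone location below are hypothetical.)
```
# Run the same image locally that the Cloud Run job uses (name is a placeholder)
docker run -it --rm <your-image> /bin/bash

# Inside the container: put the code where the storage block would clone it,
# then see which imports resolve and which don't
git clone https://github.com/<org>/<repo>.git && cd <repo>
python -c "from src.flows.tools import tool1"
```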
c
gotchya! I'll go ahead and do that - thanks @Nate! One last question here if you don't mind! Just to be sure… I'm assuming the manner in which I trigger the flow when testing this will matter. The entrypoint I defined in the deployment is `src/flows/flow_1/flow1.py`. From Prefect's perspective, does the command to run the flow then translate to `python src/flows/flow_1/flow1.py`? Or is it `python -m src.flows.flow_1.flow1`, or a different way?
n
with infra block stuff, I would say do not change the `command` field on your infra block unless you know exactly why you want to do that. It's actually running the `engine` module, which takes care of running your flow - it doesn't quite just run the flow.

the whole entrypoint / path ambiguity with infra blocks is part of the reason workers exist now, but in general I think there are a couple useful guidelines (which may not always be the best advice, but often are):
• specify `entrypoint` for every deployment, leave `path` unset (there may be cases where you have to set path, but hopefully not 🙂)
• define `entrypoint` to be relative to the root of your storage block, so `src/flows/flow_1/flow1.py`
👍
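(For illustration only - what those two guidelines might look like in a `prefect.yaml` deployment section; the deployment name and the `:flow_1` function suffix are assumptions, not from the thread.)
```
deployments:
- name: flow-1  # hypothetical name
  # entrypoint is relative to the root of the storage block / cloned repo,
  # and includes the flow function after the colon (assumed here to be `flow_1`)
  entrypoint: src/flows/flow_1/flow1.py:flow_1
  # path: left unset, per the guideline above
```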
c
okay- awesome. Thanks @Nate!
👍 1