Choenden Kyirong
10/17/2023, 5:41 PM
I've been running into a ModuleNotFoundError for some time now and cannot seem to get this working despite this probably being a basic issue. I had a conversation with Marvin about it here: https://prefect-community.slack.com/archives/C04DZJC94DC/p1697506986741959.
In a nutshell, my code is stored via a GitHub storage block and executed in a Google Cloud Run job, so the code is pulled into an image I already created, which has the external dependencies baked into it. However, my custom internal dependencies are where I'm having issues. The entrypoint is defined as src/flows/flow_1/flow.py. Moreover, src/flows is a directory where I store many flows in this manner: src/flows/flow_2/flow.py, etc. However, these flow.py's reference code in other subdirectories, for example in src/flows/tools/tool1.py, src/nlp/ner/ner.py, etc. What would be the best approach here? I've attempted relative imports but ran into this error:
ImportError: attempted relative import beyond top-level package
I've also attempted imports such as from src.flows.tools, but receive ModuleNotFoundError on this as well, for the src module. Do I need to add a PYTHONPATH environment variable in the Google Cloud Run pool config that lists every single subdirectory I use, or is there a better approach?
(I also have __init__.py's in every folder.)
thanks!!!
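(For context, a minimal sketch of the two failing patterns described above, as they would appear in src/flows/flow_1/flow.py; the tool1 and ner module names are taken from the message, and exactly how the flow file gets loaded at runtime is an assumption about this setup.)

# src/flows/flow_1/flow.py -- illustrating the two attempts

# Attempt 1: relative import. When the flow file is loaded from its file path
# rather than as a submodule of an installed "src" package, Python does not
# treat src as the top-level package, so ".." climbs out of it:
try:
    from ..tools import tool1
except ImportError as exc:
    print(exc)  # e.g. attempted relative import beyond top-level package

# Attempt 2: absolute import. Resolves only if the directory containing src/
# is on sys.path when the flow is imported; the reported error suggests it
# is not in this Cloud Run setup:
try:
    from src.flows.tools import tool1
    from src.nlp.ner import ner
except ModuleNotFoundError as exc:
    print(exc)  # No module named 'src'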
Nate
10/17/2023, 6:12 PM
pip install git+https://github.com/PrefectHQ/some-repo.git@some-branch[extra1,extra2]
I'm not sure I totally understand your setup with custom modules, does the code live somewhere public by chance?
Choenden Kyirong
10/17/2023, 6:15 PM
Nate
10/17/2023, 6:17 PM
pip install git+https://{$MY_TOKEN}@github.com/PrefectHQ/some-repo.git@some-branch[extra1,extra2]
in a pull step I think
but anyways, would it be helpful to try things on Docker locally to make sure the image you're creating has stuff where you intend?
Choenden Kyirong
10/17/2023, 6:18 PM
Pipelines/
|
|-- src/
|   |-- flows/
|   |   |-- flow_1/
|   |   |   |-- flow_1.py
|   |   |-- flow_2/
|   |   |   |-- flow_2.py
|   |   |-- ...
|   |-- helpers/
|   |   |-- helper.py
|   |-- nlp/
|   |   |-- named_entity_recognition.py
|-- requirements
|-- README

entrypoint defined in prefect.yaml: src/flows/flow_1/flow_1.py
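(Given this layout, one sketch of a workaround, assuming the clone keeps the Pipelines/ structure shown above: put the repo root on sys.path at the top of the flow file before the absolute imports. The helper and function names below are hypothetical, and installing the package into the image, as discussed further down, would make this unnecessary.)

# src/flows/flow_1/flow_1.py
import sys
from pathlib import Path

# Pipelines/ sits three directories above this file (flow_1 -> flows -> src -> Pipelines)
REPO_ROOT = Path(__file__).resolve().parents[3]
if str(REPO_ROOT) not in sys.path:
    sys.path.insert(0, str(REPO_ROOT))

from prefect import flow
from src.helpers.helper import some_helper                     # hypothetical name
from src.nlp.named_entity_recognition import extract_entities  # hypothetical name

@flow
def flow_1():
    ...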
Choenden Kyirong
10/17/2023, 6:20 PM
Nate
10/17/2023, 6:22 PM
Choenden Kyirong
10/17/2023, 6:24 PM
I have a setup.py file I would run:
from setuptools import setup, find_namespace_packages

setup(
    name='bettercart-flows',
    version='1.0.0',
    # include the nlp, flows, and src package trees
    packages=find_namespace_packages(include=['nlp', 'nlp.*', 'flows', 'flows.*', 'src', 'src.*']),
    # ship the non-Python assets the nlp package needs
    package_data={'nlp': ['patterns/const/*', 'assets/const/*', 'assets/models/*']},
)
No import errors would occur with this. But I wasn't sure how to use this when using an image that pulls in the repo and then gets run on Google Cloud Run.
Nate
10/17/2023, 6:29 PM
You could pip install . to use that setup.py and install your stuff into your runtime.
One suggestion that might work is to set EXTRA_PIP_PACKAGES="." in the env of your Cloud Run job infra block (which means we'll run pip install . on your behalf), but that feels sort of hacky.
I think the best solution here is to write a Dockerfile that installs your helpers at docker build time, so that your storage block only needs to clone your flow code itself, whose deps are already baked into the image and do not need to be cloned.
Choenden Kyirong
10/17/2023, 6:36 PM
Here's the pull section in my prefect.yaml file:
...
pull:
  - prefect.deployments.steps.git_clone:
      repository: https://github.com/<org>/<repo>.git
      branch: main
      access_token: '{{ prefect.blocks.secret.github_token }}'
...
Nate
10/17/2023, 6:41 PM
image on your cloud run job.
what I suggested would probably make more sense if you had your helpers in a different repo that was independently packageable from your flow code, which may not be what you want to do.
if you just want to clone everything for now, I'd just suggest running a container locally from the same image you're trying on cloud run and just opening python and trying to figure out what's wrong with the namespacing - that's what I'd do at least 🤷 🙂
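(Along those lines, a small throwaway script run inside the same image can confirm the packages resolve before deploying; the file name and module list here are hypothetical and assume the Pipelines/ layout shown earlier, with the repo root as the working directory.)

# check_imports.py -- hypothetical helper; run with `python check_imports.py` from the repo root
import importlib

modules = [
    "src.helpers.helper",
    "src.nlp.named_entity_recognition",
    "src.flows.flow_1.flow_1",
]

for name in modules:
    importlib.import_module(name)  # raises ModuleNotFoundError if the namespacing is off
    print(f"ok: {name}")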
Choenden Kyirong
10/17/2023, 6:44 PM
Say the entrypoint is src/flows/flow_1/flow1.py. From Prefect's perspective, does the command used when running the flow then translate to python src/flows/flow_1/flow1.py? Or is it python -m src.flows.flow_1.flow1.py, or a different way?
Nate
10/17/2023, 6:49 PM
I wouldn't set the command field on your infra block unless you know exactly why you want to do that. It's actually running the engine module, which takes care of running your flow; it doesn't quite just run the flow.
The whole entrypoint / path ambiguity with infra blocks is part of the reason for workers existing now, but in general I think there are a couple useful guidelines (which may not always be the best advice, but often are):
• specify entrypoint for every deployment, leave path unset (there may be cases where you have to set path, but hopefully not 🙂)
• define entrypoint to be relative to the root of your storage block, so src/flows/flow_1/flow1.py
👍
Choenden Kyirong
10/17/2023, 6:51 PM