Rio McMahon

02/23/2022, 11:23 PM
If I want to use an external Python file (e.g. in a src/ directory), what is the best way to import it? I tried following similar logic to this: https://docs.prefect.io/orchestration/flow_config/storage.html#loading-additional-files-with-git-storage but adding to the import path:
import pathlib, sys
file_path = pathlib.Path(__file__).resolve().parent
sys.path.append(file_path)
But keep getting this error:
[23 February 2022 4:22pm]: Failed to load and execute Flow's environment: ModuleNotFoundError("No module named 'src'")
Is there a best practice for importing external python code into a flow?
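
One detail in the snippet above is worth checking separately from the storage question: `sys.path.append(file_path)` appends a `pathlib.Path` object, and as far as I know CPython's default path finder only considers string entries on `sys.path`. The following is a minimal sketch, not Prefect-specific, that compares the two forms. The package name `demo_src_pkg` and the helper `can_import` are hypothetical.

```python
import importlib
import pathlib
import sys
import tempfile


def can_import(package_name, path_entry):
    """Return True if package_name imports with path_entry temporarily on sys.path."""
    sys.path.append(path_entry)
    importlib.invalidate_caches()
    try:
        importlib.import_module(package_name)
        return True
    except ModuleNotFoundError:
        return False
    finally:
        # Undo the side effects so the two attempts do not influence each other.
        sys.path.remove(path_entry)
        sys.modules.pop(package_name, None)


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp).resolve()
    # Build a throwaway package: <tmp>/demo_src_pkg/__init__.py
    package_dir = root / "demo_src_pkg"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("VALUE = 1\n")

    as_path = can_import("demo_src_pkg", root)       # Path object entry
    as_str = can_import("demo_src_pkg", str(root))   # plain string entry
    print("Path entry importable:", as_path)
    print("str entry importable:", as_str)
```

If the Path entry turns out to be skipped on your interpreter, wrapping it in `str(...)` addresses that part, though it would not by itself make files importable if they are absent from the environment where the flow runs.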

Kevin Kho

02/23/2022, 11:23 PM
Are you using the Local agent?
You can specify the working dir like the last example here

Rio McMahon

02/23/2022, 11:26 PM
No, I am using Git (preferably GitLab) for storage and ECSRun as my run config. My flow looks like:
# general prefect imports
import prefect
from prefect import task, Flow
from prefect.storage import Git
from prefect.run_configs import ECSRun
from prefect.client import Secret

# specific imports to load files from src/
import pathlib, sys
file_path = pathlib.Path(__file__).resolve().parent
sys.path.append(file_path)

from src.seasonality_index_builder_dynamic_agg import run_seasonality_index_builder_dynamic_agg

# define a wrapper task to expose logging
@task(log_stdout=True, checkpoint=False)
def run_script():
    logger = prefect.context.get("logger")
    logger.info("Running script...")
    run_seasonality_index_builder_dynamic_agg()

# instantiate the flow - we store the flow definition in gitlab
with Flow("seasonality_index_builder",
        storage=Git(
            [git info]
            ),
        run_config=ECSRun(
            [ECS stuff]
            )
         ) as flow:
    run_script()

# Register the flow under the "Testing" project
flow.register(project_name="Testing",
        labels=['ds']
        )

Kevin Kho

02/23/2022, 11:28 PM
Ah yeah, in this case it really needs to go into the container for ECSRun. Git storage is not intended to handle other Python files, just stuff like SQL and YAML. The path manipulation is pretty hard and might be impossible. Of course, if you find a solution, please share so we can archive it.

Rio McMahon

02/23/2022, 11:47 PM
Okay - is something like this pretty typical then:
COPY src /home/mambauser/src
in the Dockerfile, then
import pathlib, sys, os
sys.path.append(str(pathlib.Path(os.environ["HOME"]).resolve()))  # sys.path entries should be strings
in the flow. In this case os.environ["HOME"] should resolve to /home/mambauser

Kevin Kho

02/23/2022, 11:48 PM
Not really, because at this point you may as well install that src as a Python package so it’s accessible wherever the Flow runs. Are you familiar with how to do that?

Rio McMahon

02/23/2022, 11:50 PM
As in break out the contents of src into a module, then install via pip within my Docker container?

Kevin Kho

02/23/2022, 11:51 PM
Yes, but I don’t know what you mean by “break out”. I think you just need to provide a setup.py?
If it helps, here is a blog post for that
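
Kevin's suggestion of providing a setup.py could look roughly like the sketch below. It is written as a small scaffolding script so that it can be run safely on its own. The distribution name `seasonality-src`, the version number, and the helper `write_skeleton` are assumptions for illustration and do not come from the thread.

```python
import pathlib
import tempfile
import textwrap

# Contents of a minimal setup.py. The name and version are hypothetical.
SETUP_PY = textwrap.dedent(
    '''\
    from setuptools import setup

    setup(
        name="seasonality-src",  # hypothetical distribution name
        version="0.1.0",
        packages=["src"],  # install the src/ directory as the package `src`
    )
    '''
)


def write_skeleton(project_root):
    """Create setup.py and src/__init__.py under project_root.

    Returns the sorted relative paths of all files now present there.
    """
    package_dir = project_root / "src"
    package_dir.mkdir(parents=True, exist_ok=True)
    # An __init__.py marks src/ as a regular, installable package.
    (package_dir / "__init__.py").touch()
    (project_root / "setup.py").write_text(SETUP_PY)
    return sorted(
        p.relative_to(project_root).as_posix()
        for p in project_root.rglob("*")
        if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    created = write_skeleton(pathlib.Path(tmp))
    print(created)
```

With a layout like this copied into the image, running `pip install .` from the project root during the Docker build should make `from src.seasonality_index_builder_dynamic_agg import ...` resolve without any `sys.path` changes. A more distinctive package name than `src` would probably be safer against name clashes.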

Rio McMahon

02/23/2022, 11:54 PM
Poor phrasing on my part - that blog post will be a good starting point. Thanks a ton for the quick response and feedback. Really enjoying prefect so far!

Kevin Kho

02/23/2022, 11:58 PM
Of course! 🙂