
Nikhil Jain

05/19/2022, 9:19 PM
I am looking to set up Docker storage with multiple flows. I found this example in the docs: https://docs.prefect.io/orchestration/recipes/multi_flow_storage.html but it assumes all flows are in the same file, and there is no way to specify the `file_path` in the `storage.add_flow()` method. Is there a way around it?

Kevin Kho

05/19/2022, 9:20 PM
I think you can create your image ahead of time and then just use Local storage to specify the path like this
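A minimal sketch of that setup, assuming Prefect 1.x and a prebuilt image; the image name, registry, and project name below are illustrative, not from this thread:

```python
# Sketch only: assumes Prefect 1.x, a project named "Dev-Test", and an image
# "my-registry/my-image:latest" built and pushed ahead of time with the flow
# file copied into it at /application/flows/say_hello.py.
from prefect import Flow
from prefect.run_configs import DockerRun
from prefect.storage import Local

with Flow("say_hello") as flow:
    ...  # tasks go here

# Local storage with stored_as_script=True just records the path of the
# script inside the image; nothing is baked in at registration time.
flow.storage = Local(
    path="/application/flows/say_hello.py",
    stored_as_script=True,
    add_default_labels=False,  # avoid pinning the run to this hostname
)
flow.run_config = DockerRun(image="my-registry/my-image:latest")

# build=False because the image already exists.
flow.register(project_name="Dev-Test", build=False)
```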

Nikhil Jain

05/19/2022, 9:25 PM
in that case, how is the image built? I have a Dockerfile which I believe is supposed to be used as the intermediate image for the flow container image.
FROM python:3.9
WORKDIR /application
COPY requirements.txt .

ARG GITHUB_ACCESS_TOKEN
RUN git config --global url."https://${GITHUB_ACCESS_TOKEN}:@github.com/".insteadOf "https://github.com/"

RUN pip install --upgrade pip \
    && pip install -r requirements.txt

COPY . .
If I simply do `docker build -t myimage:latest .` and push it to my ECR repo, will that work?

Kevin Kho

05/19/2022, 9:28 PM
Exactly, yeah: then you just point to the image in DockerRun and the flow in the Local storage. Use `flow.register(build=False)`

Nikhil Jain

05/19/2022, 9:53 PM
@Tomohiro Nakagawa
@Kevin Kho I tried the recipe you described above, and I am getting this error:
Flow is not contained in this Storage
Traceback (most recent call last):
  File "/usr/local/bin/prefect", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/prefect/cli/execute.py", line 92, in flow_run
    raise exc
  File "/usr/local/lib/python3.9/site-packages/prefect/cli/execute.py", line 73, in flow_run
    flow = storage.get_flow(flow_data.name)
  File "/usr/local/lib/python3.9/site-packages/prefect/storage/local.py", line 88, in get_flow
    raise ValueError("Flow is not contained in this Storage")
ValueError: Flow is not contained in this Storage
Here’s my code: `say_hello.py`:
import prefect
from prefect import task, Flow, Parameter
from prefect.storage import Local

@task
def say_hello_task(name):
    logger = prefect.context.get('logger')
    logger.info(f'Hello {name}')

with Flow('say_hello') as say_hello_flow:
    people = Parameter('people', default=['Aay', 'Bee'])
    say_hello_task.map(people)

say_hello_flow.storage = Local(
    path=f'/application/flows/{__file__}',
    stored_as_script=True,
    add_default_labels=False
)
`register_flows.py`:
from prefect.run_configs import ECSRun

# local modules
import wordcount
import say_hello
ecs_run = ECSRun(
    run_task_kwargs={'cluster': 'dev-prefect-cluster'},
    task_role_arn='arn:aws:iam::<>:role/dev-task-runner-ecs-task-role',
    execution_role_arn='arn:aws:iam::<>:role/dev-task-runner-ecs-task-execution-role',
    image='<>.dkr.ecr.us-west-1.amazonaws.com/dev-automation-scripts-ecr:latest',
    labels=['ecs', 'dev']
)

flows = [
    wordcount.count_words_flow,
    say_hello.say_hello_flow
]

for flow in flows:
    flow.run_config = ecs_run
    flow.register(project_name='Dev-Test', build=False)
`Dockerfile`:
FROM python:3.9
WORKDIR /application
COPY requirements.txt .

ARG GITHUB_ACCESS_TOKEN
RUN git config --global url."https://${GITHUB_ACCESS_TOKEN}:@github.com/".insteadOf "https://github.com/"

RUN pip install --upgrade pip \
    && pip install -r requirements.txt

COPY . .
I think the issue is that the flows are never added to the Local storage in the container. But I can’t think of a way to do that.

Kevin Kho

05/20/2022, 1:17 AM
Does `COPY . .` add the flows in there?
Could you try `build=True`? Or just calling `flow.register` without the build argument?

Nikhil Jain

05/20/2022, 1:21 AM
yes, it must be copied, because the module import error is coming from the file that contains the flow.
wait.. sorry.. I am talking about something else. I got around this in a different way: I reverted to using DockerStorage and manually set the path on the DockerStorage object every time I call add_flow:
flows = [
    wordcount.count_words_flow,
    say_hello.say_hello_flow
]

for flow in flows:
    flow_path = f'{common.FLOW_APPLICATION_PATH}/{flow.name}.py'
    docker_storage.path = flow_path
    docker_storage.add_flow(flow)

docker_storage.build()

for flow in flows:
    flow.storage = docker_storage
    flow.run_config = ecs_run
    # flow.executor = LocalExecutor()
    flow.register(project_name='Dev-Test', build=False)

Kevin Kho

05/20/2022, 1:25 AM
Ah ok are you good now?

Nikhil Jain

05/20/2022, 1:27 AM
yes, thanks!
@Kevin Kho I tried using Local storage with build=True and it worked. However, I am running into a different issue now. I am not able to import local modules when the flow runs in ECS. This is happening in both Local storage and Docker storage. I ran the docker image locally and was able to import the modules and run the flow successfully.

Kevin Kho

05/20/2022, 5:41 PM
That sounds like your custom code is not packaged into a pip installable module. If you can pip install it, you would make it available from wherever Python is run inside the container. Alternatively, you can add it in the PYTHONPATH. Have you created a Python package before?
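Kevin's point can be demonstrated with plain Python, no Prefect needed: a module is only importable if its directory (or an installed package containing it) is somewhere on `sys.path`, which is exactly what `pip install` or `PYTHONPATH` arranges. A small self-contained sketch:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# A throwaway directory containing a module named "common", standing in for
# flows/common.py from this thread.
mod_dir = Path(tempfile.mkdtemp())
(mod_dir / "common.py").write_text("GREETING = 'hello'\n")

# Until mod_dir is on sys.path, `import common` raises ModuleNotFoundError.
# pip-installing a package (or exporting PYTHONPATH) is what puts the code
# somewhere the interpreter actually searches.
sys.path.insert(0, str(mod_dir))
common = importlib.import_module("common")
print(common.GREETING)  # → hello
```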

Nikhil Jain

05/20/2022, 7:27 PM
@Kevin Kho still getting the module import error. Here’s my `setup.py`:
from setuptools import setup, find_packages
from os import path

import __about__

loc = path.abspath(path.dirname(__file__))

with open(loc + '/requirements.txt') as f:
    requirements = f.read().splitlines()

required = []
dependency_links = []

# Do not add to required lines pointing to Git repositories
EGG_MARK = '#egg='
for line in requirements:
    if line.startswith('-e git:') or line.startswith('-e git+') or \
            line.startswith('git:') or line.startswith('git+'):
        line = line[len('-e '):] if line.startswith('-e ') else line  # strip a leading "-e "
        if EGG_MARK in line:
            package_name = line[line.find(EGG_MARK) + len(EGG_MARK):]
            repository = line[:line.find(EGG_MARK)]
            required.append('%s @ %s' % (package_name, repository))
            dependency_links.append(line)
        else:
            print('Dependency to a git repository should have the format:')
            print('git+ssh://git@github.com/xxxxx/xxxxxx#egg=package_name')
    else:
        required.append(line)

setup(
    name='automation-scripts',  # Required
    version=__about__.__version__,
    description='Description here....',  # Required
    packages=find_packages(),  # Required
    install_requires=required,
    dependency_links=dependency_links,
)
and here’s the dir structure:
automation-scripts
|-- __init__.py
|-- flows/
    |-- __init__.py
    |-- say_hello.py
    |-- common.py
In `say_hello.py`, I have the line `import common`, which is giving the module import error.

Kevin Kho

05/20/2022, 7:29 PM
Do you pip install this inside the Docker container?

Nikhil Jain

05/20/2022, 7:29 PM
Yes:
FROM python:3.9
WORKDIR /opt/prefect
COPY . .

ARG GITHUB_ACCESS_TOKEN
RUN git config --global url."https://${GITHUB_ACCESS_TOKEN}:@github.com/".insteadOf "https://github.com/" \
    && pip install .
Not able to reproduce the error in a local Docker run. I was able to run the flow from any directory in the container, even before I created the python package using `setup.py`

Kevin Kho

05/20/2022, 7:42 PM
Hard to say. Do you potentially have two Python versions in the image? What does `which python` give you?
Also, if you build the wheel locally and open it, do you see everything you expect, just to confirm the modules you expect go in there?

Nikhil Jain

05/20/2022, 7:47 PM
`which python` gives `/usr/local/bin/python`, which points to python3.9. Checking wheel info…
sorry, dumb question: where would I find wheel info? I don’t see it in the project folder or in site-packages in the virtualenv

Kevin Kho

05/20/2022, 7:55 PM
You can do `python setup.py bdist_wheel`, which I think builds a wheel file
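A wheel is just a zip archive, so its contents can be inspected without unpacking it by hand. A sketch, using a tiny stand-in wheel built on the fly (the file name is illustrative; in practice you would point this at the file `bdist_wheel` drops under `dist/`):

```python
import tempfile
import zipfile
from pathlib import Path

def list_wheel_contents(wheel_path):
    """A wheel is a zip archive; return the file names stored inside it."""
    with zipfile.ZipFile(wheel_path) as wf:
        return sorted(wf.namelist())

# Build a tiny stand-in wheel to demonstrate against.
wheel = Path(tempfile.mkdtemp()) / "automation_scripts-0.1-py3-none-any.whl"
with zipfile.ZipFile(wheel, "w") as wf:
    wf.writestr("flows/__init__.py", "")
    wf.writestr("flows/say_hello.py", "# flow code\n")

print(list_wheel_contents(wheel))
# → ['flows/__init__.py', 'flows/say_hello.py']
```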

Nikhil Jain

05/20/2022, 8:00 PM
Thanks!! I did that and it generated a build folder with the following contents:
build/
|-- bdist.macosx-10.9-x86_64/
|-- lib/
    |-- flows/
         |-- __init__.py
         |-- common.py
         |-- say_hello.py
So as far as I can tell, wheel looks okay.

Kevin Kho

05/20/2022, 8:02 PM
Yeah, that does look good. I think you can do `pip show package_name` inside the container to show it was installed right

Nikhil Jain

05/20/2022, 10:09 PM
For those who run into the same issue in the future: I realized that my project structure was not the standard one that you’d use with `setuptools`. Usually, when you want to build a python package for a project, all the source code is nested inside a `src` or `myproject` folder, which wasn’t the case for me since I added `setup.py` as an afterthought. It ended up creating packages for individual directories in my project (e.g. `flows` became a package instead of `myproject.flows`). So I had to import like so: `from flows import common` instead of `from myproject.flows import common`, or the usual `import common` that you can do when you don’t have packages at all.
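The difference between the two layouts is easy to see by pointing `setuptools.find_packages` at each one; the directory names below are illustrative:

```python
import tempfile
from pathlib import Path

from setuptools import find_packages

def make_pkg(path):
    """Create a directory with an __init__.py so setuptools sees a package."""
    path.mkdir(parents=True)
    (path / "__init__.py").touch()

# Flat layout: setup.py sits next to the code, so "flows" itself becomes a
# top-level package and must be imported as `from flows import common`.
flat = Path(tempfile.mkdtemp())
make_pkg(flat / "flows")
print(find_packages(where=str(flat)))  # → ['flows']

# Nested layout: everything lives under one "myproject" package, giving the
# usual `from myproject.flows import common` imports.
nested = Path(tempfile.mkdtemp())
make_pkg(nested / "myproject")
make_pkg(nested / "myproject" / "flows")
print(sorted(find_packages(where=str(nested))))
# → ['myproject', 'myproject.flows']
```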

Kevin Kho

05/20/2022, 10:10 PM
Ohhh man how did you find that out?

Nikhil Jain

05/20/2022, 10:14 PM
I noticed it when I was reading the setuptools documentation: https://setuptools.pypa.io/en/latest/userguide/quickstart.html When you want to build a python project into a package, this is how your dir structure should look:
~/mypackage/
    pyproject.toml
    setup.cfg       # or setup.py
    mypackage/      # or src/
         __init__.py
         some_code.py
         somefolder/*

Kevin Kho

05/20/2022, 10:15 PM
Ahh
Wait, but how did the imports work when running Docker directly?

Nikhil Jain

05/20/2022, 10:21 PM
inside Docker, I was just doing `import common` since `common.py` was in the same folder as `say_hello.py`. And I guess it worked fine because the pythonpath was working correctly when I ran: `prefect run -p /opt/prefect/flows/say_hello.py`
I also figured out how to reproduce, in a local Docker run, what would happen when running on ECS, using this code:
fp = '/opt/prefect/flows/say_hello.py'
with open(fp) as f:
    contents = f.read()

exec(contents)  # this throws the module not found error
This way I can test my imports in local Docker run without having to push images and having to run everything on ECS.
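That repro boils down to plain Python: `exec`-ing a file's contents does not put the script's directory on `sys.path`, so sibling imports fail, which is why `import common` worked under `prefect run -p` but not when the stored script was loaded. A self-contained sketch:

```python
import tempfile
from pathlib import Path

# Two sibling modules, mirroring flows/common.py and flows/say_hello.py.
flow_dir = Path(tempfile.mkdtemp())
(flow_dir / "common.py").write_text("VALUE = 42\n")
(flow_dir / "say_hello.py").write_text("import common\n")

# exec-ing the file body (rather than running it as a script) leaves the
# script's directory off sys.path, so the sibling import blows up.
try:
    exec((flow_dir / "say_hello.py").read_text())
    sibling_import_failed = False
except ModuleNotFoundError:
    sibling_import_failed = True

print(sibling_import_failed)
```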

Kevin Kho

05/20/2022, 10:21 PM
Ahh I see. Thanks for sharing! That’s a lot of figuring stuff out