# prefect-community
n
I am looking to set up Docker storage with multiple flows. I found this example in the docs: https://docs.prefect.io/orchestration/recipes/multi_flow_storage.html but it assumes all flows are in the same file, and there is no way to specify the file path in the
storage.add_flow()
method. Is there a way around it?
k
I think you can create your image ahead of time and then just use Local storage to specify the path like this
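Something like this, roughly (the in-container path is a placeholder, not from your setup):
Copy code
from prefect.storage import Local

# assumes the flow file was baked into the image by the Dockerfile's COPY
flow.storage = Local(
    path='/application/flows/my_flow.py',  # hypothetical path inside the image
    stored_as_script=True,
)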
n
in that case, how is the image built? I have a Dockerfile which I believe is supposed to be used as the intermediate image for the flow container image.
Copy code
FROM python:3.9
WORKDIR /application
COPY requirements.txt .

ARG GITHUB_ACCESS_TOKEN
RUN git config --global url."https://${GITHUB_ACCESS_TOKEN}:@github.com/".insteadOf "https://github.com/"

RUN pip install --upgrade pip \
    && pip install -r requirements.txt

COPY . .
If I simply do:
docker build -t myimage:latest .
and push it to my ECR repo, will that work?
k
Exactly, yeah. Then you just point to the image in your run config (DockerRun/ECSRun) and point to the flow file in the Local storage.
Use
flow.register(build=False)
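Roughly like this (image URI and project name are placeholders):
Copy code
from prefect.run_configs import ECSRun
from prefect.storage import Local

flow.storage = Local(path='/application/flows/my_flow.py', stored_as_script=True)
flow.run_config = ECSRun(image='<account>.dkr.ecr.us-west-1.amazonaws.com/myimage:latest')

# build=False skips the storage build step, since the image was built and pushed manually
flow.register(project_name='my-project', build=False)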
n
@Kevin Kho I tried the recipe you described above, I am getting this error:
Copy code
Flow is not contained in this Storage
Traceback (most recent call last):
  File "/usr/local/bin/prefect", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/prefect/cli/execute.py", line 92, in flow_run
    raise exc
  File "/usr/local/lib/python3.9/site-packages/prefect/cli/execute.py", line 73, in flow_run
    flow = storage.get_flow(flow_data.name)
  File "/usr/local/lib/python3.9/site-packages/prefect/storage/local.py", line 88, in get_flow
    raise ValueError("Flow is not contained in this Storage")
ValueError: Flow is not contained in this Storage
Here’s my code: `say_hello.py`:
Copy code
import prefect
from prefect import task, Flow, Parameter
from prefect.storage import Local

@task
def say_hello_task(name):
    logger = prefect.context.get('logger')
    logger.info(f'Hello {name}')

with Flow('say_hello') as say_hello_flow:
    people = Parameter('people', default=['Aay', 'Bee'])
    say_hello_task.map(people)

say_hello_flow.storage = Local(
    path=f'/application/flows/{__file__}',
    stored_as_script=True,
    add_default_labels=False
)
`register_flows.py`:
Copy code
from prefect.run_configs import ECSRun

# local modules
import wordcount
import say_hello
Copy code
ecs_run = ECSRun(
    run_task_kwargs={'cluster': 'dev-prefect-cluster'},
    task_role_arn='arn:aws:iam::<>:role/dev-task-runner-ecs-task-role',
    execution_role_arn='arn:aws:iam::<>:role/dev-task-runner-ecs-task-execution-role',
    image='<>.dkr.ecr.us-west-1.amazonaws.com/dev-automation-scripts-ecr:latest',
    labels=['ecs', 'dev']
)

flows = [
    wordcount.count_words_flow,
    say_hello.say_hello_flow
]

for flow in flows:
    flow.run_config = ecs_run
    flow.register(project_name='Dev-Test', build=False)
`Dockerfile`:
Copy code
FROM python:3.9
WORKDIR /application
COPY requirements.txt .

ARG GITHUB_ACCESS_TOKEN
RUN git config --global url."https://${GITHUB_ACCESS_TOKEN}:@github.com/".insteadOf "https://github.com/"

RUN pip install --upgrade pip \
    && pip install -r requirements.txt

COPY . .
I think the issue is that the flows are never added to the Local storage in the container. But I can’t think of a way to do that.
k
Does COPY . . add the flows in there?
Could you try
build=True
? Or just calling
flow.register
without the build argument?
n
yes, it must be copied, because the module import error is coming from the file that contains the flow.
wait… sorry, I was talking about something else. I got around this in a different way: I reverted to using Docker storage and manually set the path on the DockerStorage object every time I call add_flow:
Copy code
# docker_storage is a prefect.storage.Docker object configured elsewhere
flows = [
    wordcount.count_words_flow,
    say_hello.say_hello_flow
]

# set storage.path per flow before add_flow, so each flow gets its own
# file path inside the image instead of the single default path
for flow in flows:
    flow_path = f'{common.FLOW_APPLICATION_PATH}/{flow.name}.py'
    docker_storage.path = flow_path
    docker_storage.add_flow(flow)

# build (and push) the image once, after all flows are added
docker_storage.build()

for flow in flows:
    flow.storage = docker_storage
    flow.run_config = ecs_run
    # flow.executor = LocalExecutor()
    flow.register(project_name='Dev-Test', build=False)
k
Ah ok are you good now?
n
yes, thanks!
@Kevin Kho I tried using Local storage with build=True and it worked. However, I am running into a different issue now: I am not able to import local modules when the flow runs on ECS. This happens with both Local storage and Docker storage. I ran the Docker image locally and was able to import the modules and run the flow successfully.
k
That sounds like your custom code is not packaged into a pip installable module. If you can pip install it, you would make it available from wherever Python is run inside the container. Alternatively, you can add it in the PYTHONPATH. Have you created a Python package before?
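For the PYTHONPATH route, something like this in your Dockerfile might do it (the path is a guess based on your WORKDIR):
Copy code
FROM python:3.9
WORKDIR /application
COPY . .
# make local modules importable from anywhere in the container
ENV PYTHONPATH="/application:${PYTHONPATH}"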
n
@Kevin Kho still getting the module import error. Here’s my `setup.py`:
Copy code
from setuptools import setup, find_packages
from os import path

import __about__

loc = path.abspath(path.dirname(__file__))

with open(loc + '/requirements.txt') as f:
    requirements = f.read().splitlines()

required = []
dependency_links = []

# Do not add to required lines pointing to Git repositories
EGG_MARK = '#egg='
for line in requirements:
    if line.startswith('-e git:') or line.startswith('-e git+') or \
            line.startswith('git:') or line.startswith('git+'):
        if line.startswith('-e '):  # lstrip('-e ') strips characters, not the prefix
            line = line[len('-e '):]
        if EGG_MARK in line:
            package_name = line[line.find(EGG_MARK) + len(EGG_MARK):]
            repository = line[:line.find(EGG_MARK)]
            required.append('%s @ %s' % (package_name, repository))
            dependency_links.append(line)
        else:
            print('Dependency to a git repository should have the format:')
            print('git+ssh://git@github.com/xxxxx/xxxxxx#egg=package_name')
    else:
        required.append(line)

setup(
    name='automation-scripts',  # Required
    version=__about__.__version__,
    description='Description here....',  # Required
    packages=find_packages(),  # Required
    install_requires=required,
    dependency_links=dependency_links,
)
and here’s the dir structure:
Copy code
automation-scripts
|-- __init__.py
|-- flows/
    |-- __init__.py
    |-- say_hello.py
    |-- common.py
In `say_hello.py`, I have the line `import common`, which is giving the module import error.
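(For context: `import common` only resolves if the folder containing `common.py` is on `sys.path`. A quick hypothetical check inside the container:)
Copy code
import sys

# hypothetical: put the flow folder on sys.path so 'import common' resolves
sys.path.insert(0, '/opt/prefect/flows')
import common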
k
Do you pip install this inside the Docker container?
n
Yes:
Copy code
FROM python:3.9
WORKDIR /opt/prefect
COPY . .

ARG GITHUB_ACCESS_TOKEN
RUN git config --global url."https://${GITHUB_ACCESS_TOKEN}:@github.com/".insteadOf "https://github.com/" \
    && pip install .
I’m not able to reproduce the error in a local Docker run. I was able to run the flow from any directory in the container, even before I created the Python package using `setup.py`.
k
Hard to say. Do you potentially have two Python versions in the image? What does
which python
give you?
Also, if you build the wheel locally and open it, do you see everything you expect? Just to confirm the modules you expect actually go in there.
n
which python
gives:
/usr/local/bin/python
which points to python3.9. Checking wheel info…
sorry, dumb question: where would I find the wheel info? I don’t see it in the project folder or in site-packages in the virtualenv.
k
You can do
python setup.py bdist_wheel
I think that builds a wheel file
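Something like this should do it (the wheel filename is a guess from your package name):
Copy code
python setup.py bdist_wheel
# the intermediate tree goes to build/, the wheel itself to dist/;
# a wheel is just a zip, so unzip -l lists what got packaged
unzip -l dist/automation_scripts-*.whl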
n
Thanks!! I did that and it generated a `build` folder with the following contents:
Copy code
build/
|-- bdist.macosx-10.9-x86_64/
|-- lib/
    |-- flows/
         |-- __init__.py
         |-- common.py
         |-- say_hello.py
So as far as I can tell, the wheel looks okay.
k
Yeah that does look good. I think you can do
pip show package_name
inside the container to confirm it was installed correctly
n
For those who run into the same issue in the future: I realized that my project structure was not the standard one you’d use with `setuptools`. Usually, when you build a Python package for a project, all the source code is nested inside a `src` or `myproject` folder, which wasn’t the case for me since I added `setup.py` as an afterthought. So setuptools ended up creating packages for the individual directories in my project (e.g. `flows` became a top-level package instead of `mypackage.flows`), and I had to import like so: `from flows import common` instead of `from myproject.flows import common`, or the usual `import common` that you can do when you don’t have packages at all.
k
Ohhh man how did you find that out?
n
I noticed it when I was reading the setuptools documentation: https://setuptools.pypa.io/en/latest/userguide/quickstart.html When you want to build a Python project into a package, this is how your dir structure should look:
Copy code
~/mypackage/
    pyproject.toml
    setup.cfg       # or setup.py
    mypackage/      # or src/
         __init__.py
         some_code.py
         somefolder/*
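With that layout, `find_packages()` picks up `mypackage` as the top-level package, so imports would look something like this (module names are from the quickstart example, not my repo):
Copy code
# anywhere in the project, after `pip install .`
from mypackage import some_code
from mypackage.somefolder import some_module  # hypothetical submodule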
k
Ahh
Wait, but how did the imports work when running Docker directly?
n
inside Docker, I was just doing `import common`, since `common.py` was in the same folder as `say_hello.py`. And I guess it worked fine because the flow’s folder ended up on the Python path when I ran:
prefect run -p /opt/prefect/flows/say_hello.py
I also figured out how to reproduce, in a local Docker run, what happens when running on ECS, using this code:
Copy code
# mimic how script-based storage loads a flow: read the file and exec its contents.
# unlike `python say_hello.py`, exec() does not put the file's folder on sys.path,
# so `import common` fails here just like it does on ECS
fp = '/opt/prefect/flows/say_hello.py'
with open(fp) as f:
    contents = f.read()

exec(contents)  # this throws the module-not-found error
This way I can test my imports in a local Docker run without having to push images and run everything on ECS.
k
Ahh I see. Thanks for sharing! That’s a lot of figuring stuff out