Thread
#prefect-community
    Nikhil Jain
    4 months ago
    I am looking to set up Docker storage with multiple flows. I found this example in the docs: https://docs.prefect.io/orchestration/recipes/multi_flow_storage.html but this example assumes all flows are in the same file, and there is no way to specify the file_path in the
    storage.add_flow()
    method. Is there a way around it?
    Kevin Kho
    4 months ago
    I think you can create your image ahead of time and then just use Local storage to specify the path to the flow file inside the image.
    Nikhil Jain
    4 months ago
    in that case, how is the image built? I have a Dockerfile which I believe is supposed to be used as the intermediate image for the flow container image.
    FROM python:3.9
    WORKDIR /application
    COPY requirements.txt .
    
    ARG GITHUB_ACCESS_TOKEN
    RUN git config --global url."https://${GITHUB_ACCESS_TOKEN}:@github.com/".insteadOf "https://github.com/"
    
    RUN pip install --upgrade pip \
        && pip install -r requirements.txt
    
    COPY . .
    If I simply do:
    docker build -t myimage:latest .
    and push it to my ECR repo, will that work?
    Kevin Kho
    4 months ago
    Exactly, yeah. Then you just point to the image in DockerRun and to the flow file in the Local storage.
    Use
    flow.register(build=False)
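A minimal sketch of that wiring, assuming Prefect 1.x; the image name, project name, and in-image path below are placeholders, not from this thread:

```python
# Hedged sketch: prebuilt image + Local storage pointing at the flow
# file's path inside that image (all names here are placeholders).
from prefect import Flow
from prefect.storage import Local
from prefect.run_configs import DockerRun

with Flow('my_flow') as flow:
    ...  # tasks go here

# The flow file already lives inside the prebuilt image at this path:
flow.storage = Local(
    path='/application/flows/my_flow.py',
    stored_as_script=True,
)
flow.run_config = DockerRun(image='my-registry/my-image:latest')

# The image was built and pushed separately, so skip the storage build:
flow.register(project_name='my-project', build=False)
```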
    Nikhil Jain
    4 months ago
    @Kevin Kho I tried the recipe you described above, I am getting this error:
    Flow is not contained in this Storage
    Traceback (most recent call last):
      File "/usr/local/bin/prefect", line 8, in <module>
        sys.exit(cli())
      File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
        return self.main(*args, **kwargs)
      File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1055, in main
        rv = self.invoke(ctx)
      File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/usr/local/lib/python3.9/site-packages/click/core.py", line 760, in invoke
        return __callback(*args, **kwargs)
      File "/usr/local/lib/python3.9/site-packages/prefect/cli/execute.py", line 92, in flow_run
        raise exc
      File "/usr/local/lib/python3.9/site-packages/prefect/cli/execute.py", line 73, in flow_run
        flow = storage.get_flow(flow_data.name)
      File "/usr/local/lib/python3.9/site-packages/prefect/storage/local.py", line 88, in get_flow
        raise ValueError("Flow is not contained in this Storage")
    ValueError: Flow is not contained in this Storage
    Here’s my code: say_hello.py:
    import prefect
    from prefect import task, Flow, Parameter
    from prefect.storage import Local

    @task
    def say_hello_task(name):
        logger = prefect.context.get('logger')
        logger.info(f'Hello {name}')

    with Flow('say_hello') as say_hello_flow:
        people = Parameter('people', default=['Aay', 'Bee'])
        say_hello_task.map(people)

    say_hello_flow.storage = Local(
        path=f'/application/flows/{__file__}',
        stored_as_script=True,
        add_default_labels=False
    )
    register_flows.py
    from prefect.run_configs import ECSRun
    
    # local modules
    import wordcount
    import say_hello
    ecs_run = ECSRun(
        run_task_kwargs={'cluster': 'dev-prefect-cluster'},
        task_role_arn='arn:aws:iam::<>:role/dev-task-runner-ecs-task-role',
        execution_role_arn='arn:aws:iam::<>:role/dev-task-runner-ecs-task-execution-role',
        image='<>.dkr.ecr.us-west-1.amazonaws.com/dev-automation-scripts-ecr:latest',
        labels=['ecs', 'dev']
    )
    
    flows = [
        wordcount.count_words_flow,
        say_hello.say_hello_flow
    ]
    
    for flow in flows:
        flow.run_config = ecs_run
        flow.register(project_name='Dev-Test', build=False)
    Dockerfile:
    FROM python:3.9
    WORKDIR /application
    COPY requirements.txt .
    
    ARG GITHUB_ACCESS_TOKEN
    RUN git config --global url."https://${GITHUB_ACCESS_TOKEN}:@github.com/".insteadOf "https://github.com/"
    
    RUN pip install --upgrade pip \
        && pip install -r requirements.txt
    
    COPY . .
    I think the issue is that the flows are never added to the Local storage in the container. But I can’t think of a way to do that.
    Kevin Kho
    4 months ago
    Does COPY . . add the flows in there?
    Could you try
    build=True
    ? Or just calling
    flow.register
    without the build argument?
    Nikhil Jain
    4 months ago
    yes, it must be copied, because the module import error is coming from the file that contains the flow.
    wait.. sorry.. I was talking about something else. I got around this in a different way: I reverted to using DockerStorage and manually set the path on the DockerStorage object every time I call add_flow:
    flows = [
        wordcount.count_words_flow,
        say_hello.say_hello_flow
    ]
    
    for flow in flows:
        flow_path = f'{common.FLOW_APPLICATION_PATH}/{flow.name}.py'
        docker_storage.path = flow_path
        docker_storage.add_flow(flow)
    
    docker_storage.build()
    
    for flow in flows:
        flow.storage = docker_storage
        flow.run_config = ecs_run
        # flow.executor = LocalExecutor()
        flow.register(project_name='Dev-Test', build=False)
    Kevin Kho
    4 months ago
    Ah ok are you good now?
    Nikhil Jain
    4 months ago
    yes, thanks!
    @Kevin Kho I tried using Local storage with build=True and it worked. However, I am running into a different issue now. I am not able to import local modules when the flow runs in ECS. This is happening in both Local storage and Docker storage. I ran the docker image locally and was able to import the modules and run the flow successfully.
    Kevin Kho
    4 months ago
    That sounds like your custom code is not packaged into a pip-installable module. If you can pip install it, it will be available wherever Python is run inside the container. Alternatively, you can add it to the PYTHONPATH. Have you created a Python package before?
    Nikhil Jain
    4 months ago
    @Kevin Kho still getting the Module import error. Here’s my setup.py:
    from setuptools import setup, find_packages
    from os import path
    
    import __about__
    
    loc = path.abspath(path.dirname(__file__))
    
    with open(loc + '/requirements.txt') as f:
        requirements = f.read().splitlines()
    
    required = []
    dependency_links = []
    
    # Do not add to required lines pointing to Git repositories
    EGG_MARK = '#egg='
    for line in requirements:
        if line.startswith('-e git:') or line.startswith('-e git+') or \
                line.startswith('git:') or line.startswith('git+'):
            if line.startswith('-e '):
                line = line[len('-e '):]  # strip the "-e " prefix if present
            if EGG_MARK in line:
                package_name = line[line.find(EGG_MARK) + len(EGG_MARK):]
                repository = line[:line.find(EGG_MARK)]
                required.append('%s @ %s' % (package_name, repository))
                dependency_links.append(line)
            else:
                print('Dependency to a git repository should have the format:')
                print('git+ssh://git@github.com/xxxxx/xxxxxx#egg=package_name')
        else:
            required.append(line)
    
    setup(
        name='automation-scripts',  # Required
        version=__about__.__version__,
        description='Description here....',  # Required
        packages=find_packages(),  # Required
        install_requires=required,
        dependency_links=dependency_links,
    )
    and here’s the dir structure:
    automation-scripts
    |-- __init__.py
    |-- flows/
        |-- __init__.py
        |-- say_hello.py
        |-- common.py
    In
    say_hello.py
    , I have the line
    import common
    which is giving the module import error.
    Kevin Kho
    4 months ago
    Do you pip install this inside the Docker container?
    Nikhil Jain
    4 months ago
    Yes:
    FROM python:3.9
    WORKDIR /opt/prefect
    COPY . .
    
    ARG GITHUB_ACCESS_TOKEN
    RUN git config --global url."https://${GITHUB_ACCESS_TOKEN}:@github.com/".insteadOf "https://github.com/" \
        && pip install .
    Not able to reproduce the error in local Docker run. I was able to run the flow from any directory in the container, even before I created the python package using
    setup.py
    Kevin Kho
    4 months ago
    Hard to say. Do you potentially have two Python versions in the image? What does
    which python
    give you?
    Also, if you build the wheel locally and open it, do you see everything you expect? Just to confirm the modules you expect go in there.
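For reference, a wheel is just a zip archive, so a short stdlib script can list what went into it (the dist/ pattern is an assumption; it is where bdist_wheel writes by default):

```python
# List the files inside each built wheel without unpacking by hand.
import glob
import zipfile

def wheel_contents(pattern='dist/*.whl'):
    """Map each matching wheel file to the list of files inside it."""
    contents = {}
    for whl in glob.glob(pattern):
        with zipfile.ZipFile(whl) as zf:
            contents[whl] = zf.namelist()
    return contents

for whl, names in wheel_contents().items():
    print(whl)
    for name in names:
        print(' ', name)
```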
    Nikhil Jain
    4 months ago
    which python
    gives:
    /usr/local/bin/python
    which points to python3.9. Checking wheel info…
    sorry dumb question, where would I find wheel info? I don’t see it in the project folder or in site-packages in the virtualenv
    Kevin Kho
    4 months ago
    You can do
    python setup.py bdist_wheel
    which I think builds a wheel file
    Nikhil Jain
    4 months ago
    Thanks!! I did that and it generated a build folder with the following contents:
    build/
    |-- bdist.macosx-10.9-x86_64/
    |-- lib/
        |-- flows/
             |-- __init__.py
             |-- common.py
             |-- say_hello.py
    So as far as I can tell, wheel looks okay.
    Kevin Kho
    4 months ago
    Yeah that does look good. I think you can do
    pip show package_name
    inside the container to confirm it was installed correctly
    Nikhil Jain
    4 months ago
    For those who run into the same issue in future: I realized that my project structure was not the standard one that you’d use with
    setuptools
    . Usually when you want to build a python package for a project, all the source code is nested inside a
    src
    or
    myproject
    folder which wasn’t the case for me since I added
    setup.py
    as an afterthought. It ended up creating packages for individual directories in my project (e.g.
    flows
    became a package instead of
    mypackage.flows
    ). So I had to import like so:
    from flows import common
    instead of
    from myproject.flows import common
    or the usual
    import common
    that you can do when you don’t have packages at all.
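A small sketch of the behavior described above, calling setuptools.find_packages directly on both layouts (the myproject name is a placeholder):

```python
# How find_packages() sees a flat layout vs. the standard nested one.
import os
import tempfile

from setuptools import find_packages

def touch(root, *paths):
    """Create empty files (and their parent dirs) under root."""
    for p in paths:
        full = os.path.join(root, p)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, 'w').close()

# Flat layout: setup.py sits next to the code folders.
flat = tempfile.mkdtemp()
touch(flat, 'flows/__init__.py', 'flows/common.py')
print(sorted(find_packages(where=flat)))  # ['flows']

# Standard layout: code nested under a top-level package folder.
nested = tempfile.mkdtemp()
touch(nested, 'myproject/__init__.py', 'myproject/flows/__init__.py')
print(sorted(find_packages(where=nested)))  # ['myproject', 'myproject.flows']
```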
    Kevin Kho
    4 months ago
    Ohhh man how did you find that out?
    Nikhil Jain
    4 months ago
    I noticed it when I was reading the setuptools documentation: https://setuptools.pypa.io/en/latest/userguide/quickstart.html
    When you want to build a python project into a package, this is how your dir structure should look:
    ~/mypackage/
        pyproject.toml
        setup.cfg       # or setup.py
        mypackage/      # or src/
             __init__.py
             some_code.py
             somefolder/*
    Kevin Kho
    4 months ago
    Ahh
    Wait, but how did the imports work when running Docker directly?
    Nikhil Jain
    4 months ago
    inside Docker, I was just doing
    import common
    since
    common.py
    was in the same folder as
    say_hello.py
    . And I guess it worked fine because pythonpath was working correctly when I ran:
    prefect run -p /opt/prefect/flows/say_hello.py
    I also figured out how to reproduce, in a local Docker run, what happens when running on ECS, using this code:
    fp = '/opt/prefect/flows/say_hello.py'
    with open(fp) as f:
        contents = f.read()

    exec(contents)  # this throws the module-not-found error
    This way I can test my imports in local Docker run without having to push images and having to run everything on ECS.
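The mechanism can be shown with plain Python: running a file as a script puts the file's directory on sys.path, while exec() of its contents does not (the temp files below stand in for say_hello.py and common.py):

```python
# Why `import common` works when the file runs as a script but fails
# when its contents are exec()'d: the interpreter prepends a script's
# own directory to sys.path; exec() does no such thing.
import os
import subprocess
import sys
import tempfile

d = tempfile.mkdtemp()
with open(os.path.join(d, 'common.py'), 'w') as f:
    f.write("VALUE = 42\n")
flow_path = os.path.join(d, 'say_hello.py')
with open(flow_path, 'w') as f:
    f.write("import common\nprint(common.VALUE)\n")

# Run as a script: d lands on sys.path, so the import succeeds.
out = subprocess.run([sys.executable, flow_path],
                     capture_output=True, text=True)
print(out.stdout.strip())  # 42

# exec() the same contents from elsewhere: d is not on sys.path.
try:
    exec(open(flow_path).read())
except ModuleNotFoundError as e:
    print(e)  # No module named 'common'
```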
    Kevin Kho
    4 months ago
    Ahh I see. Thanks for sharing! That’s a lot of figuring stuff out