Thread
#prefect-community
    Constantino Schillebeeckx

    1 year ago
    hi gang! What's the best-practice way of registering flows that aren't contained in just one `.py` file? We've split functionality shared between flows out into, e.g., a `utils.py` file that the flow imports. Given all the storage documentation, this design doesn't seem to fit the intended use of storage.
    Kevin Kho

    1 year ago
    Hey @Constantino Schillebeeckx, Docker storage is the recommended approach: you package all of your dependencies into the image so that they are available at runtime. This might also work with Module storage: if you package everything up as a Python module and install it on the agent, you could run those flows. I have an example of Module storage here. The downside is that the module on the agent can easily get out of sync with development (unless the agent is Local). If `utils.py` does not change often, you can install it as a Python module inside your Docker image and store your flow somewhere else, like S3. S3 + DockerRun will pull the flow from S3 and run it on top of the specified image. This way, you won’t have to keep rebuilding the image.
    Constantino Schillebeeckx

    1 year ago
    So if I understand correctly, I might do the following: create a custom Docker container with all my extra requirements as well as my custom code (e.g. `utils.py`), reference this container in my ECSRun, and then use S3 for just the flow's `workflow.py`, which is able to e.g. `import utils`?
    Michael Adkins

    1 year ago
    Constantino Schillebeeckx

    1 year ago
    yessss my next question was around CD 🙂 thanks for the resource, I'll post back here if I've got any other questions
    Kevin Kho

    1 year ago
    Yes that’s what I meant!
    Constantino Schillebeeckx

    1 year ago
    side question: as I'm developing on ECR and ECS, pushing changes and testing things out, it feels like some "things" are getting cached. E.g. I'm getting a failure like `ModuleNotFoundError("No module named 'pipelines.custom_docker'")` even though I've since removed all references to that line of code, pushed new containers and re-registered the flow
    Kevin Kho

    1 year ago
    Is your image tagged to latest and did you register with the latest image? What storage are you using? DockerStorage?
    Constantino Schillebeeckx

    1 year ago
    yes and GitHub
    ooo I have to push the change 🙂
    Kevin Kho

    1 year ago
    As in ECSRun and Github storage?
    Yeahhh. Probably that 👍
    Constantino Schillebeeckx

    1 year ago
    As in ECSRun and Github storage?
    correct
    @Kevin Kho A follow up on this. I've taken your advice and I've installed my module in the docker container, and then I'm referencing that container in my ECSRun run_config; I'm using Github as my storage. When I go to run the flow I'm seeing:
    Failed to load and execute Flow's environment: ModuleNotFoundError("No module named 'flows.custom_docker'")
    When I pull that docker image down, it seems like that module does exist:
    Kevin Kho

    1 year ago
    How did you install the module? `pip` or `conda`?
    Constantino Schillebeeckx

    1 year ago
    pip
    Kevin Kho

    1 year ago
    Do you have an environment that might not be used? Or you just installed everything with pip?
    Constantino Schillebeeckx

    1 year ago
    everything is installed with pip; no virtual environment is used in the Docker container
    Kevin Kho

    1 year ago
    Could you show me the Dockerfile?
    Constantino Schillebeeckx

    1 year ago
    Kevin Kho

    1 year ago
    Can you show me the ECS RunConfig?
    Constantino Schillebeeckx

    1 year ago
    Is there any caching going on with regards to downloading that ECR image?
    Kevin Kho

    1 year ago
    Everything looks alright. I think what might be happening is that the module and the flow's Python end up in different locations inside the container, like this. Maybe you can try `pip install .` instead of `python setup.py install` to make sure? Is `./flows` everything in your package? I don’t believe there should be caching if you are running as a task with Fargate/ECS. If you ran on EC2 there might be, but you could use an explicit image tag to be sure. How did you test the container to see if you could import? Did you download your image and then use `exec`?
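
One way to check the "installed in a different location" theory from inside the container is to ask Python where it would load the module from. This is a minimal sketch, assuming it is run with the same interpreter the flow uses; `locate_module` is a hypothetical helper name for illustration, not part of Prefect.

```python
import importlib.util
from typing import Optional


def locate_module(name: str) -> Optional[str]:
    """Return the file a module would be loaded from, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "flows" in "flows.custom_docker") is missing.
        return None
    if spec is None:
        return None
    # origin is usually a file path; it can be "built-in" or None for
    # built-in modules and namespace packages.
    return spec.origin


if __name__ == "__main__":
    # "flows.custom_docker" is the module name from the error message in this thread.
    for name in ("json", "flows.custom_docker"):
        print(name, "->", locate_module(name))
```

If the second line prints `None` inside the image, the package was not installed where this interpreter looks, regardless of whether the source files are present somewhere in the container.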
    Constantino Schillebeeckx

    1 year ago
    Let me try `pip install .`. Yep, `./flows` is all my code; that's what setup.py references
    My screenshot above shows how I tested things, I did a `docker run -it --rm 864 /bin/bash`
    Kevin Kho

    1 year ago
    You might also be able to test this locally with a Docker Run and Docker agent if that makes testing easier
    Constantino Schillebeeckx

    1 year ago
    I've been able to reproduce it locally - looks like you're right about my install being wonky.
    I'll report back on a fix
    so it looks like my method of installing my module indeed wasn't working; I still can't get it to work with `pip install .` - I'm guessing there's something wrong with my `setup.py`
    For the time being I've worked around it with the following in my Dockerfile
    Kevin Kho

    1 year ago
    Can I see your setup.py? (I’m just asking for everything at this point lol) I have a suspicion. I ran into a similar thing two days ago
    Constantino Schillebeeckx

    1 year ago
    Kevin Kho

    1 year ago
    It looks right to me. Maybe try `packages=find_packages()` instead of `['flows']`, but I think this would require `__init__.py` files in those subdirectories
    Actually you can test this locally by running `python setup.py bdist_wheel`, then extracting the wheel and examining the contents to see if everything is in there.
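
A wheel is a zip archive, so the inspection step can also be scripted rather than extracted by hand. The sketch below only uses the standard library; `list_wheel_python_files` is a hypothetical helper name, and the wheel filename in the usage comment is an assumption, since the real one depends on the project's name and version.

```python
import zipfile
from typing import List


def list_wheel_python_files(wheel_path: str) -> List[str]:
    """Return the sorted .py files packed into a wheel (a wheel is a zip archive)."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(name for name in wheel.namelist() if name.endswith(".py"))


# Usage (hypothetical filename produced by `python setup.py bdist_wheel`):
#   for name in list_wheel_python_files("dist/flows-0.1.0-py3-none-any.whl"):
#       print(name)
```

If files from the subdirectories of `./flows` are missing from that listing, the problem is in how the package is declared in `setup.py`, not in the Docker image or in Prefect.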
    Constantino Schillebeeckx

    1 year ago
    FYI, there's an `__init__.py` only in the highest-level dir
    🤷
    interesting, when I do build inside the container, it doesn't add the subdirs
    Kevin Kho

    1 year ago
    I see, maybe an `__init__.py` in the subdirs will help `find_packages()` pick them up? Are your Python versions the same? At least we know the culprit
    Constantino Schillebeeckx

    1 year ago
    you rock! it was a combination of using `find_packages()` and then having an `__init__.py` in every subdir
    Kevin Kho

    1 year ago
    Nice! Yeah I had to do it but don’t know why 🤷
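
A likely explanation for why both changes were needed: `packages=['flows']` tells setuptools to ship exactly that one package and nothing below it, while `find_packages()` only reports directories that contain an `__init__.py` and does not look inside directories that lack one. The sketch below is a simplified stand-in for that discovery rule written with the standard library, not the setuptools implementation itself; the `flows/custom_docker/tasks.py` layout is an assumption made up for illustration.

```python
import os
import tempfile
from typing import List


def discover_packages(root: str) -> List[str]:
    """Simplified stand-in for setuptools.find_packages(): a directory counts
    as a package only if it holds an __init__.py, and the search does not
    descend into directories that are not packages."""
    found = []
    for current, dirs, _files in os.walk(root):
        keep = []
        for name in sorted(dirs):
            path = os.path.join(current, name)
            if os.path.isfile(os.path.join(path, "__init__.py")):
                rel = os.path.relpath(path, root)
                found.append(rel.replace(os.sep, "."))
                keep.append(name)
        dirs[:] = keep  # prune non-packages so os.walk skips them
    return found


def touch(path: str) -> None:
    """Create an empty file, making parent directories as needed."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()


with tempfile.TemporaryDirectory() as root:
    touch(os.path.join(root, "flows", "__init__.py"))
    touch(os.path.join(root, "flows", "custom_docker", "tasks.py"))  # no __init__.py yet
    before = discover_packages(root)
    touch(os.path.join(root, "flows", "custom_docker", "__init__.py"))
    after = discover_packages(root)

print(before)  # ['flows']
print(after)   # ['flows', 'flows.custom_docker']
```

With only the top-level `__init__.py`, the subdirectory never makes it into the built distribution, which matches the `ModuleNotFoundError` seen at flow run time.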