# ask-community
c
hi gang! what's the best practice way of registering flows that aren't contained to just one `.py` file? we've split out commonly shared functionality between flows into an e.g. `utils.py` file which is referenced in the flow. given all the storage documentation, this design doesn't seem to fit into the intended use of storage.
k
Hey @Constantino Schillebeeckx, Docker storage is the recommended approach: you couple all of your dependencies into the image so that they become available at runtime. This might also work with Module storage, where you package everything up as a Python module and install it on the agent, and then you could run those flows. I have an example of module storage here. The downside is that it's easy for the module on the agent to fall out of sync with development (unless the agent is Local). If `utils.py` does not change often, you can have it as a Python module inside your Docker container and then store your flow somewhere else like S3. S3 + DockerRun will pull the flow from S3 and run it on top of the specified container. This way, you won't have to keep rebuilding containers.
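For reference, a minimal sketch of the S3 + DockerRun combination described above, assuming Prefect 1.x; the bucket and image names are placeholders:
```
# Sketch, assuming Prefect 1.x: the flow source lives in S3 and runs on top of
# a prebuilt image that already has utils.py (and other dependencies) installed.
from prefect import Flow, task
from prefect.storage import S3
from prefect.run_configs import DockerRun

@task
def say_hello():
    print("hello from a shared-utils flow")

with Flow("example-flow") as flow:
    say_hello()

# the flow source is pushed to S3 at registration time...
flow.storage = S3(bucket="my-prefect-flows")  # placeholder bucket
# ...and executed inside an image that bundles the shared code
flow.run_config = DockerRun(image="my-registry/my-flow-image:latest")  # placeholder image
```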
c
So if I understand correctly, I might do the following: create a custom docker container with all my extra requirements as well as my custom code (e.g. `utils.py`), reference this container in my ECSRun, and then use S3 for just the flow's `workflow.py`, which is able to e.g. `import utils`?
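A sketch of what that S3-stored `workflow.py` could look like, assuming `utils` is pip-installed inside the custom image; the `clean_data` helper and the image URI are placeholders:
```
# Sketch, assuming Prefect 1.x: workflow.py is the only file pulled from storage,
# so anything it imports (like utils) must already exist in the image.
from prefect import Flow, task
from prefect.run_configs import ECSRun

import utils  # resolved from the image, not from storage

@task
def transform(raw):
    return utils.clean_data(raw)  # hypothetical shared helper from utils.py

with Flow("workflow") as flow:
    transform(["some", "raw", "records"])

# the run config points at the custom image pushed to ECR (placeholder URI)
flow.run_config = ECSRun(image="<account>.dkr.ecr.us-east-1.amazonaws.com/my-flow-image:latest")
```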
z
c
yessss my next question was around CD 🙂 thanks for the resource, I'll post back here if I've got any other questions
k
Yes that’s what I meant!
c
side question: as I'm developing on ECR and ECS, pushing changes and testing things out, it feels like some "things" are getting cached. e.g. I'm getting a failure like
```
ModuleNotFoundError("No module named 'pipelines.custom_docker'")
```
when I've since removed all references to that line of code, pushed new containers, and re-registered the flow
k
Is your image tagged to latest and did you register with the latest image? What storage are you using? DockerStorage?
c
yes and GitHub
k
As in ECSRun and Github storage?
c
ooo I have to push the change 🙂
k
Yeahhh. Probably that 👍
c
"As in ECSRun and Github storage?" correct
@Kevin Kho A follow up on this. I've taken your advice and installed my module in the docker container, and I'm referencing that container in my ECSRun run_config; I'm using Github as my storage. When I go to run the flow I'm seeing:
```
Failed to load and execute Flow's environment: ModuleNotFoundError("No module named 'flows.custom_docker'")
```
When I pull that docker image down, it seems like that module does exist:
k
How did you install the module? `pip` or `conda`?
c
pip
k
Do you have an environment that might not be used? Or did you just install everything with pip?
c
everything is installed with pip; no virtual environment is used in the Docker container
k
Could you show me the Dockerfile?
c
k
Can you show me the ECS RunConfig?
c
Is there any caching going on with regards to downloading that ECR image?
k
Everything looks alright. I think what might be happening is that they're installed in different locations inside the container, like this. Maybe you can try `pip install .` instead of `python setup.py install` to make sure? Is `./flows` everything in your package? I don't believe there should be caching if you are running as a task with Fargate/ECS. If you ran on EC2 then there might be, but you could tag explicitly to be sure. How did you test the container to see if you could import? Did you download your image and then use `exec`?
c
Let me try `pip install .`. Yep, `./flows` is all my code; that's what setup.py references.
My screenshot above shows how I tested things, I did a `docker run -it --rm 864 /bin/bash`
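As a side note, a quick hypothetical check that can be run inside that container to confirm whether the module is importable and where pip actually put it:
```
# Run inside the container (e.g. after the docker run above).
# flows.custom_docker is the module from the error message; if the import
# succeeds, __file__ shows where it was installed (site-packages vs a stray copy).
import flows.custom_docker
print(flows.custom_docker.__file__)
```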
k
You might also be able to test this locally with a DockerRun run config and a Docker agent if that makes testing easier
c
I've been able to reproduce it locally - looks like you're right about my install being wonky. I'll report back on a fix.
So it looks like indeed my method of installing my module wasn't working; I still can't get it to work with `pip install .` - I'm guessing there's something wrong with my `setup.py`. For the time being I've worked around it with the following in my Dockerfile
k
Can I see your setup.py? (I’m just asking for everything at this point lol) I have a suspicion. I ran into a similar thing two days ago
c
k
It looks right to me. Maybe try `packages=find_packages()` instead of `['flows']`, but I think this would require `__init__.py` files in those subdirectories.
Actually, you can test this locally by doing `python setup.py bdist_wheel`, then extract the wheel and examine the contents to see if everything is in there.
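A sketch of that suggested `setup.py`, with placeholder name and version:
```
# find_packages() walks the source tree, but only directories that contain an
# __init__.py are picked up as packages.
from setuptools import setup, find_packages

setup(
    name="flows",        # placeholder
    version="0.1.0",     # placeholder
    packages=find_packages(),  # instead of packages=["flows"]
)
```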
c
FYI, `__init__.py` is only in the highest level dir 🤷
interesting, when I do the build inside the container, it doesn't add the subdirs
k
I see, maybe the `__init__.py` in the sub dirs will help `find_packages()` get them? Are your python versions the same? At least we know the culprit
c
you rock! it was a combination of using `find_packages()` and then having an `__init__.py` in every subdir
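For anyone hitting the same thing, a small sketch of why the `__init__.py` files mattered (the package names shown are illustrative):
```
# Run from the repo root: once every subdirectory has an __init__.py,
# find_packages() reports the subpackages too, so they end up in the built
# wheel / installed package instead of being silently skipped.
from setuptools import find_packages

print(find_packages())
# before the fix: ['flows']
# after adding __init__.py to each subdir: ['flows', 'flows.custom_docker', ...]
```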
k
Nice! Yeah I had to do it but don’t know why 🤷