# ask-community
c
hi gang! what's the best practice way of registering flows that aren't contained to just one `.py` file? we've split out commonly shared functionality between flows into an e.g. `utils.py` file which is referenced in the flow. given all the storage documentation, this design doesn't seem to fit into the intended use of storage.
k
Hey @Constantino Schillebeeckx, Docker storage is the recommended approach: you couple all of your dependencies into the image so that they become available at runtime. This might also work with Module storage, where you package everything up as a Python module and install it on the agent, and then you could run those flows. I have an example of module storage here. The downside is that it's easy for the module on the agent to fall out of sync with development (unless the agent is Local). If `utils.py` does not change often, you can have it as a Python module inside your Docker container and then store your flow somewhere else like S3. S3 + DockerRun will pull the flow from S3 and run it on top of the specified container. This way, you won't have to keep rebuilding containers.
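For reference, a minimal sketch of the S3 + DockerRun combination described above, assuming Prefect 1.x; the bucket and image names are placeholders:
```
# Sketch, assuming Prefect 1.x: the flow source lives in S3 and runs on top of
# a prebuilt image that already has utils.py (and other dependencies) installed.
from prefect import Flow, task
from prefect.storage import S3
from prefect.run_configs import DockerRun

@task
def say_hello():
    print("hello from a shared-utils flow")

with Flow("example-flow") as flow:
    say_hello()

# the flow source is pushed to S3 at registration time...
flow.storage = S3(bucket="my-prefect-flows")  # placeholder bucket
# ...and executed inside an image that bundles the shared code
flow.run_config = DockerRun(image="my-registry/my-flow-image:latest")  # placeholder image
```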
c
So if I understand correctly, I might do the following: create a custom docker container with all my extra requirements as well as my custom code (e.g. `utils.py`), reference this container in my ECSRun, and then use S3 for just the flow's `workflow.py`, which is able to e.g. `import utils`?
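A sketch of what that S3-stored `workflow.py` could look like, assuming `utils` is pip-installed inside the custom image; the `clean_data` helper and the image URI are placeholders:
```
# Sketch, assuming Prefect 1.x: workflow.py is the only file pulled from storage,
# so anything it imports (like utils) must already exist in the image.
from prefect import Flow, task
from prefect.run_configs import ECSRun

import utils  # resolved from the image, not from storage

@task
def transform(raw):
    return utils.clean_data(raw)  # hypothetical shared helper from utils.py

with Flow("workflow") as flow:
    transform(["some", "raw", "records"])

# the run config points at the custom image pushed to ECR (placeholder URI)
flow.run_config = ECSRun(image="<account>.dkr.ecr.us-east-1.amazonaws.com/my-flow-image:latest")
```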
z
c
yessss my next question was around CD 🙂 thanks for the resource, I'll post back here if I've got any other questions
k
Yes that’s what I meant!
c
side question: as I'm developing on ECR and ECS, pushing changes and testing things out, it feels like some "things" are getting cached. e.g. I'm getting a failure like
```
ModuleNotFoundError("No module named 'pipelines.custom_docker'")
```
when I've since removed all references to that line of code, pushed new containers, and re-registered the flow
k
Is your image tagged to latest and did you register with the latest image? What storage are you using? DockerStorage?
c
yes and GitHub
k
As in ECSRun and Github storage?
c
ooo I have to push the change 🙂
k
Yeahhh. Probably that 👍
c
"As in ECSRun and Github storage?" correct
@Kevin Kho A follow up on this. I've taken your advice and installed my module in the docker container, and I'm referencing that container in my ECSRun run_config; I'm using Github as my storage. When I go to run the flow I'm seeing:
```
Failed to load and execute Flow's environment: ModuleNotFoundError("No module named 'flows.custom_docker'")
```
When I pull that docker image down, it seems like that module does exist:
k
How did you install the module? `pip` or `conda`?
c
pip
k
Do you have an environment that might not be used? Or did you just install everything with pip?
c
everything is installed with pip; no virtual environment is used in the Docker container
k
Could you show me the Dockerfile?
c
k
Can you show me the ECS RunConfig?
c
Is there any caching going on with regards to downloading that ECR image?
k
Everything looks alright. I think what might be happening is that they're installed in different locations inside the container, like this. Maybe you can try `pip install .` instead of `python setup.py install` to make sure? Is `./flows` everything in your package? I don't believe there should be caching if you are running as a task with Fargate/ECS. If you ran on EC2 then there might be, but you could tag explicitly to be sure. How did you test the container to see if you could import? Did you download your image and then use `exec`?
c
Let me try `pip install .`. Yep, `./flows` is all my code; that's what setup.py references.
My screenshot above shows how I tested things, I did a `docker run -it --rm 864 /bin/bash`
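As a side note, a quick hypothetical check that can be run inside that container to confirm whether the module is importable and where pip actually put it:
```
# Run inside the container (e.g. after the docker run above).
# flows.custom_docker is the module from the error message; if the import
# succeeds, __file__ shows where it was installed (site-packages vs a stray copy).
import flows.custom_docker
print(flows.custom_docker.__file__)
```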
k
You might also be able to test this locally with a DockerRun run config and a Docker agent if that makes testing easier
c
I've been able to reproduce it locally - looks like you're right about my install being wonky. I'll report back on a fix.
So it looks like indeed my method of installing my module wasn't working; I still can't get it to work with `pip install .` - I'm guessing there's something wrong with my `setup.py`. For the time being I've worked around it with the following in my Dockerfile
k
Can I see your setup.py? (I’m just asking for everything at this point lol) I have a suspicion. I ran into a similar thing two days ago
c
k
It looks right to me. Maybe try `packages=find_packages()` instead of `['flows']`, but I think this would require `__init__.py` files in those subdirectories.
Actually, you can test this locally by doing `python setup.py bdist_wheel`, then extract the wheel and examine the contents to see if everything is in there.
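A sketch of that suggested `setup.py`, with placeholder name and version:
```
# find_packages() walks the source tree, but only directories that contain an
# __init__.py are picked up as packages.
from setuptools import setup, find_packages

setup(
    name="flows",        # placeholder
    version="0.1.0",     # placeholder
    packages=find_packages(),  # instead of packages=["flows"]
)
```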
c
FYI, `__init__.py` is only in the highest level dir 🤷
interesting, when I do the build inside the container, it doesn't add the subdirs
k
I see, maybe the `__init__.py` in the sub dirs will help `find_packages()` get them? Are your python versions the same? At least we know the culprit
c
you rock! it was a combination of using `find_packages()` and then having an `__init__.py` in every subdir
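For anyone hitting the same thing, a small sketch of why the `__init__.py` files mattered (the package names shown are illustrative):
```
# Run from the repo root: once every subdirectory has an __init__.py,
# find_packages() reports the subpackages too, so they end up in the built
# wheel / installed package instead of being silently skipped.
from setuptools import find_packages

print(find_packages())
# before the fix: ['flows']
# after adding __init__.py to each subdir: ['flows', 'flows.custom_docker', ...]
```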
k
Nice! Yeah I had to do it but don’t know why 🤷