
Lana Dann

12/02/2021, 9:20 PM
Hi, how can I set PYTHONPATH for storage? For context, I'm using GitLab storage for my flow and the flows live in lib/flows, but when the flow tries to run it errors with ModuleNotFoundError: No module named 'lib' when it tries to access the storage.

Kevin Kho

12/02/2021, 9:20 PM
What agent are you using?

Lana Dann

12/02/2021, 9:21 PM
I'm using an ECSAgent that's running on a Fargate ECS service.

Kevin Kho

12/02/2021, 9:23 PM
I think you can do it when building the container like this
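A sketch of one way to do that, assuming the flow-run image is built by the user: either bake the variable into the image at build time (for example an ENV PYTHONPATH=/opt/flows line in the Dockerfile), or pass it through the ECSRun run config's env argument. The path and image name below are placeholders, not part of the original thread.

```python
from prefect.run_configs import ECSRun

# Sketch only: "/opt/flows" is a placeholder for wherever the repo root
# (the directory containing `lib/`) lives inside the flow-run image.
run_config = ECSRun(
    image="my-registry/flow-runtime:latest",  # placeholder image name
    env={"PYTHONPATH": "/opt/flows"},         # lets `import lib...` resolve at run time
)
```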

Lana Dann

12/02/2021, 9:33 PM
Hmm, to clarify: the ECS agent is running as a service in a separate container. Then I have another container that holds all the source code for the flows; it spins up and registers all of those flows as part of a CI/CD pipeline. Do your solutions assume that the container running the agent also contains the source code for the flows?

Kevin Kho

12/02/2021, 9:36 PM
I think with the ECS Agent you are always spinning up a new container for the flow run, wherever that agent is running. So this is the image that would go in ECSRun(image=…); the agent starts a new ECS container with that image, then gets the flow from storage and runs it on top.
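As a rough sketch of that setup (repo, path, and image names are placeholders for illustration), a registered flow would typically pair a script-based storage like GitLab with an ECSRun run config:

```python
from prefect import Flow, task
from prefect.storage import GitLab
from prefect.run_configs import ECSRun

@task
def say_hello():
    print("hello")

with Flow("example-flow") as flow:
    say_hello()

# GitLab storage only records where the flow *file* lives; that code is
# pulled down inside the flow-run container at execution time.
flow.storage = GitLab(
    repo="my-group/my-repo",           # placeholder GitLab project
    path="lib/flows/example_flow.py",  # placeholder path to the flow file
    ref="main",
)

# The image the agent uses when it starts the new ECS task for the run.
flow.run_config = ECSRun(image="my-registry/flow-runtime:latest")
```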

Lana Dann

12/02/2021, 9:43 PM
I'm running ECS runs with task definition ARNs whose containers already have their own source code (which is different from the Prefect repo), so I don't think I can change the PYTHONPATH in those containers… I'm kind of confused about how flows are run using storage. I assume this is what happens: the agent kicks off a new container with the given task definition, copies the flow source code to the flow container, and then executes the flow. Is that correct?

Kevin Kho

12/02/2021, 9:48 PM
That is pretty much right. The container is kicked off, pulls the flow from storage, and then uses prefect execute flow-run as a CLI command inside the container to start the flow run. PYTHONPATH is not a storage thing (unless you use Docker storage). The script-based storages like GitHub or S3 just hold the flow code, and that flow code is pulled down when the flow is executed. The PYTHONPATH has to go on the execution environment (the container).

Lana Dann

12/02/2021, 9:53 PM
Ohh, then I understand now. Is it just the files we define in the storage for the flows that get copied over? What if I have something like a constants.py file that I import in my flow? Does that also get copied over?

Kevin Kho

12/02/2021, 9:55 PM
Yes, exactly. The dependencies are not copied over because cloudpickle did not support deep copying of Python modules (until very recently). With the cloudpickle development this experience might be improved, but that is definitely a Prefect 2.0 thing. So yes, only the flow file is copied over by default, and the dependencies need to be added to the execution environment (through the container).
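To make that concrete with placeholder names: the single flow file below is what GitLab storage pulls down inside the flow-run container, while lib/constants.py is never copied by storage and has to already exist on the image (and be importable, e.g. via PYTHONPATH), otherwise the import fails with the ModuleNotFoundError from the start of this thread.

```python
# lib/flows/example_flow.py  (placeholder path; this one file is what
# script-based storage such as GitLab pulls down at flow-run time)
from prefect import Flow, task

# This dependency is NOT shipped by storage. lib/constants.py must already
# be present in the execution image and reachable via PYTHONPATH, or the
# run fails with: ModuleNotFoundError: No module named 'lib'
from lib.constants import GREETING

@task
def greet():
    print(GREETING)

with Flow("greet-flow") as flow:
    greet()
```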