Lana Dann

    9 months ago
    hi, how can i set `PYTHONPATH` for storage? for context, i’m using gitlab storage for my flow and flows are in `lib/flows` but when the flow tries to run, it errors with `ModuleNotFoundError: No module named 'lib'` when the flow tries to access the storage.
    Kevin Kho

    9 months ago
    What agent are you using?
    Lana Dann

    9 months ago
    i’m using an ECSAgent that’s running on a fargate ecs service
    Kevin Kho

    9 months ago
    I think you can do it when building the container like this
    Lana Dann

    9 months ago
    hmm to clarify, the ECS agent is running in a service in a separate container. then i have another container that contains all the source code for the flows that spins up and registers all of those flows as part of a CI/CD pipeline. do your solutions assume that the container running the agent also contains the source code for the flows?
    Kevin Kho

    9 months ago
    I think with ECS Agent you are always spinning up a new container for the flow run wherever that agent is running. So this is the image that would go in the `ECSRun(image=…)` and the agent starts a new ECS container with that image and then gets the Flow from storage and runs it on top.
    Lana Dann

    9 months ago
    i’m running ecs runs with task definition ARNs that have containers that already have their own source code (that is different from the prefect repo). so i don’t think i can change the pythonpath in those containers… i’m kind of confused at how flows are run using storage. i assume this is what happens: the agent kicks off a new container with the given task definition and copies the flow source code to the `flow` container and then executes the flow. is that correct?
    Kevin Kho

    9 months ago
    That is pretty right. The container is kicked off, and pulls the flow from storage, and then uses `prefect execute flow` as a CLI command in the container to start the flow run. `PYTHONPATH` is not a storage thing (unless you use Docker storage). The script based storages like Github or S3 just hold the flow code and that flow code is pulled down when the flow is executed. The `PYTHONPATH` has to go on the execution environment (the container).
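
To illustrate the point about the execution environment, here is a minimal stdlib-only sketch, not Prefect code. It assumes a hypothetical `repo/` directory standing in for code baked into the container image and a hypothetical `pulled/` directory standing in for wherever the flow file lands after being fetched from storage. The flow file is run in a fresh interpreter twice: once without `PYTHONPATH` and once with it pointing at `repo/`.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical flow file: it imports from the `lib` package, like the flow in this thread.
FLOW_SOURCE = "from lib.constants import GREETING\nprint(GREETING)\n"


def run_pulled_flow(flow_file, pythonpath=None):
    """Run the flow file in a fresh interpreter, roughly as a new container would."""
    # Drop any inherited PYTHONPATH so the two runs differ only in what we set here.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    if pythonpath is not None:
        env["PYTHONPATH"] = str(pythonpath)
    return subprocess.run(
        [sys.executable, str(flow_file)],
        env=env,
        capture_output=True,
        text=True,
    )


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "repo"      # assumption: source code present in the image
    pulled = Path(tmp) / "pulled"  # assumption: where the flow file is downloaded
    (repo / "lib").mkdir(parents=True)
    pulled.mkdir()
    (repo / "lib" / "__init__.py").write_text("")
    (repo / "lib" / "constants.py").write_text("GREETING = 'hello'\n")
    flow_file = pulled / "my_flow.py"
    flow_file.write_text(FLOW_SOURCE)

    # Only the flow file's own directory is on sys.path, so `lib` is not found.
    missing = run_pulled_flow(flow_file)
    # With PYTHONPATH pointing at the repo root, the import resolves.
    found = run_pulled_flow(flow_file, pythonpath=repo)

print("failed without PYTHONPATH:", missing.returncode != 0)
print("stdout with PYTHONPATH:", found.stdout.strip())
```

In an ECS setup the equivalent of `pythonpath=repo` would be an environment variable on the container that runs the flow, set in the image or the task definition, rather than anything on the storage object.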
    Lana Dann

    9 months ago
    ohh then i understand now. is it just the files that we define for the storage for the flows that get copied over? what if i have something like a `constants.py` file that i import in my flow? does that also get copied over?
    Kevin Kho

    9 months ago
    Yes exactly, the dependencies are not copied over because `cloudpickle` did not support deep copying of Python modules (until very recently). With the `cloudpickle` development this experience might be improved, but that is definitely a Prefect 2.0 thing. So yes, only the flow file is copied over by default and the dependencies need to be added to the execution environment (through the container).
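
As a rough illustration of why imported modules do not travel with the flow, the sketch below uses the standard library `pickle` rather than `cloudpickle`. For functions that live in an importable module, both store a reference (module name plus function name) instead of the function body, so the module has to be importable wherever the payload is loaded. The module name `flow_helpers` and the directory names are hypothetical stand-ins for something like `constants.py`.

```python
import importlib
import os
import pickle
import subprocess
import sys
import tempfile
from pathlib import Path

# Child process: unpickle a function from stdin, call it, print the result.
LOADER = "import pickle, sys; print(pickle.loads(sys.stdin.buffer.read())())"


def load_elsewhere(payload, cwd, pythonpath=None):
    """Unpickle the payload in a fresh interpreter, like a new flow-run container."""
    env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    if pythonpath is not None:
        env["PYTHONPATH"] = str(pythonpath)
    return subprocess.run(
        [sys.executable, "-c", LOADER],
        input=payload,
        env=env,
        cwd=str(cwd),
        capture_output=True,
    )


with tempfile.TemporaryDirectory() as tmp:
    code_dir = Path(tmp) / "code"    # hypothetical location of the helper module
    empty_dir = Path(tmp) / "empty"  # working directory that lacks the helper
    code_dir.mkdir()
    empty_dir.mkdir()
    (code_dir / "flow_helpers.py").write_text("def greeting():\n    return 'hello'\n")

    sys.path.insert(0, str(code_dir))
    try:
        flow_helpers = importlib.import_module("flow_helpers")
        payload = pickle.dumps(flow_helpers.greeting)
    finally:
        sys.path.remove(str(code_dir))

    # The payload names the module and the function; it carries no function body.
    stored_by_name = b"flow_helpers" in payload and b"greeting" in payload
    missing = load_elsewhere(payload, empty_dir)
    found = load_elsewhere(payload, empty_dir, pythonpath=code_dir)

print("stored by name only:", stored_by_name)
print("fails where module is absent:", b"ModuleNotFoundError" in missing.stderr)
print("works where module is present:", found.stdout.decode().strip())
```

The "very recent" `cloudpickle` development mentioned above is presumably the by-value pickling of whole modules added in `cloudpickle` 2.0; whether and how Prefect uses it is not something this sketch shows.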