    Willian Chan

    1 year ago
    Hello everyone, I'm stuck on an error when running my flow with helper scripts from GitLab. The UI tells me:
    Failed to load and execute Flow's environment: ModuleNotFoundError("No module named 'mail_client'")
    The main problem here is that the file mail_client.py is not present on the agent, and it is impractical for me to send each auxiliary script to the agent (there are going to be a lot of flows). The structure of the repository:
    gitlab-repository/
    ├── flow.py
    └── mail_client.py
    Inside my flow.py it imports the mail_client:
    from mail_client import MailClient
    ...
    ...
    The configuration for GitLab storage:
    flow.storage = GitLab(
        repo="XXXXX",
        host="XXXXX",
        path="flow.py",
        secrets=["GITLAB_ACCESS_TOKEN"]
    )
    I need the agent to be able to pull the entire repository, because many processes will be added to Prefect and there is no way to change the agent with each modification to a process. Does anyone have a solution for this? Thanks
    Kevin Kho

    1 year ago
    Hi @Willian Chan, just a friendly reminder to post large details in threads so they don’t crowd the Slack. The GitLab storage only sees the file of your flow. For this type of setup, the recommended approach is to package the helper scripts with Docker. This documentation may help: https://docs.prefect.io/orchestration/recipes/configuring_storage.html#including-other-python-scripts
    There is an issue for this here: https://github.com/PrefectHQ/prefect/issues/4328
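A minimal sketch of the Docker packaging approach mentioned above, assuming Prefect 1.x, whose Docker storage accepts a `files` mapping (absolute local path to path inside the image) and an `env_vars` mapping. The helper name `docker_storage_kwargs` and the `/modules` destination directory are hypothetical, chosen only for illustration; the helper collects every `.py` file next to the flow so helpers such as mail_client.py end up importable inside the image.

```python
from pathlib import Path


def docker_storage_kwargs(repo_dir, flow_file="flow.py", module_dir="/modules"):
    """Build `files` and `env_vars` arguments for Docker storage.

    Hypothetical helper: maps every helper script in `repo_dir`
    (everything except the flow file itself) to `module_dir` in the image.
    """
    repo = Path(repo_dir).resolve()  # Docker storage expects absolute source paths
    files = {
        str(path): f"{module_dir}/{path.name}"
        for path in sorted(repo.glob("*.py"))
        if path.name != flow_file
    }
    # Append the module directory to PYTHONPATH so `import mail_client` resolves.
    env_vars = {"PYTHONPATH": f"$PYTHONPATH:{module_dir}/"}
    return {"files": files, "env_vars": env_vars}


# Usage (requires Prefect 1.x and a Docker daemon; not executed here):
# from prefect.storage import Docker
# flow.storage = Docker(**docker_storage_kwargs("gitlab-repository"))
```

With this layout the image still has to be rebuilt whenever a helper script changes, which is the trade-off discussed in the rest of the thread.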
    Willian Chan

    1 year ago
    Thanks for the response, Kevin! The problem I have with this solution is that a new Docker image has to be built every time the flow changes.
    Do you have any tips for changing the agent code to pull the entire repository? I can contribute to the project if I manage to make the modification.
    Kevin Kho

    1 year ago
    Building a new Docker image would be best practice, but if you want to circumvent that, there was a comment earlier today where someone packaged their code into a library and installed that library on the agent that the flow would run on.
    Regarding the progress of this ticket, I would have to get back to you when I get more information. You could install your code as a library on the agent in editable mode:
    pip install -e .
    and then pull from your repo every time you make a change. Not best practice, but this might work.
    Willian Chan

    1 year ago
    Thanks for the help, Kevin. The problem with this is that the same agent will run many different flows, and manually installing libraries on the agent makes getting flows into production slow and laborious.
    Kevin Kho

    1 year ago
    I think you can write a Makefile to do all the necessary pulls, but this sounds like a scenario where Docker would be important, right? To make sure each flow has an isolated environment?
    Or a shell script*
    Willian Chan

    1 year ago
    Essentially all the processes are very similar: they all perform ETL, and the differences are the data source and the reports generated, so they will all use basically the same libraries.
    Kevin Kho

    1 year ago
    Yeah maybe automating with a shell script on the agent to install dependencies is the easiest for you
    Willian Chan

    1 year ago
    Makes sense, but I would have to add the repository location to that shell script every time a new flow is registered, correct?
    Kevin Kho

    1 year ago
    Not sure what you mean. I am imagining the content of the shell script to be:
    git pull work_repo, pip install xxx
    And then your flow script would use it like:
    from work_repo import func1
    Prefect just knows a file exists and will try to run it. This normally does not work because the agent doesn’t have the dependencies, but if you install them, you can get it to work.
    So when you make edits to the repo, just pull it on your agent to get the latest version, and when the flow runs, it will use that latest version. Does that help?
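The point about dependencies can be illustrated with a small, self-contained sketch using only the standard library. It assumes nothing about Prefect: the module name `helper_mod_demo` is hypothetical and stands in for a helper such as mail_client.py. The import fails while the module's directory is unknown to the interpreter and succeeds once that directory is on `sys.path`, which is roughly what installing the code on the agent achieves.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def can_import(name):
    """Return True if `name` can be imported in the current environment."""
    try:
        importlib.import_module(name)
        return True
    except ModuleNotFoundError:
        return False


def demo():
    """Show an import failing, then succeeding once its directory is on sys.path."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical helper module, written to a directory Python does not search.
        Path(tmp, "helper_mod_demo.py").write_text("def func1():\n    return 'ok'\n")
        before = can_import("helper_mod_demo")  # directory not on sys.path yet
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            after = can_import("helper_mod_demo")  # directory now searched
        finally:
            # Clean up so the demo leaves the interpreter state unchanged.
            sys.path.remove(tmp)
            sys.modules.pop("helper_mod_demo", None)
    return before, after


if __name__ == "__main__":
    print(demo())
```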
    Willian Chan

    1 year ago
    Yes, it is something like what you said. I am just thinking that the agent will run different flows, so in that case I will have to pull from every repository.
    Kevin Kho

    1 year ago
    Ah yes. Add all of their repos to your 1 shell script lol
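The single script covering several repositories could look roughly like the sketch below, written in Python rather than shell. It is an illustration under assumptions, not a tested deployment script: the repository paths are hypothetical, each repository is assumed to be already cloned on the agent and to be an installable package (with a setup.py or pyproject.toml), and `dry_run=True` only prints the commands.

```python
import subprocess
import sys


def sync_commands(repo_dirs):
    """Return the git and pip commands needed to refresh each local clone."""
    commands = []
    for repo in repo_dirs:
        repo = str(repo)
        # Fast-forward the existing clone to the latest commit.
        commands.append(["git", "-C", repo, "pull", "--ff-only"])
        # (Re)install the package in editable mode for the agent's interpreter.
        commands.append([sys.executable, "-m", "pip", "install", "-e", repo])
    return commands


def sync(repo_dirs, dry_run=True):
    """Print the commands, or run them and stop at the first failure."""
    for cmd in sync_commands(repo_dirs):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical clone locations on the agent machine.
    sync(["/opt/flows/etl_sales", "/opt/flows/etl_reports"], dry_run=True)
```

Registering a new flow then means adding one path to the list instead of editing the agent itself.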