Thread
#prefect-community
    Jeremy Knickerbocker

    2 years ago
    Hi everyone, I am trying to come up with a flow storage/deployment strategy that is easy for a small team to maintain but will not be painful to scale as we grow. We are running the Prefect Core server and have that configured properly, but I am struggling with the best approach to registering flows and storing them. I have pieced together a lot of different ideas from the documentation and Slack, but I still think I am missing something.
We have an ETL VM that we will be using as our host machine; it has Docker installed and also hosts the Core UI. We use Azure for all of our infrastructure (VMs, SQL Server, Blob Storage, DevOps Repos, Container Registries) and are tied to that platform, but we are open to using any other services available in Azure. We also have several internal Python packages that we are actively developing (read: changing often) for interacting with our EHR and other systems.
Looking at the available agents, the Docker Agent seems like the best fit for us right now. However, I am struggling with the best approach to getting our packages from our private repo and our flows into a Docker image. Since we are using SQL Server, we have to customize our images to install pyodbc and the MSSQL ODBC drivers; the smallest image we have been able to build is ~600 MB. We are currently using a multi-stage build to clone our internal packages from Azure DevOps, then copying them into the image and setting the Python path. This works, but every time one of the packages changes we need to rebuild the image, push it to the registry, and pull it on the host machine, which feels extremely wasteful of cycles and storage. I was considering cloning the repositories to our host VM and using Docker volumes to share the packages and flows, so that we would not have to rebuild the image every time we made a change.
I was really excited to read the 0.12.5 release notes, and I am hopeful the script storage functionality will benefit our use case, but I still need to explore it further. Does anyone have any suggestions on how to better architect this? I looked at https://docs.prefect.io/orchestration/execution/storage_options.html#non-docker-storage-for-containerized-environments but that seems to cover only flow storage, and I need to handle additional files that are updated very often. Thank you for any recommendations you may have!
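To make the volume idea concrete, here is a rough sketch of what I have in mind, not something we have running. It only builds the `host:container:ro` volume strings and the matching `PYTHONPATH` value for a set of cloned repositories. The helper name `build_mounts`, the host paths, and the `/opt/internal` container root are all made up for illustration, and how these values get handed to the Docker Agent is left open.

```python
from pathlib import PurePosixPath


def build_mounts(host_repos, container_root="/opt/internal"):
    """Map host repo clones to read-only Docker volume specs.

    Returns a tuple of (volume_specs, pythonpath). All paths are
    hypothetical examples, not paths from our actual setup.
    """
    volume_specs = []
    container_paths = []
    for host_path in host_repos:
        host = PurePosixPath(host_path)  # normalizes trailing slashes
        target = PurePosixPath(container_root) / host.name
        # Standard Docker bind-mount syntax: host path, container path, mode.
        volume_specs.append(f"{host}:{target}:ro")
        container_paths.append(str(target))
    # PYTHONPATH entries are colon-separated on Linux containers.
    return volume_specs, ":".join(container_paths)


volumes, pythonpath = build_mounts(
    ["/srv/repos/ehr_client", "/srv/repos/etl_utils/"]
)
print(volumes)
print(pythonpath)
```

With something like this, a `git pull` on the host would be enough to update the packages seen by new flow-run containers, and the ~600 MB image would only need rebuilding when the ODBC drivers or other dependencies change.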
    Jenny

    2 years ago
    Hi @Jeremy Knickerbocker - thanks for the question. I don't have any great suggestions for you but I'll check with the wider team to see if they've got any good ideas. Also hoping that others from the community may have ideas for you too.
    Jeremiah

    2 years ago
    Hi @Jeremy Knickerbocker - as a suggestion, for more open-ended questions like this you may prefer to open a GitHub discussion, a feature we recently added to the repo. Discussions are much more publicly visible and discoverable than Slack, which is better suited to succinct Q&A, especially because new questions will crowd yours out as the chat scrolls (though today is unusually quiet!)