What if I want to import `pandas` or `numpy` or an...
# ask-community
o
What if I want to import `pandas` or `numpy` or any other 3rd party dependency from my workflow? What is the best practice for that? The agent running my deployment/workflow might not have these packages in place, right? Should I build a custom agent Dockerfile with all of the dependencies, or is there a better approach? What are the tradeoffs between the different solutions?
m
Hey @Ofir, generally speaking this depends largely on what your flows look like holistically, as well as how you're defining your storage and what infrastructure you're relying on. This discourse article is a good starting point for working through how you might want to set up your environments.
In some cases it's better to have a specific image built with the necessary dependencies for each flow/deployment. This is generally more optimized, but depending on the number of flows you have it may be overkill. If the flows are all running in the same environment as your agent, then the agent needs to have all of the necessary dependencies installed before the flow runs, i.e. a custom image with all of the dependencies. Depending on where this is hosted and how many dependencies you're managing, this could get pretty large, and if your flows don't all require the same packages you may open yourself up to dependency conflicts this way.
Personally, I like to keep the agent lightweight and run flows in a separate execution environment with custom images for the flows themselves. It's more setup early on but avoids dependency headaches later. Again, though, that ultimately depends on your use cases and what you're trying to solve.
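To make the "dependencies must be installed before the flow runs" point concrete, here is a minimal sketch of a preflight check that reports which modules are not importable in the current environment. It uses only the Python standard library; the helper name `missing_modules` and the example module list are hypothetical and not part of Prefect.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be found in this environment."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # find_spec can raise for dotted names with a missing parent package,
            # or for modules with an unset __spec__; treat both as "not available".
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical list of what a flow imports; adjust to your own flow.
    required = ["pandas", "numpy"]
    absent = missing_modules(required)
    if absent:
        print("Missing in this environment:", ", ".join(absent))
    else:
        print("All required modules are importable.")
```

Running something like this inside the agent's environment (or inside the flow's image) shows quickly whether the place where the code executes actually has what the flow imports.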
o
Thanks @Mason Menges, appreciate that! Does the agent have to run as a Docker container, or can it be a local process / daemon? Assuming it's a custom Docker container, I guess I need to inherit from the agent's base image (which can probably be inferred from `prefect server config`), add the pip installs there, build the custom Docker image, and point the docker-compose.yml to this image, right?
But what if the dependency is a wheel that I haven't published to any pip repository whatsoever? Isn't it more flexible to supply your own customized Docker image (for the agent)?
m
The agent can definitely be a local process, and you can run flows directly in it, or you can set up flows to spin up a Docker container so they're executed in an isolated environment. The article I linked to above runs through several examples of potential deployment setups. Where does your agent live currently?
As for the Docker side of it, I'm not entirely sure I follow your second question. You should be able to build the image ahead of time and set up a Docker block pointing to the image.
Ultimately, the short version is that wherever your code is running, whether that's the same location as the agent or a separate container/environment, it needs to have all of the requirements present for the flow to run. The Extra Pip Packages you reference is one way to accomplish that for flow runs, as it'll install any necessary packages on the Docker container before the flow runs. This would be using the Docker infrastructure block on a deployment, though, which is separate from the agent. If you have a custom library somewhere that hasn't been published, you'll likely need to copy that into whatever image the flow is running on.
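As a rough illustration of copying an unpublished library into the flow's image, the sketch below renders Dockerfile text that installs published packages with pip and copies local wheel files into the image before installing them. The helper `render_dockerfile`, the base image tag, and the wheel path are all assumptions for illustration; substitute whatever base image and wheel your setup actually uses.

```python
from pathlib import PurePosixPath


def render_dockerfile(base_image, pip_packages, local_wheels):
    """Build Dockerfile text that installs published packages and local wheels."""
    lines = [f"FROM {base_image}"]
    if pip_packages:
        # Published packages come straight from the configured pip index.
        lines.append("RUN pip install --no-cache-dir " + " ".join(pip_packages))
    for wheel in local_wheels:
        # Unpublished wheels are copied from the build context, then installed.
        wheel_name = PurePosixPath(wheel).name
        lines.append(f"COPY {wheel} /tmp/{wheel_name}")
        lines.append(f"RUN pip install --no-cache-dir /tmp/{wheel_name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed base image tag and a hypothetical wheel path, for illustration only.
    print(
        render_dockerfile(
            "prefecthq/prefect:2-latest",
            ["pandas", "numpy"],
            ["dist/my_lib-0.1.0-py3-none-any.whl"],
        )
    )
```

The resulting image can be built ahead of time and referenced from the Docker block on the deployment, which keeps the agent itself lightweight.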
o
Thanks @Mason Menges!