Ofir
03/01/2023, 7:14 PM
[...] pandas or numpy or any other 3rd party dependency from my workflow, what is the best practice to do that?
The agent running my deployment/workflow might not have these packages in place, right?
Should I build a custom agent Dockerfile with all of the dependencies, or is there a better approach to it? What are the tradeoffs between the different solutions?
Ryan Peden
03/01/2023, 7:31 PM
[...] EXTRA_PIP_PACKAGES environment variable. EXTRA_PIP_PACKAGES is convenient for testing, but I recommend building dependencies into your image when running in production because installing dependencies at runtime adds significant overhead to each flow run.
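For quick testing, the EXTRA_PIP_PACKAGES approach could look roughly like the sketch below. It assumes Prefect 2.x, where DockerContainer is importable from prefect.infrastructure and accepts an env mapping, and it assumes the variable takes a space-separated package list (as it would be handed to pip install). The helper names extra_pip_env and make_test_infrastructure are made up for illustration.

```python
from typing import Dict, Iterable


def extra_pip_env(packages: Iterable[str]) -> Dict[str, str]:
    """Build the env mapping that asks the Prefect image to install packages at startup."""
    # Assumption: the value is a space-separated list, as passed to `pip install`.
    return {"EXTRA_PIP_PACKAGES": " ".join(packages)}


def make_test_infrastructure(packages: Iterable[str]):
    """Return a DockerContainer block for quick testing (hypothetical helper, not for production)."""
    # Imported lazily so extra_pip_env() works even where Prefect is not installed.
    from prefect.infrastructure import DockerContainer  # Prefect 2.x

    # No image is given here, so Prefect falls back to its default image.
    return DockerContainer(env=extra_pip_env(packages))


env = extra_pip_env(["pandas", "numpy"])
print(env)  # {'EXTRA_PIP_PACKAGES': 'pandas numpy'}
```

Every flow run started this way pays the install cost again, which is why baking the packages into the image is the better fit for production.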
Nearly everything mentioned about DockerContainer also applies to other infrastructure blocks that use Docker containers, such as KubernetesJob, ECSTask for AWS, AzureContainerInstanceJob, and CloudRunJob for Google Cloud. One difference is that it's fine to use Kubernetes or cloud infrastructure from a containerized agent.
Feel free to ask for more detail on anything I covered here; I mentioned a lot of concepts, and it's okay if you aren't familiar with all of them.
Ofir
03/01/2023, 7:34 PM
Prefect agents rely on Docker images for executing flow runs using DockerContainer or KubernetesJob infrastructure. If you do not specify an image, we will use a Prefect image tag that matches your local Prefect and Python versions. If you are building your own image, you may find it useful to use one of the Prefect images as a base.
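As a rough illustration of the default tag mentioned above: the sketch below assumes release images follow the prefecthq/prefect:<prefect version>-python<major>.<minor> naming convention. The helper default_prefect_image is hypothetical (it is not Prefect's own function), and the version 2.8.3 is only an example value.

```python
import sys
from typing import Optional, Tuple


def default_prefect_image(
    prefect_version: str,
    python_version: Optional[Tuple[int, int]] = None,
) -> str:
    """Guess the image tag matching a Prefect version and a Python (major, minor) version."""
    # Fall back to the interpreter running this script when no version is given.
    major, minor = python_version or sys.version_info[:2]
    # Assumption: tags look like "<prefect version>-python<major>.<minor>".
    return f"prefecthq/prefect:{prefect_version}-python{major}.{minor}"


print(default_prefect_image("2.8.3", (3, 10)))  # prefecthq/prefect:2.8.3-python3.10
```

A tag like this is also what a custom image would typically name in its FROM line when it uses a Prefect image as a base.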
Ryan Peden
03/01/2023, 7:52 PM
1. [...] KubernetesJob is built into the prefect package, your agents shouldn't need any extra dependencies.
2. Then, create a custom image that's only used for running flows. Your agents will use the KubernetesJob block to create a separate K8s job for each flow, and you can specify the name of the image to use.
a. You can install all your dependencies and (optionally) copy your flow code into the image. By default, Prefect looks in /opt/prefect/flows, but you can specify your own path if needed.
b. If you don't want to copy your code into the image, you'll need to set up Storage for your deployments. Then, when the K8s job for your flow starts up, Prefect will download your code into the flow container and run it.
Ofir
03/01/2023, 7:55 PM
Ryan Peden
03/01/2023, 7:55 PM
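A minimal sketch of option 2a above (flow code baked into a custom image), assuming Prefect 2.x, where Deployment.build_from_flow accepts infrastructure, path, and skip_upload. The image name, deployment name, and the helper build_k8s_deployment are placeholders made up for illustration, not values from this thread.

```python
# Hypothetical image that already contains the dependencies and the flow code.
FLOW_IMAGE = "registry.example.com/my-flows:latest"
# Location mentioned above where Prefect looks for flow code by default.
FLOW_PATH = "/opt/prefect/flows"


def build_k8s_deployment(flow, name: str = "k8s-custom-image"):
    """Build a deployment whose flow runs start as K8s jobs from FLOW_IMAGE."""
    # Imported lazily so this module can be loaded without Prefect installed.
    from prefect.deployments import Deployment
    from prefect.infrastructure import KubernetesJob

    return Deployment.build_from_flow(
        flow=flow,
        name=name,
        # Each flow run becomes a separate K8s job using the custom image.
        infrastructure=KubernetesJob(image=FLOW_IMAGE),
        # The code is already in the image, so point at it and skip uploading.
        path=FLOW_PATH,
        skip_upload=True,
    )
```

Calling build_k8s_deployment(my_flow).apply() would register the deployment with the API. For option 2b, you would attach a storage block instead of baking the code in, and leave out path and skip_upload.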