Ofir

03/01/2023, 7:14 PM
What if I want to import `pandas` or `numpy` or any other third-party dependency in my workflow? What is the best practice for that? The agent running my deployment/workflow might not have these packages installed, right? Should I build a custom agent Dockerfile with all of the dependencies, or is there a better approach? What are the tradeoffs between the different solutions?
Ryan Peden

03/01/2023, 7:31 PM
If you want your agent to run your flows as subprocesses, adding the dependencies via a custom Dockerfile could be a good solution. That way, any dependencies you install for the agent are available to your flows, too.
You could also create a Dockerfile specifically for your flows, and then let your agent run each flow in a separate container via `DockerContainer` infrastructure. If you do this, you'd probably want to run your agent on a VM instead of in a container, since running containers inside containers can be problematic.
If you use one of Prefect's pre-built images to run your flows, you could install the packages at runtime via the `EXTRA_PIP_PACKAGES` environment variable. `EXTRA_PIP_PACKAGES` is convenient for testing, but I recommend building dependencies into your image when running in production, because installing dependencies at runtime adds significant overhead to each flow run.
Nearly everything mentioned about `DockerContainer` also applies to other infrastructure blocks that use Docker containers, such as `KubernetesJob`, `ECSTask` for AWS, `AzureContainerInstanceJob`, and `CloudRunJob` for Google Cloud. One difference is that it's fine to use Kubernetes or cloud infrastructure from a containerized agent.
Feel free to ask for more detail on anything I covered here; I mentioned a lot of concepts, and it's okay if you aren't familiar with all of them. 🙂
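To make the `EXTRA_PIP_PACKAGES` option concrete, here is a minimal sketch of the environment mapping you might hand to an infrastructure block's `env` setting. The helper name `runtime_pip_env` is hypothetical, and the space-separated format is an assumption based on the value being passed to `pip install` when the container starts; check the Prefect Docker image docs for your version.

```python
from typing import Dict, Iterable


def runtime_pip_env(packages: Iterable[str]) -> Dict[str, str]:
    """Build an env mapping asking a Prefect pre-built image to install packages at startup.

    Hypothetical helper: only the EXTRA_PIP_PACKAGES name comes from the thread.
    """
    # Drop empty entries and surrounding whitespace so the joined value stays clean.
    specs = [p.strip() for p in packages if p.strip()]
    # Assumption: the value is a space-separated list of pip requirement specifiers.
    return {"EXTRA_PIP_PACKAGES": " ".join(specs)}


env = runtime_pip_env(["pandas==1.5.3", "numpy"])
print(env)  # {'EXTRA_PIP_PACKAGES': 'pandas==1.5.3 numpy'}
```

This is only suitable for quick tests, since every flow run pays the install cost again.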
Ofir

03/01/2023, 7:34 PM
Prefect agents rely on Docker images for executing flow runs using `DockerContainer` or `KubernetesJob` infrastructure. If you do not specify an image, we will use a Prefect image tag that matches your local Prefect and Python versions. If you are building your own image, you may find it useful to use one of the Prefect images as a base.
This is perfect
šŸ‘ 1
Or prefect 🙂
First of all, thanks a lot @Ryan Peden, this is a very detailed and useful answer.
Second, I guess I'm still learning and trying to digest the Prefect concepts, but how do a `KubernetesJob` and a customized Docker agent image play together?
Or are they orthogonal concepts? That is, can I have a `KubernetesJob` with any kind of image?
Ryan Peden

03/01/2023, 7:52 PM
If you use Kubernetes, the setup that probably makes the most sense is the following:
1. Run an agent (or multiple agents) in your K8s cluster using one of Prefect's pre-built images. Since `KubernetesJob` is built into the `prefect` package, your agents shouldn't need any extra dependencies.
2. Then, create a custom image that's only used for running flows. Your agents will use the `KubernetesJob` block to create a separate K8s job for each flow run, and you can specify the name of the image to use.
   a. You can install all your dependencies and (optionally) copy your flow code into the image. By default, Prefect looks in `/opt/prefect/flows`, but you can specify your own path if needed.
   b. If you don't want to copy your code into the image, you'll need to set up Storage for your deployments. Then, when the K8s job for your flow starts up, Prefect will download your code into the flow container and run it.
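As a rough illustration of the custom flow image described above, the sketch below renders Dockerfile text that installs dependencies on top of a Prefect base image and optionally copies flow code into `/opt/prefect/flows`. The helper `render_flow_dockerfile`, the base tag `prefecthq/prefect:2-python3.10`, and the local `flows` directory are all assumptions for the example, not something stated in the thread.

```python
import shlex
from typing import Optional, Sequence


def render_flow_dockerfile(
    packages: Sequence[str],
    base_image: str = "prefecthq/prefect:2-python3.10",  # assumed tag; pin your own
    flows_dir: Optional[str] = None,
) -> str:
    """Return Dockerfile text for an image used only to run flows (hypothetical helper)."""
    lines = [f"FROM {base_image}"]
    if packages:
        # Quote each specifier so characters like '>' are not treated as shell redirects.
        quoted = " ".join(shlex.quote(p) for p in packages)
        lines.append(f"RUN pip install --no-cache-dir {quoted}")
    if flows_dir is not None:
        # Option (a): bake the flow code into the default path mentioned above.
        lines.append(f"COPY {flows_dir} /opt/prefect/flows")
    return "\n".join(lines) + "\n"


print(render_flow_dockerfile(["pandas", "numpy"], flows_dir="flows"))
```

Leaving `flows_dir` unset corresponds to option (b), where the image holds only dependencies and the code comes from deployment storage.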
Ofir

03/01/2023, 7:55 PM
Thanks a lot, Ryan!
Ryan Peden

03/01/2023, 7:55 PM
You're welcome! 😄