
Ben Epstein

02/09/2021, 10:34 PM
I’m having trouble finding this in the docs: is there a way to specify package dependencies for a run? I see you can use a `KubernetesExecutor` like Airflow and specify a Docker image, but that would involve rebuilding your image every time dependencies change. Is there a way to include a `requirements.txt` or `conda.yaml` with a job? Thanks!
I see this `python_dependencies` option here, but it seems to be for a custom use case: https://docs.prefect.io/orchestration/recipes/configuring_storage.html#building-a-custom-base-image

Jim Crist-Harif

02/09/2021, 10:54 PM
Hi Ben, there’s no `KubernetesExecutor`; I think you’re referring to a `KubernetesRun` run config? This allows you to specify an image to use for a flow run. You only need to rebuild your image if any of your dependent libraries (things you import in your flow script) change. If only your flow script changes, you only need to update the script wherever it’s stored (e.g. GitHub, S3, ...). If you’re using Docker storage, though, a Docker image is all you have, so that would mean rebuilding an image for every change.

Ben Epstein

02/09/2021, 10:55 PM
Thanks, Jim! Right, I’m talking about the former case, when Python library dependencies change.

Jim Crist-Harif

02/09/2021, 10:56 PM
> Is there a way to include a `requirements.txt` or `conda.yaml` with a job?
What's the desired behavior here? Would these dependencies be installed at flow run time? That takes time, and may result in different dependencies for different runs if you aren't fully pinning the dependencies. This may be fine, but we usually encourage building an image with everything required (so you only pay the download + install cost once).
If you’re developing/experimenting, though, or only need to make a small change, our provided `prefecthq/prefect` image includes support for an `EXTRA_PIP_PACKAGES` environment variable. If defined, this is executed as `pip install $EXTRA_PIP_PACKAGES` before a flow run starts up. Useful for experimenting, but we’d encourage any production deployment to freeze its environment at image build time rather than installing at runtime.
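[Editor’s note: a small stdlib-only sketch of what that startup hook amounts to. This is an approximation of the image’s behavior, not the actual `prefecthq/prefect` entrypoint script; the function name is invented for illustration.]

```python
import os
import shlex

def extra_pip_command(env=None):
    # Approximation of the pre-run install step: if EXTRA_PIP_PACKAGES is
    # set, its value becomes the arguments to `pip install`; if it is unset
    # or blank, nothing is installed before the flow run starts.
    env = os.environ if env is None else env
    extra = env.get("EXTRA_PIP_PACKAGES", "").strip()
    if not extra:
        return None
    return ["pip", "install"] + shlex.split(extra)

print(extra_pip_command({"EXTRA_PIP_PACKAGES": "numpy pandas"}))
# ['pip', 'install', 'numpy', 'pandas']
```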

Ben Epstein

02/09/2021, 10:58 PM
Yes, that is exactly the use case: installing dependencies at run time. I definitely understand the trade-offs, but I’m thinking about quick testing and development (i.e. before productionizing it and creating a Docker image with the final Python deps). How would I go about doing that? Would it be with something like the shell task?
Oh, awesome, that’s even better, thank you!

Jim Crist-Harif

02/09/2021, 10:58 PM
No problem!
Example (with a `DockerRun`):
```python
from prefect.run_configs import DockerRun

flow.run_config = DockerRun(env={"EXTRA_PIP_PACKAGES": "numpy pandas"})
```

Ben Epstein

02/09/2021, 11:00 PM
Would
```python
flow.run_config = DockerRun(env={"EXTRA_PIP_PACKAGES": "numpy==1.20.1 pandas>=1.2.2"})
```
work?

Jim Crist-Harif

02/09/2021, 11:00 PM
Yep, anything that’s valid after `pip install ...` works.
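[Editor’s note: since the value is handed to `pip install` as-is, pins and ranges survive as single arguments. A quick stdlib-only check; the package names are just examples.]

```python
import shlex

# The env var value is split shell-style, so each specifier (a pin like
# ==1.20.1 or a range like >=1.2.2) stays a single pip argument.
value = "numpy==1.20.1 pandas>=1.2.2"
print(shlex.split(value))
# ['numpy==1.20.1', 'pandas>=1.2.2']
```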

Ben Epstein

02/09/2021, 11:00 PM
Also, is there a reference to that in the docs? I can’t seem to find it

Jim Crist-Harif

02/09/2021, 11:00 PM
There is not
TODO on my part

Ben Epstein

02/09/2021, 11:01 PM
Ah, okay no worries, I’ll pin this thread. Thanks for your help, this is exactly what I was looking for