# ask-community
r
For a KubernetesRun flow, is there a way to set the docker image to use as a parameter to the flow rather than hardcoding it? This might help with some chicken-and-egg type problems we’ve been having. (Basically we have a flow with dependencies complex enough to require a custom docker image, but that makes it painful for users to iterate on the flow as the image needs to be rebuilt each time - plus different users can effectively clobber each others’ images if the flow is set to use `latest`)
c
Hello Ryan. I was facing a similar problem as you. I've got a solution where:
1. Changes in external dependencies (e.g. pip and conda packages) → rebuild the image
2. Changes in user-defined packages (e.g. custom modules used in flows) are automatically pulled into the Kubernetes "Prefect flow" job upon creation

The way this is achieved is fourfold:
1. Bundle all your external dependencies into an image. Yes, we are still using Docker images, but I assume that external dependencies do not change as often as your own code.
   a. This image is specified in a custom `job_template.yaml` file. You specify the path to this file in the `job_template_path` arg of `KubernetesRun`. This path can point to an S3 object using the `s3://` prefix or to a path inside the `KubernetesAgent` using the `agent://` prefix.
2. The custom `job_template.yaml` file is copied into the Kubernetes Agent's image. I took the Kubernetes Agent K8s config files from the `prefect/server` Helm chart, then changed the image in the agent's `deployment.yaml` to point to this custom Prefect agent image.
   a. Don't forget to set up `image_pull_secrets` on your agent if you are pulling from a private registry!
3. Package all your custom Python code and non-Python files using `setuptools`.
4. In `job_template.yaml`, specify an "entrypoint" (`args`, in K8s speak) that pip installs your git package using `pip install git+https`
EDIT: see end of thread for a simpler solution....
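For anyone following along, a minimal sketch of what such a `job_template.yaml` might look like. All names (image, repo, secret) are placeholders, and the exact container name and command Prefect expects can differ by version, so treat this as illustrative only:

```yaml
# Hypothetical job_template.yaml sketch - image, repo URL, and secret
# names are made up; adjust to your registry and org.
apiVersion: batch/v1
kind: Job
spec:
  template:
    spec:
      containers:
        - name: flow  # Prefect 0.x expects the flow container to be named "flow"
          image: registry.example.com/deps-only:1.0  # external deps baked in
          args:
            # "entrypoint" step: install the user's own package at job start,
            # then hand off to Prefect's normal flow-run command
            - bash
            - -c
            - >-
              pip install "git+https://${GITHUB_ACCESS_TOKEN}@github.com/my-org/my-flows.git"
              && prefect execute flow-run
          env:
            - name: GITHUB_ACCESS_TOKEN
              valueFrom:
                secretKeyRef:
                  name: github-token   # placeholder K8s secret
                  key: token
```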
r
thanks Christopher, I’ll have a look at that
c
One important point: I'm using K8s secrets and configmaps to pass secrets and environment variables into the Kubernetes "Prefect flow" job. You'll need a `GITHUB_ACCESS_TOKEN` environment variable in the Prefect job in order to pip install a package from a private repo.
Hopefully I can finish off the README by the end of the week!
Feel free to reach out if you've got any questions or issues. I am also trying to improve on this setup
r
thanks!
c
Reading through your problem statement again, I think the best solution for your team is for each member to create a custom `job_template.yaml` which points to their OWN image, then specify the path to that file on each flow run
Because you mention "complex dependencies", I'm assuming that the additional complexity of having an entrypoint that pip installs your Python package might not be worth the trouble.
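To make the per-user idea concrete, here's a hedged sketch assuming Prefect 0.x. The helper, bucket name, and naming convention are all made up, not Prefect APIs:

```python
import os

def user_template_path(bucket: str, user: str) -> str:
    """Hypothetical convention: one job template per team member, stored in S3.

    KubernetesRun's job_template_path accepts s3-prefixed paths natively.
    """
    return f"s3://{bucket}/job_templates/{user}.yaml"

# Usage sketch (assumes prefect 0.x is installed; untested here):
# from prefect.run_configs import KubernetesRun
# flow.run_config = KubernetesRun(
#     job_template_path=user_template_path("my-bucket", os.environ["USER"]),
# )
```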
s
A little late here, but we are building out prefect such that each flow uses its own image (and we pass that image to the dask workers when setting up the executor as well). We accomplish this by having a build step in our CI that pushes to our docker registry, and then we pass those same environment variables into the env argument of the KubernetesRun. It’s a little clunky, but it is working so far. Christopher - I still have to use your job template thing, as that is something I haven’t figured out yet (we’re just using a custom one as part of a helper class).
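A sketch of that CI-driven pattern, for reference. The `FLOW_IMAGE` variable name and registry are invented for illustration; the idea is just that the build step exports the tag it pushed and the run config reads it:

```python
import os

# Hypothetical convention: CI exports FLOW_IMAGE after pushing the image.
def image_from_env(default: str = "registry.example.com/flows:dev") -> str:
    """Resolve the flow's image from the environment, with a fallback tag."""
    return os.environ.get("FLOW_IMAGE", default)

# Usage sketch (assumes prefect 0.x is installed; untested here):
# from prefect.run_configs import KubernetesRun
# flow.run_config = KubernetesRun(
#     image=image_from_env(),
#     # forward the same variable so the Dask workers can be started
#     # from the identical image when setting up the executor
#     env={"FLOW_IMAGE": image_from_env()},
# )
```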
c
Quick update on my side as well: I’m using S3 to store and load my job templates now (KubernetesRun supports this natively without any glue). Was a lot faster and easier than having multiple Agent images with just one job template per flow copied over
You need to authenticate to AWS on your agent though. I'm doing that with a custom agent `deployment.yaml` that sets the AWS creds as environment variables (with those values stored in K8s secrets)
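For anyone replicating this, the env-from-secret part of the agent's `deployment.yaml` might look like this fragment (secret and key names are placeholders):

```yaml
# Fragment of the agent Deployment's container spec - names are placeholders
env:
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: aws-creds
        key: access-key-id
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: aws-creds
        key: secret-access-key
```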
s
You may be able to use something like kube2iam, which assigns IAM roles to pods based on annotations, for that.