# prefect-getting-started
r
I've just begun exploring Prefect. I'm a bit unsure of something with Kubernetes, though. I can get my flow to run on our Kubernetes cluster, but I'd like to restrict which node a pod runs on. I've had a look at, for example, how to customise the Kubernetes job template (like here: https://discourse.prefect.io/t/creating-and-deploying-a-custom-kubernetes-infrastructure-block/1531). But from what I can gather, you set the node selection in a pod specification file, as described here: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/ Is it possible to customise the pod configuration so that I can select a node based on a label, for example?
j
@Christopher Boyd is the author of that Discourse article. This is a newer feature, but I think you can use the customizations field to patch the Job: https://docs.prefect.io/api-ref/prefect/infrastructure/#prefect.infrastructure.KubernetesJob
r
I'll have a look at that. It does say "A list of JSON 6902 patches to apply to the base Job manifest." I'm not sure, but I think I need to patch the pod manifest.
j
so you’d be patching the Job to change the template that it includes, and that template is what lands in the pod spec — so you’re patching the pod spec but not directly
c
I wrote some of that documentation - I think we have some examples on the KubernetesJob page I can share
r
Oh yeah, I think you're right: https://kubernetes.io/docs/concepts/workloads/controllers/job/#pod-template I can add the pod part into the job, either in the customizations or in the job template when setting up the KubernetesJob.
c
https://docs.prefect.io/concepts/infrastructure/#kubernetesjob has both customizations and infra_overrides
r
@Christopher Boyd that would be great
c
For the node specifically, you can pass in nodeName, or use a combination of taints and tolerations on your node + job spec.
I don't have the time now, but I can look into adding more examples here for this, as I think it's not the first time I've heard someone ask (about scheduling to a specific node / node pool).
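In the meantime, a rough sketch of what those patches could look like, assuming they go into the customizations field as JSON 6902 "add" operations against the pod spec inside the Job. The label location=on-prem and the taint key/value dedicated=prefect are hypothetical placeholders, and the helper name is made up for illustration:

```python
import json


def node_scheduling_patches(node_labels, toleration_key=None, toleration_value=None):
    """Build JSON 6902 patches that target the pod spec inside a Job manifest."""
    patches = [
        {
            # nodeSelector sits at the same level as "containers" in the pod spec
            "op": "add",
            "path": "/spec/template/spec/nodeSelector",
            "value": dict(node_labels),
        }
    ]
    if toleration_key is not None:
        # Only needed if the target nodes carry a matching NoSchedule taint
        patches.append(
            {
                "op": "add",
                "path": "/spec/template/spec/tolerations",
                "value": [
                    {
                        "key": toleration_key,
                        "operator": "Equal",
                        "value": toleration_value,
                        "effect": "NoSchedule",
                    }
                ],
            }
        )
    return patches


# Hypothetical label and taint; substitute whatever your nodes actually use
patches = node_scheduling_patches({"location": "on-prem"}, "dedicated", "prefect")
print(json.dumps(patches, indent=2))
```

The printed list is plain JSON, so it should be possible to paste it into the block's customizations, though I haven't verified that end to end.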
r
Ok, I'll have a play and if I get something I'll share
c
I think the path would be something like this; nodeName falls at the same configuration level as containers:
[
    {
        "op": "add",
        "path": "/spec/template/spec/nodeName",
        "value": "foo-node"
    }
]
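To see where an "add" like that lands, here is a minimal stdlib-only sketch that applies a single JSON 6902 "add" operation to a trimmed-down Job manifest. The apply_add helper is hypothetical and only handles object members (no array indices, no ~ escapes); it is not how Prefect applies patches internally. Note the path ends in the field name itself:

```python
import copy


def apply_add(manifest, path, value):
    """Apply one JSON 6902 "add" op to a nested dict (object members only)."""
    result = copy.deepcopy(manifest)
    # Split "/a/b/c" into parent keys ["a", "b"] and the final key "c"
    *parents, key = path.lstrip("/").split("/")
    target = result
    for part in parents:
        target = target[part]
    target[key] = value
    return result


# Trimmed-down Job manifest, just enough to show the nesting
job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "spec": {"template": {"spec": {"containers": [{"name": "prefect-job"}]}}},
}

patched = apply_add(job, "/spec/template/spec/nodeName", "foo-node")
pod_spec = patched["spec"]["template"]["spec"]
print(sorted(pod_spec))  # ['containers', 'nodeName']
```

So the patch is written against the Job, but the field ends up in the pod template's spec, next to containers.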
r
Ok, I think I've got it. The only part that confused me was the indentation level, and you are correct, it is the same level as containers. I couldn't get the override to work, so instead I followed the instructions in the tutorial referenced above. In our case, we have some nodes on premises and some off prem, so it is nice if we can pick and choose. The on-prem ones have this label, so I think it's as easy as creating your block with a default template like:
apiVersion: batch/v1
kind: Job
metadata:
  # labels are required, even if empty
  labels: {}
spec:
  template:
    spec:
      completions: 1
      containers:  # the first container is required
      - env: []  # env is required, even if empty
        name: prefect-job
      parallelism: 1
      restartPolicy: Never
      nodeSelector:
        location: "on-prem"
Then, creating/modifying the block is easiest in python:
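from prefect.infrastructure.kubernetes import KubernetesJob

# Reuse some settings from an existing block
k2block = KubernetesJob.load("kubetest2")

# Build a new block whose base Job manifest (base_run.yaml) includes the nodeSelector
k8s_job = KubernetesJob(
    namespace="default",
    image=k2block.image,
    env=k2block.env,
    job=KubernetesJob.job_from_file("base_run.yaml"),
)

# Save under a new block name, replacing any existing block with that name
k8s_job.save("k8sonprem", overwrite=True)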
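If you'd rather not keep a separate YAML file per location, something like the sketch below might work: it builds the same manifest as a Python dict and adds the nodeSelector in code. The with_node_selector helper is hypothetical, and I'm assuming (not verified) that the resulting dict can be passed as job= in place of KubernetesJob.job_from_file(...):

```python
import copy


def with_node_selector(manifest, labels):
    """Return a copy of a Job manifest with nodeSelector labels merged into the pod spec."""
    result = copy.deepcopy(manifest)
    pod_spec = result["spec"]["template"]["spec"]
    pod_spec.setdefault("nodeSelector", {}).update(labels)
    return result


# Same structure as the YAML template above, minus the nodeSelector
base = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"labels": {}},
    "spec": {
        "template": {
            "spec": {
                "completions": 1,
                "containers": [{"env": [], "name": "prefect-job"}],
                "parallelism": 1,
                "restartPolicy": "Never",
            }
        }
    },
}

on_prem = with_node_selector(base, {"location": "on-prem"})
print(on_prem["spec"]["template"]["spec"]["nodeSelector"])  # {'location': 'on-prem'}
```

The base dict is left untouched, so the same starting point can be reused for an off-prem variant with a different label.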