# ask-community
e
I am deploying a kube worker using the documentation found here: https://docs.prefect.io/2.11.5/guides/deployment/kubernetes/ How can I set the taint / nodeSelector values while running the helm install command? Can it be overridden in the values.yaml config file?
n
hi @Eric - everything you see in here can be overridden in the values.yaml
e
Thanks! I also realized I can just extract the values.yaml file so I checked against that
@Nate what is the best way to set this for the flow run pods? My worker pod was deployed fine, but I'm not sure where to set it for the flow runs
e
Hm the solutions posted on that thread don't show where to set toleration / nodeSelector values. Are there any examples?
n
the point is that you can alter the spec however you want - there probably aren't many prefect.yaml examples of people altering the spec in the exact way you're planning on, but it shouldn't be meaningfully different from the examples in that other thread in terms of how to accomplish a spec override
relevant k8s docs
e
In my prefect.yaml deployment file, i have this
deployments:
- name: <>
  tags: *common_tags
  schedule: null
  entrypoint: <>
  work_pool: *common_work_pool
  run_config:
    type: "kubernetes"
    job_template_path: "./kube_prefect_job_template.yaml"
And my kube_prefect_job_template.yaml looks like
apiVersion: batch/v1
kind: Job
metadata:
  generateName: flow-run-
spec:
  template:
    spec:
      nodeSelector:
        "kube/nodetype": "asyncjobs"
      tolerations:
        - key: "dedicated"
          operator: "Equal"
          value: "asyncjobs"
          effect: "NoSchedule"
But these values are not getting picked up by the new pods; I'm not sure where Prefect holds these values
n
when you create your deployment and go to the configuration tab on the deployment of interest, do you see any of your spec changes? i am not familiar with the run_config / job_template path syntax you're using
e
Nope; I'm probably not setting the values using the right fields. Wondering if there are examples for this
n
i would think you just want
deployments:
- name: <>
  tags: *common_tags
  schedule: null
  entrypoint: <>
  work_pool: *common_work_pool
  job_variables:
     job_manifest:
        spec:
           containers:
           # whatever you want to do to the spec
e
I copied the spec example from the thread and I see those values in the deployment configuration tab now, but the pod is still not picking those configs up
Yeap, so I did that and I'm seeing it reflected in the Prefect UI
{
  "env": {
<>
  },
  "image": <>,
  "job_manifest": {
    "spec": {
      "tolerations": [
        {
          "key": "dedicated",
          "value": "asyncjobs",
          "effect": "NoSchedule",
          "operator": "Equal"
        }
      ],
      "nodeSelector": {
        "kube/nodetype": "asyncjobs"
      }
    }
  }
}
But the jobs created still don't have those values propagated
n
i would want to k describe your-pod and see what's going on there
e
Seeing
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  21s   default-scheduler  0/4 nodes are available: 2 node(s) had untolerated taint {dedicated: asyncjobs}, 2 node(s) had untolerated taint {dedicated: webapp}. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling..
You can see node selectors and tolerations are not being propagated from the deployment job arguments
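For what it's worth, that FailedScheduling event is just the standard taint/toleration match failing: the pod spec carries no tolerations, so every tainted node rejects it. A minimal Python sketch of that matching rule (simplified, not the real scheduler code, and the helper names are mine):

```python
# Simplified sketch of how the scheduler decides whether a pod tolerates a
# node's taints. Handles NoSchedule/NoExecute only; ignores tolerationSeconds,
# PreferNoSchedule, and empty-key wildcard tolerations.

def tolerates(toleration: dict, taint: dict) -> bool:
    """A toleration matches a taint when key, effect, and value line up."""
    if toleration.get("key") != taint["key"]:
        return False
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        return True
    return toleration.get("value") == taint.get("value")

def schedulable(pod_tolerations: list, node_taints: list) -> bool:
    """A pod fits a node only if every one of the node's taints is tolerated."""
    return all(
        any(tolerates(tol, taint) for tol in pod_tolerations)
        for taint in node_taints
    )

node_taints = [{"key": "dedicated", "value": "asyncjobs", "effect": "NoSchedule"}]

# Pod with no tolerations (what the describe output above shows): rejected.
print(schedulable([], node_taints))  # False

# Pod carrying the intended toleration: accepted.
tols = [{"key": "dedicated", "operator": "Equal",
         "value": "asyncjobs", "effect": "NoSchedule"}]
print(schedulable(tols, node_taints))  # True
```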
In the deployment configuration, should it look like this
"job_manifest": {
    "spec": {
      "template": {
        "spec": {
          "tolerations": [
            {
              "key": "dedicated",
              "value": "asyncjobs",
              "effect": "NoSchedule",
              "operator": "Equal"
            }
          ],
          "nodeSelector": {
            "kube/nodetype": "asyncjobs"
          }
        }
      }
    }
  }
Kind of hitting a wall here
j
Hi @Eric what do you want to be affected by the taint/nodeSelectors? the prefect worker deployment or the flow pods?
e
The flow pods
I was able to deploy the prefect worker fine
n
ICYMI in that thread i linked earlier i suggest setting the spec on the advanced tab of the work pool itself, so that each deployment flow run's pod inherits it - instead of putting it in the prefect.yaml (which overrides it for each deployment)
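Concretely, that means editing the work pool's base job template so the manifest itself carries the scheduling constraints. A sketch of the relevant fragment (the label/taint values are this thread's; the surrounding template, its metadata, and the {{ variable }} placeholders are abbreviated away, so merge this into the existing template rather than replacing it):

```json
{
  "job_configuration": {
    "job_manifest": {
      "apiVersion": "batch/v1",
      "kind": "Job",
      "spec": {
        "template": {
          "spec": {
            "nodeSelector": {"kube/nodetype": "asyncjobs"},
            "tolerations": [
              {
                "key": "dedicated",
                "operator": "Equal",
                "value": "asyncjobs",
                "effect": "NoSchedule"
              }
            ]
          }
        }
      }
    }
  }
}
```

This can be pasted into the advanced tab in the UI; depending on your Prefect version, a full template file can also be supplied on the CLI (e.g. via the --base-job-template option when creating the work pool — check your version's CLI help).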
j
can you describe the job created in the cluster and share the output here?
e
@Nate just making sure I'm not misreading this:
you can set resource requests on your k8s work pool as a default in the advanced tab in the UI (picture attached) or you could override those defaults for a given deployment in your prefect.yaml like
I read this, and added it to my deployment prefect.yaml file
j
for testing - can you provide the configuration as part of the advanced tab under the work pool? like you shared above ^
"job_manifest": {
    "spec": {
      "template": {
        "spec": {
          "tolerations": [
            {
              "key": "dedicated",
              "value": "asyncjobs",
              "effect": "NoSchedule",
              "operator": "Equal"
            }
          ],
          "nodeSelector": {
            "kube/nodetype": "asyncjobs"
          }
        }
      }
    }
  }
n
I read this, and added it to my deployment prefect.yaml file
that's one option, but by putting that config in your prefect.yaml you're saying "i want this job spec for this specific deployment", not "i want it for all flow runs from the work pool". if you want this spec for all flow runs from the work pool, you should put it directly on the work pool
e
But its defined in my work pool definition:
Copy code
definitions:
  tags: &common_tags
    - "feedback-insight"
    - "dev"
  work_pool: &common_work_pool
    name: $PREFECT_WORK_POOL_NAME
    job_variables:
      image: $BACKEND_IMAGE_URI
      env:
        <>
      job_manifest:
        spec:
          template:
            spec:
              tolerations:
              - key: "dedicated"
                operator: "Equal"
                value: "asyncjobs"
                effect: "NoSchedule"
              nodeSelector:
                "kube/nodetype": "asyncjobs"

# the deployments section allows you to provide configuration for deploying flows
deployments:
- name: "parse-feedback-dev"
  tags: *common_tags
  schedule: null
  entrypoint: "inari_app/processing/feedback/feedback_parse_prefect.py:parse_feedback_and_highlights"
  work_pool: *common_work_pool
n
that is not your work pool definition, that is an override of the default values that already exist on your work pool
e
Ok. But as long as I am creating flow runs using this deployment, the job_manifest should be overridden right?
j
this conversation is happening across 2 different threads. may i suggest we consolidate on this one