Kiley Roberson
08/15/2023, 4:41 PM
Marvin
08/15/2023, 4:42 PM
nodeSelector and tolerations on a work pool so that all flows associated with that work pool will be scheduled on a specific node pool. Well, isn't that a delightful little puzzle?
From the information I've found, a work pool has no dedicated nodeSelector or tolerations setting. What a Kubernetes-type work pool does have is a base job template, and the Kubernetes Job manifest inside that template is where scheduling details like these would have to go.
You can create a new work pool via the Prefect UI or CLI with the command prefect work-pool create "my-pool" (adding --type kubernetes for a Kubernetes pool), but a pool created this way gets the default template, which sets no nodeSelector or tolerations. You would need to edit the template yourself.
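For what it's worth, here is a minimal sketch of patching a Kubernetes work pool's base job template so that every job it produces carries a nodeSelector and tolerations. It assumes the template nests the pod spec under job_configuration, job_manifest, spec, template, spec. The add_scheduling helper and the trimmed template dict are hypothetical stand-ins for illustration, not part of Prefect.

```python
import copy
import json


def add_scheduling(template, node_selector, tolerations):
    """Return a copy of a base job template with scheduling fields set on the pod spec."""
    patched = copy.deepcopy(template)  # leave the caller's template untouched
    # Assumed layout: job_configuration -> job_manifest -> spec -> template -> spec
    pod_spec = patched["job_configuration"]["job_manifest"]["spec"]["template"]["spec"]
    pod_spec["nodeSelector"] = dict(node_selector)
    pod_spec["tolerations"] = list(tolerations)
    return patched


# Trimmed, hypothetical stand-in for a real base job template.
template = {
    "job_configuration": {
        "job_manifest": {
            "apiVersion": "batch/v1",
            "kind": "Job",
            "spec": {"template": {"spec": {"containers": [{"name": "prefect-job"}]}}},
        }
    },
    "variables": {},
}

patched = add_scheduling(
    template,
    node_selector={"disktype": "ssd"},
    tolerations=[{"key": "key", "operator": "Equal", "value": "value", "effect": "NoSchedule"}],
)
print(json.dumps(patched, indent=2))
```

The patched JSON could then be pasted into the work pool's advanced (base job template) settings in the UI. Whether your Prefect version also accepts a template file on the CLI is worth checking with prefect work-pool create --help.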
Alternatively, you might be able to achieve what you want at the agent level. Note that the Kubernetes agent is a Prefect 1.x concept that predates work pools. It has no dedicated flags for nodeSelector, tolerations, or affinity; those go in a custom job template passed with --job-template, and they then apply to all flow runs that agent launches.
Here's an example of how you might set this up when starting the agent:
```bash
# Prefect 1.x CLI. job_template.yaml is a placeholder path to a Kubernetes Job
# manifest whose pod spec sets nodeSelector, tolerations, and (optionally) affinity.
prefect agent kubernetes start \
  --env MY_ENV_VAR=my-value \
  --label my-label \
  --job-template job_template.yaml
```
Remember, life is full of disappointments and this might not work exactly as you want it to. You might need to do some additional tinkering or find a different approach entirely. Such is the life of a programmer.
Kiley Roberson
08/15/2023, 4:44 PM
Marvin
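08/15/2023, 4:44 PM
prefect.yaml, I'm afraid that's not directly possible. The prefect.yaml file describes your project's deployments, and its work_pool.job_variables section can only override variables that the work pool's base job template exposes. The default Kubernetes template exposes none for nodeSelector or tolerations.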
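However, if you are on Prefect 1.x, you can set Kubernetes details on the flow run level by using the KubernetesRun run config. It has no node_selector or tolerations arguments of its own, so both go into a custom job template. Here's an example:
```python
# Prefect 1.x API: scheduling details are passed through a custom job template.
from prefect.run_configs import KubernetesRun

job_template = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "spec": {"template": {"spec": {
        "containers": [{"name": "flow"}],
        "nodeSelector": {"disktype": "ssd"},
        "tolerations": [{"key": "key", "operator": "Equal", "value": "value", "effect": "NoSchedule"}],
    }}},
}

flow.run_config = KubernetesRun(  # `flow` is your existing Flow object
    labels=["my_label"],
    image="my_image",
    env={"MY_ENV_VAR": "my_value"},
    cpu_request="500m",
    memory_request="500Mi",
    cpu_limit="1",
    memory_limit="1Gi",
    job_template=job_template,
)
```
This will set the nodeSelector and tolerations for the flow run's job, which should cause it to be scheduled on the specified node pool.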
Remember, life is full of disappointments and this might not work exactly as you want it to. You might need to do some additional tinkering or find a different approach entirely. Such is the life of a programmer.