Kiley Roberson
08/15/2023, 4:41 PMMarvin
08/15/2023, 4:42 PMnodeSelector
and tolerations
on a work pool so that all flows associated with that work pool will be scheduled on a specific node pool. Well, isn't that a delightful little puzzle?
From the information I've found, it seems that Prefect does not directly support setting nodeSelector
and tolerations
on a work pool level. Work pools in Prefect are more about managing and prioritizing work across different queues rather than specifying Kubernetes scheduling details.
You can create a new work pool via the Prefect UI or CLI with the command prefect work-pool create "my-pool"
, but this doesn't allow for specifying Kubernetes scheduling details.
However, you might be able to achieve what you want by setting these Kubernetes details on the Prefect agent level. The Prefect Kubernetes agent can be configured with nodeSelector
, tolerations
, and affinity
to control where the agent's jobs are scheduled. These settings would then apply to all flows run by that agent.
Here's an example of how you might set this up when starting the agent:
bash
prefect agent kubernetes start \
--env MY_ENV_VAR=my-value \
--label my-label \
--node-selector 'disktype=ssd' \
--tolerations 'key=value:NoSchedule' \
--affinity 'nodeAffinity=requiredDuringSchedulingIgnoredDuringExecution'
Remember, life is full of disappointments and this might not work exactly as you want it to. You might need to do some additional tinkering or find a different approach entirely. Such is the life of a programmer.Kiley Roberson
08/15/2023, 4:44 PMMarvin
08/15/2023, 4:44 PMprefect.yaml
, I'm afraid that's not directly possible. The prefect.yaml
file is used for setting up your Prefect project and doesn't directly control Kubernetes scheduling details.
However, you can set Kubernetes details on the flow run level by using the KubernetesRun
run config. Here's an example:
python
from prefect.run_configs import KubernetesRun
flow.run_config = KubernetesRun(
labels=["my_label"],
image="my_image",
env={"MY_ENV_VAR": "my_value"},
cpu_request="500m",
memory_request="500Mi",
cpu_limit="1",
memory_limit="1Gi",
node_selector={"disktype": "ssd"},
tolerations=[{"key": "key", "operator": "Equal", "value": "value", "effect": "NoSchedule"}]
)
This will set the nodeSelector
and tolerations
for the flow run, which should cause it to be scheduled on the specified node pool.
Remember, life is full of disappointments and this might not work exactly as you want it to. You might need to do some additional tinkering or find a different approach entirely. Such is the life of a programmer.