Ying Ting Loo
10/24/2023, 10:04 AM

```yaml
spec:
  containers:
    - name: nvidia-smi
      image: "nvidia/cuda:11.8.0-runtime-centos7"
      args:
        - "nvidia-smi"
      resources:
        limits:
          nvidia.com/gpu: "1"
  tolerations:
    - key: "nvidia.com/gpu"
      operator: "Exists"
      effect: "NoSchedule"
```
but if I do this in the prefect-worker deployment.yaml file, would it actually do anything to the flow run pods when they are launched?

Nate
10/24/2023, 7:34 PM

```yaml
deployments:
  - name: healthcheck-storage-test
    entrypoint: src/demo_project/healthcheck.py:healthcheck
    work_pool:
      name: k8s
      work_queue_name:
      job_variables:
        env:
          PREFECT_DEFAULT_RESULT_STORAGE_BLOCK: s3/flow-script-storage-main
        job_manifest:
          spec:
            containers:
              resources:
                limits:
                  nvidia.com/gpu: "1"
```

I'll highlight the main difference here: your example changes the resource request for the pod that runs the actual worker (not where the flow run executes), whereas my example changes the spec of the job that this worker creates for the flow run it submits.

Nate
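Following that distinction, here is a sketch of how the toleration from the first example could also be moved into the flow run's job manifest, assuming the GPU nodes carry the usual `nvidia.com/gpu` taint (the deployment and work pool names are carried over from the example above):

```yaml
# Hypothetical prefect.yaml fragment -- names reused from the thread's example.
deployments:
  - name: healthcheck-storage-test
    work_pool:
      name: k8s
      job_variables:
        job_manifest:
          spec:
            # tolerations sit at the pod-spec level, alongside containers,
            # so the flow run pod can be scheduled onto a tainted GPU node
            tolerations:
              - key: "nvidia.com/gpu"
                operator: "Exists"
                effect: "NoSchedule"
```

Without a matching toleration on the flow run pod itself, the GPU limit alone would not let the pod schedule onto a node tainted with `NoSchedule`.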
10/24/2023, 7:34 PM

```shell
prefect deploy -n healthcheck-storage-test
```

then in the UI I see this on that deployment's configuration tab

Ying Ting Loo
10/25/2023, 8:58 AM

```
The node was low on resource: ephemeral-storage. Threshold quantity: 3210844697, available: 2000796Ki. Container linkerd-proxy was using 4Ki, request is 0, has larger consumption of ephemeral-storage. Container prefect-job was using 108Ki, request is 0, has larger consumption of ephemeral-storage.
```
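That eviction message says both containers have an ephemeral-storage request of 0, so the scheduler places the pod without accounting for disk headroom. One way to address it, using the same `job_variables` mechanism discussed above, is to set explicit ephemeral-storage requests on the flow run container; the quantities below are illustrative assumptions, not recommendations:

```yaml
# Hypothetical job_variables fragment; the container name prefect-job is
# taken from the eviction message above, and the sizes are guesses.
job_variables:
  job_manifest:
    spec:
      containers:
        - name: prefect-job
          resources:
            requests:
              # an explicit request lets the scheduler pick a node with room
              ephemeral-storage: "1Gi"
            limits:
              ephemeral-storage: "2Gi"
```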
Nate
10/27/2023, 6:00 PM