Eric
10/27/2023, 7:01 PM
prefect.yaml file to create a deployment, in my Deployment's configuration tab I see:
{
  ...
  "job_manifest": {
    "spec": {
      "template": {
        "spec": {
          "tolerations": [
            {
              "key": "dedicated",
              "value": "asyncjobs",
              "effect": "NoSchedule",
              "operator": "Equal"
            }
          ],
          "nodeSelector": {
            "kube/nodetype": "asyncjobs"
          }
        }
      }
    }
  }
}
However, the flow run pods created with this deployment don't have any of these values propagated. Not sure if I'm just setting this incorrectly.
Kevin Grismore
10/27/2023, 7:04 PM
Eric
10/27/2023, 7:05 PM
Kevin Grismore
10/27/2023, 7:06 PM
Eric
10/27/2023, 7:06 PM
Kevin Grismore
10/27/2023, 7:11 PM
{
  "variables": {
    "type": "object",
    "properties": {
      "tolerations": {
        "type": "array",
        "title": "Tolerations"
      },
      "env": {
        "type": "object",
        "title": "Environment Variables",
        "description": "Environment variables to set when starting a flow run.",
        "additionalProperties": {
          "type": "string"
        }
      },
      "name": {
        "type": "string",
        "title": "Name",
        "description": "Name given to infrastructure created by a worker."
      },
      ...
  "job_configuration": {
    "env": "{{ env }}",
    "name": "{{ name }}",
    "labels": "{{ labels }}",
    "command": "{{ command }}",
    "namespace": "{{ namespace }}",
    "job_manifest": {
      "kind": "Job",
      "spec": {
        "template": {
          "spec": {
            "tolerations": "{{ tolerations }}",
            "containers": [
              {
                "env": "{{ env }}",
                "args": "{{ command }}",
                "name": "prefect-job",
                "image": "{{ image }}",
                "imagePullPolicy": "{{ image_pull_policy }}"
              }
            ],
            "completions": 1,
            "parallelism": 1,
            "restartPolicy": "Never",
            "serviceAccountName": "{{ service_account_name }}"
          }
        },
        "backoffLimit": 0,
        "ttlSecondsAfterFinished": "{{ finished_job_ttl }}"
      },
      "metadata": {
        "labels": "{{ labels }}",
        "namespace": "{{ namespace }}",
        "generateName": "{{ name }}-"
      },
      "apiVersion": "batch/v1"
    },
    "stream_output": "{{ stream_output }}",
    "cluster_config": "{{ cluster_config }}",
    "job_watch_timeout_seconds": "{{ job_watch_timeout_seconds }}",
    "pod_watch_timeout_seconds": "{{ pod_watch_timeout_seconds }}"
  }
}
Kevin Grismore
10/27/2023, 7:12 PM
tolerations variable that will appear in the UI as Tolerations, and a place in the template to override from my deployment in "{{ tolerations }}"
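To make the idea concrete, here is a minimal sketch of how a "{{ tolerations }}" placeholder in a job configuration could be filled from a deployment's job variables. This is an illustration only, not Prefect's actual templating code; the `resolve` helper and the `PLACEHOLDER` pattern are hypothetical names invented for this example.

```python
import re

# Hypothetical pattern: matches a string that is exactly one "{{ name }}" placeholder.
PLACEHOLDER = re.compile(r"^\{\{\s*(\w+)\s*\}\}$")


def resolve(template, values):
    """Recursively replace whole-string placeholders with values[name].

    Placeholders with no matching value resolve to None in this sketch.
    """
    if isinstance(template, dict):
        return {key: resolve(item, values) for key, item in template.items()}
    if isinstance(template, list):
        return [resolve(item, values) for item in template]
    if isinstance(template, str):
        match = PLACEHOLDER.match(template)
        if match:
            return values.get(match.group(1))
    return template


# Trimmed-down job configuration with a single placeholder.
job_configuration = {
    "job_manifest": {
        "spec": {"template": {"spec": {"tolerations": "{{ tolerations }}"}}}
    }
}

# Values a deployment might supply under job_variables.
job_variables = {
    "tolerations": [
        {
            "key": "dedicated",
            "value": "asyncjobs",
            "effect": "NoSchedule",
            "operator": "Equal",
        }
    ]
}

resolved = resolve(job_configuration, job_variables)
pod_spec = resolved["job_manifest"]["spec"]["template"]["spec"]
```

With these inputs, `pod_spec["tolerations"]` holds the list supplied by the deployment, which is the behaviour the template placeholder is meant to provide.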
Eric
10/27/2023, 7:12 PM
prefect file to propagate these configurations?
Jamie Zieziula
10/27/2023, 7:13 PM
Jamie Zieziula
10/27/2023, 7:14 PM
Kevin Grismore
10/27/2023, 7:14 PM
Eric
10/27/2023, 7:15 PM
Eric
10/27/2023, 7:15 PM
Kevin Grismore
10/27/2023, 7:18 PM
deployments:
- name: demo
  version: null
  tags: []
  description: null
  schedule: {}
  flow_name: null
  entrypoint: flow.py:hello
  parameters: {}
  work_pool:
    name: k8s-demo
    work_queue_name: null
    job_variables:
      tolerations:
      - key: dedicated
        value: asyncjobs
        effect: NoSchedule
        operator: Equal
Jamie Zieziula
10/27/2023, 7:19 PM
Eric
10/27/2023, 7:20 PM
Eric
10/27/2023, 7:21 PM
Kevin Grismore
10/27/2023, 7:22 PM
Eric
10/27/2023, 7:23 PM
Kevin Grismore
10/27/2023, 7:24 PM
Eric
10/27/2023, 7:25 PM
Eric
10/27/2023, 10:33 PM
"tolerations": "{{ tolerations }}",
"nodeSelector": "{{ node_selector }}",
to my work-pool base json, but I keep getting an error from the UI saying it failed to update the work-pool, but it doesn't say why.
Eric
10/27/2023, 10:33 PM
The variables specified in the job configuration template must be present as properties in the variables schema. Your job configuration uses the following undeclared variable(s): node_selector ,tolerations.
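The error above says that every placeholder used under job_configuration must also be declared under variables.properties. A rough sketch of that kind of check follows; it is not Prefect's validation code, and the helper names (`find_placeholders`, `undeclared_variables`) as well as the "object" type chosen for `node_selector` are assumptions made for illustration.

```python
import re

# Hypothetical pattern: finds every "{{ name }}" occurrence inside a string.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def find_placeholders(node):
    """Collect all placeholder names used anywhere in a nested structure."""
    names = set()
    if isinstance(node, dict):
        for item in node.values():
            names |= find_placeholders(item)
    elif isinstance(node, list):
        for item in node:
            names |= find_placeholders(item)
    elif isinstance(node, str):
        names.update(PLACEHOLDER.findall(node))
    return names


def undeclared_variables(base_job_template):
    """Return placeholders used in job_configuration but missing from the schema."""
    declared = set(base_job_template.get("variables", {}).get("properties", {}))
    used = find_placeholders(base_job_template.get("job_configuration", {}))
    return sorted(used - declared)


# Template that uses two placeholders without declaring them.
template = {
    "variables": {"type": "object", "properties": {"name": {"type": "string"}}},
    "job_configuration": {
        "job_manifest": {
            "metadata": {"generateName": "{{ name }}-"},
            "spec": {
                "template": {
                    "spec": {
                        "tolerations": "{{ tolerations }}",
                        "nodeSelector": "{{ node_selector }}",
                    }
                }
            },
        }
    },
}
missing_before = undeclared_variables(template)

# Declaring both variables in the schema clears the check.
# The "object" type for node_selector is an assumption, not taken from the thread.
template["variables"]["properties"]["tolerations"] = {"type": "array", "title": "Tolerations"}
template["variables"]["properties"]["node_selector"] = {"type": "object", "title": "Node Selector"}
missing_after = undeclared_variables(template)
```

Here `missing_before` lists the same two names as the UI error, and `missing_after` is empty once both are added to the schema.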
Eric
10/27/2023, 10:35 PM
Kevin Grismore
10/27/2023, 10:35 PM
Eric
10/27/2023, 10:40 PM
Eric
10/27/2023, 10:51 PM
? Would you like to build a custom Docker image for this deployment? [y/n] (n):
Traceback (most recent call last):
  File "/Users/erickim/venv/inari_py/lib/python3.9/site-packages/prefect/cli/_utilities.py", line 41, in wrapper
    return fn(*args, **kwargs)
  File "/Users/erickim/venv/inari_py/lib/python3.9/site-packages/prefect/utilities/asyncutils.py", line 255, in coroutine_wrapper
    return call()
  File "/Users/erickim/venv/inari_py/lib/python3.9/site-packages/prefect/_internal/concurrency/calls.py", line 382, in __call__
    return self.result()
  File "/Users/erickim/venv/inari_py/lib/python3.9/site-packages/prefect/_internal/concurrency/calls.py", line 282, in result
    return self.future.result(timeout=timeout)
  File "/Users/erickim/venv/inari_py/lib/python3.9/site-packages/prefect/_internal/concurrency/calls.py", line 168, in result
    return self.__get_result()
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 390, in __get_result
    raise self._exception
  File "/Users/erickim/venv/inari_py/lib/python3.9/site-packages/prefect/_internal/concurrency/calls.py", line 345, in _run_async
    result = await coro
  File "/Users/erickim/venv/inari_py/lib/python3.9/site-packages/prefect/cli/deploy.py", line 249, in deploy
    await _run_single_deploy(
  File "/Users/erickim/venv/inari_py/lib/python3.9/site-packages/prefect/client/utilities.py", line 51, in with_injected_client
    return await fn(*args, **kwargs)
  File "/Users/erickim/venv/inari_py/lib/python3.9/site-packages/prefect/cli/deploy.py", line 550, in _run_single_deploy
    deployment_id = await client.create_deployment(
  File "/Users/erickim/venv/inari_py/lib/python3.9/site-packages/prefect/client/orchestration.py", line 1479, in create_deployment
    response = await self._client.post(
  File "/Users/erickim/venv/inari_py/lib/python3.9/site-packages/httpx/_client.py", line 1848, in post
    return await self.request(
  File "/Users/erickim/venv/inari_py/lib/python3.9/site-packages/httpx/_client.py", line 1530, in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "/Users/erickim/venv/inari_py/lib/python3.9/site-packages/prefect/client/base.py", line 285, in send
    response.raise_for_status()
  File "/Users/erickim/venv/inari_py/lib/python3.9/site-packages/prefect/client/base.py", line 138, in raise_for_status
    raise PrefectHTTPStatusError.from_httpx_error(exc) from exc.__cause__
prefect.exceptions.PrefectHTTPStatusError: Server error '500 Internal Server Error' for url
Eric
10/27/2023, 10:58 PM
Krystal
01/08/2024, 12:26 PM
"tolerations": "{{ tolerations }}",
and my prefect.yaml file looks like this:
deployments:
- name: "testing"
  schedule: null
  entrypoint: "flows/test.py:test"
  work_pool:
    name: "gpu-work-pool"
    job_variables:
      image: xxx
      tolerations:
      - effect: NoSchedule
        key: gpu
        operator: Exists
      nodeSelector:
        karpenter.sh/provisioner-name: gpu
However, when I do a deploy, I am constantly getting an error of
Response: {'detail': 'Error creating deployment: <ValidationError: "[{\'effect\': \'NoSchedule\', \'key\': \'gpu\', \'operator\': \'Exists\'}] is not of type \'object\'">'}
How can I resolve this please? @Kevin Grismore
Kevin Grismore
01/08/2024, 12:56 PM
array
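The validation error reports that a list "is not of type 'object'", which would be consistent with the work pool's variables schema declaring tolerations as an object rather than an array. Krystal's schema is not shown in the thread, so that is an assumption. The sketch below imitates only the JSON Schema "type" keyword with a hand-written check; `JSON_TYPES` and `type_errors` are hypothetical names, and real validation covers far more than this.

```python
# Minimal mapping from JSON Schema type names to Python types (illustrative subset).
JSON_TYPES = {"array": list, "object": dict, "string": str}


def type_errors(properties, job_variables):
    """Report job variables whose Python type does not match the declared schema type."""
    errors = []
    for name, value in job_variables.items():
        expected = properties.get(name, {}).get("type")
        if expected in JSON_TYPES and not isinstance(value, JSON_TYPES[expected]):
            errors.append(f"{name}: {value!r} is not of type {expected!r}")
    return errors


# Job variables as written in the prefect.yaml above.
job_variables = {
    "tolerations": [{"effect": "NoSchedule", "key": "gpu", "operator": "Exists"}]
}

# Assumed misdeclaration: tolerations typed as an object.
wrong_schema = {"tolerations": {"type": "object", "title": "Tolerations"}}
# Declaration that matches a list of tolerations.
fixed_schema = {"tolerations": {"type": "array", "title": "Tolerations"}}

errors_before = type_errors(wrong_schema, job_variables)
errors_after = type_errors(fixed_schema, job_variables)
```

Under that assumption, `errors_before` contains one message of the same shape as the server's error, and `errors_after` is empty once the variable is declared as an array.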
Krystal
01/08/2024, 1:32 PM
Kevin Grismore
01/08/2024, 1:41 PM
Krystal
01/08/2024, 1:42 PM