scott kim
05/02/2024, 8:24 AM
I want to set nvidia.com/gpu in work_pool.resources based on the value of parameters.args.gpu_count in deployments. How can I solve this?
```yaml
deployments:
- name: scott-test
  version:
  tags: []
  description:
  parameters:
    args:
      gpu_count: 1
      epochs: 36
      batch_size: 3
  work_pool:
    name: test-work-pool
    work_queue_name:
    job_variables:
      node_selector:
        nvidia.com/gpu.product: NVIDIA-GeForce-RTX-3090
      resources:
        requests:
          cpu: "2"
          memory: "8Gi"
          nvidia.com/gpu:
        limits:
          cpu: "4"
          memory: "16Gi"
          nvidia.com/gpu:
```
Marvin
05/02/2024, 8:24 AM
To set values in work_pool based on the value of parameters.args.gpu_count in deployments, you can use placeholders that will be replaced with values at runtime on a per-deployment basis. The Prefect documentation outlines that variable placeholders can be defined in the base job template and can be customized for each work pool.
According to the Prefect GCP reference found here, placeholders in the form of {{ }}
are replaced with values defined by the variables schema in the job template.
For example, if you want to use the gpu_count
to set the nvidia.com/gpu
resource request and limit, you would define a placeholder in the job template that corresponds to the gpu_count
parameter. When a job is executed, this placeholder would be replaced with the actual gpu_count
value provided in the deployment parameters.
Here's an illustrative extract from the documentation:
```json
{
  "apiVersion": "run.googleapis.com/v1",
  "kind": "Job",
  "metadata": {
    "name": "{{ name }}",
    ...
  },
  ...
  "annotations": {
    ...
  }
}
```
{{ }} is a placeholder that will be replaced with a value at runtime on a per-deployment basis. The values that can be used as placeholders are defined by the variables schema defined in the base job template.
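To make this concrete, here is a sketch, not taken from the excerpts above, of how a Kubernetes work pool's base job template might declare a custom gpu_count variable and reference it as a placeholder. The template is usually stored as JSON; it is shown here as YAML for readability, and the exact field names and nesting are assumptions:
```yaml
# Hypothetical base job template excerpt for a Kubernetes work pool.
# Field names and nesting are illustrative assumptions, not verified docs.
variables:
  properties:
    gpu_count:
      type: integer
      default: 1
      description: Number of GPUs to request for the job
job_configuration:
  job_manifest:
    spec:
      template:
        spec:
          containers:
            - resources:
                requests:
                  nvidia.com/gpu: "{{ gpu_count }}"  # filled in per deployment
                limits:
                  nvidia.com/gpu: "{{ gpu_count }}"
```
A deployment could then set gpu_count under work_pool.job_variables, and the worker would substitute it into the manifest when the job runs.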
Based on this, you should define the appropriate placeholders within your deployment definition's resources section to link the gpu_count to the GPU resource requests and limits. An example of how to use these placeholders isn't provided in the excerpts, but the concept should guide you as you implement the solution within your job templates and work pool configuration. Remember to consult the current Prefect documentation or support forums specific to your Prefect version for more detailed instructions, as my knowledge might be outdated.
scott kim
05/02/2024, 8:27 AM
I want to reference the gpu_count parameter value in nvidia.com/gpu in the prefect.yaml I provided.
Marvin
05/02/2024, 8:29 AM
To reference the gpu_count parameter value dynamically in the prefect.yaml file for the nvidia.com/gpu resource, you would make use of Prefect's templating syntax within your YAML file to reference deployment parameters.
The relevant excerpt from the Prefect documentation on GUIDES - Prefect Deploy indicates how to achieve this but does not provide a direct example for the case of GPU resources. However, you can generally reference Prefect variables and use them in your deployments like so:
```yaml
variables:
  ...
  gpu_count: "{{ prefect.variables.gpu_count }}"
```
Applying a similar logic to your case might look like using deployment parameter substitution as follows:
```yaml
deployments:
- name: scott-test
  ...
  parameters:
    args:
      gpu_count: 1
  ...
  work_pool:
    ...
    job_variables:
      ...
      resources:
        requests:
          ...
          nvidia.com/gpu: "{{ prefect.variables.args.gpu_count }}"
        limits:
          ...
          nvidia.com/gpu: "{{ prefect.variables.args.gpu_count }}"
```
However, the exact mechanism to reference the gpu_count
coming from parameters.args.gpu_count
in the job's templating system within the prefect.yaml
file isn't clearly outlined in the provided documentation excerpts.
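If a workspace-level Prefect variable (rather than the deployment parameter itself) is acceptable, a sketch that stays within the documented {{ prefect.variables.* }} syntax might look like the following; it assumes a variable named gpu_count has already been created in the workspace:
```yaml
# Assumes a Prefect variable named `gpu_count` exists in the workspace
# (created via the UI or API). `prefect.variables.*` resolves workspace
# variables, not deployment parameters, so this sidesteps the original
# problem rather than solving it directly.
work_pool:
  name: test-work-pool
  job_variables:
    resources:
      requests:
        nvidia.com/gpu: "{{ prefect.variables.gpu_count }}"
      limits:
        nvidia.com/gpu: "{{ prefect.variables.gpu_count }}"
```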
For an exact implementation, you should check more specific documentation or resources that detail how to achieve parameter substitution in the prefect.yaml
file, particularly for deployment parameters. It could be necessary to reference the value directly or declare it as a variable elsewhere in your configuration. If the desired functionality does not exist or if the documentation is unclear, consider reaching out to Prefect support or their community forums for additional guidance. They could provide you with up-to-date information or an example specific to your Prefect version and requirements.
scott kim
05/02/2024, 8:38 AMvariables:
gpu_count: "{{ prefect.variables.gpu_count }}"
deployments:
- name: scott-test
version:
tags: []
description:
parameters:
args:
gpu_count: 1
epochs: 36
batch_size: 3
work_pool:
name: test-work-pool
work_queue_name:
job_variables:
node_selector:
<http://nvidia.com/gpu.product|nvidia.com/gpu.product>: NVIDIA-GeForce-RTX-3090
resources:
requests:
cpu: "2"
memory: "8Gi"
<http://nvidia.com/gpu|nvidia.com/gpu>: "{{ prefect.variables.args.gpu_count }}"
limits:
cpu: "4"
memory: "16Gi"
<http://nvidia.com/gpu|nvidia.com/gpu>: "{{ prefect.variables.args.gpu_count }}"
Marvin
05/02/2024, 8:38 AM
It appears that templating the nvidia.com/gpu value based on parameters.args.gpu_count in prefect.yaml may not follow the syntax I suggested earlier.
The correct way to approach this would use a templating mechanism provided by Prefect for deployments, but the excerpts from the documentation do not offer a direct example specific to templating within the work_pool.resources
section of the prefect.yaml
file.
Here's what your attempt might conceptually look like, assuming Prefect's templating supported this directly:
```yaml
deployments:
- name: scott-test
  ...
  parameters:
    args:
      gpu_count: 1
  ...
  work_pool:
    name: test-work-pool
    ...
    job_variables:
      ...
      resources:
        requests:
          cpu: "2"
          memory: "8Gi"
          nvidia.com/gpu: "{{ args.gpu_count }}"
        limits:
          cpu: "4"
          memory: "16Gi"
          nvidia.com/gpu: "{{ args.gpu_count }}"
```
However, this assumes the templating syntax {{ args.gpu_count }}
would properly reference the parameters.args.gpu_count
value, which may not be accurate.
The Prefect documentation regarding the use of variables and placeholders does not explicitly outline this capability for the work_pool.resources
section in the prefect.yaml
, and my understanding is limited to information that is, unfortunately, likely to be outdated.
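One pattern that should work without any new templating features, assuming the work pool's base job template exposes a gpu_count variable as in the earlier hypothetical excerpt, is to set the value under work_pool.job_variables in each deployment and keep it in sync with the flow parameter by hand:
```yaml
# Sketch only: assumes the base job template declares a `gpu_count`
# variable as in the earlier hypothetical excerpt.
deployments:
- name: scott-test
  parameters:
    args:
      gpu_count: 1        # consumed by the flow code
  work_pool:
    name: test-work-pool
    job_variables:
      gpu_count: 1        # consumed by the worker; duplicated by hand
```
The duplication is inelegant, but it keeps the GPU request and the flow parameter colocated in one file.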
Please refer to the latest Prefect documentation specific to dynamic resource allocation and variable substitution within the prefect.yaml
configuration, or engage with the Prefect community for more timely assistance. The Prefect Community Slack or Prefect Discourse forums are excellent places to ask for help on these kinds of topics. They are far more up-to-date and may have the enthusiasm that I famously lack.