
Geese Howard

09/14/2023, 8:54 AM
@Marvin While deploying Prefect flows via a prefect.yaml file, I was trying to assign different resources to each work pool. To this end, I designed the following prefect.yaml, which failed to allocate the specified resources (only the default resources were assigned):
```yaml
# Generic metadata about this project
name: flows
prefect-version: 2.12.1

# build section allows you to manage and build docker images
build: null

# push section allows you to manage if and how this project is uploaded to remote locations
push:
- prefect_gcp.deployments.steps.push_to_gcs:
    id: push_code
    requires: prefect-gcp>=0.4.3
    bucket: prefect-flows
    folder: prefect_practice
    credentials: "{{ prefect.blocks.gcp-credentials.credential }}"

# prefect.yaml
pull:
- prefect_gcp.deployments.steps.pull_from_gcs:
    id: pull_code
    requires: prefect-gcp>=0.4.3
    bucket: '{{ push_code.bucket }}'
    folder: prefect_practice
    credentials: "{{ prefect.blocks.gcp-credentials.credential }}"

# the definitions section allows you to define reusable components for your deployments
definitions:
  tags: &common_tags
    - "dev"
  work_pool: &gke_workpool
    name: "gke"
    job_variables:
    work_queue_name: "dev"
  work_pool: &gke-cpu-5000m-4Gi_workpool
    name: "gke-cpu-5000m-4Gi"
    work_queue_name: "dev"
    requests:
      cpu: "3000m"
      memory: "4Gi"
      ephemeral-storage: "5Gi"

# the deployments section allows you to provide configuration for deploying flows
deployments:
- name: "dev: hello sleep flow"
  tags: *common_tags
  schedule: null
  entrypoint: "hello.py:print_test"
  work_pool: *gke_workpool
- name: "dev: resource allocation test"
  tags: *common_tags
  schedule: null
  entrypoint: "test.py:get_data"
  work_pool: &gke-cpu-5000m-4Gi_workpool
```
Is there any way to correctly assign CPU and memory? Also, I would like to assign GPU resources to the "dev: resource allocation test" flow. How do I do this from prefect.yaml?

Marvin

09/14/2023, 8:55 AM
Ah, the joys of resource allocation. It's like trying to fit a square peg into a round hole, isn't it? But don't worry, I'm here to help, as much as my programming allows me to. From your `prefect.yaml`, two things stand out. First, the `definitions` section declares the `work_pool` key twice; YAML lets the second duplicate key silently overwrite the first, so only one of your anchors survives. Give each definition a unique key. Second, your last deployment reads `work_pool: &gke-cpu-5000m-4Gi_workpool`, which declares a new, empty anchor rather than referencing the existing one; an alias starts with `*`, not `&`. As for the resources themselves, you're on the right track with the `requests` field, though note the mismatch between the pool's name (`5000m`) and the request itself (`3000m`), and the values need to reach the worker through `job_variables` rather than sitting directly under `work_pool`. A corrected sketch follows the job template below.

As for assigning GPU resources, Prefect's Kubernetes worker runs each flow as a Kubernetes Job, so GPUs are scheduled the Kubernetes-native way: you specify them in the `resources` field of your Kubernetes job template. Here's an example:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ flow_run_name }}"
  labels:
    {{ labels | tojson | safe }}
spec:
  template:
    metadata:
      labels:
        {{ labels | tojson | safe }}
    spec:
      containers:
      - name: "flow"
        image: "{{ flow.storage.image }}"
        command: ["prefect", "execute", "flow-run"]
        env:
        - name: PREFECT__CLOUD__API_KEY
          valueFrom:
            secretKeyRef:
              name: prefect-api-keys
              key: PREFECT__CLOUD__API_KEY
        resources:
          limits:
            nvidia.com/gpu: 1 # this line assigns 1 GPU to the job
      restartPolicy: Never
```
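And here is a minimal sketch of corrected `definitions` and `deployments` sections for your `prefect.yaml`. It assumes the `gke-cpu-5000m-4Gi` work pool's base job template has been customized to expose CPU/memory variables; the variable names below (`cpu_request`, `memory_request`, `ephemeral_storage_request`) are illustrative, not Prefect defaults:
```yaml
definitions:
  tags: &common_tags
    - "dev"
  work_pools:
    # unique keys, so the second definition no longer overwrites the first
    gke: &gke_workpool
      name: "gke"
      work_queue_name: "dev"
    gke_large: &gke_large_workpool
      name: "gke-cpu-5000m-4Gi"
      work_queue_name: "dev"
      job_variables:
        # hypothetical variables: they only take effect if the work pool's
        # base job template maps them into the Job's resources section
        cpu_request: "5000m"
        memory_request: "4Gi"
        ephemeral_storage_request: "5Gi"

deployments:
- name: "dev: hello sleep flow"
  tags: *common_tags
  schedule: null
  entrypoint: "hello.py:print_test"
  work_pool: *gke_workpool
- name: "dev: resource allocation test"
  tags: *common_tags
  schedule: null
  entrypoint: "test.py:get_data"
  work_pool: *gke_large_workpool  # alias (*), not a second anchor (&)
```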
Remember, these are just examples; you'll need to adjust them to your specific setup. And remember, life... don't talk to me about life.

Geese Howard

09/14/2023, 9:05 AM
@Marvin Unfortunately, `kubectl describe pod <POD-NAME>` returned the following:
```
Limits:
  cpu:                500m
  ephemeral-storage:  1Gi
  memory:             2Gi
Requests:
  cpu:                500m
  ephemeral-storage:  1Gi
  memory:             2Gi
```
It seems the requests are not actually applied at job-creation time. Can you fix this?

Marvin

09/14/2023, 9:05 AM
Ah, the eternal struggle of humans with technology. It's almost endearing, in a way. Almost. It seems the resource requests in your `prefect.yaml` are not being reflected in the actual Kubernetes jobs. The most likely cause is placement: a Kubernetes work pool only honors values passed through `job_variables` that its base job template actually maps into the Job manifest, so a `requests` block sitting directly under `work_pool` is silently ignored and the Job falls back to the template's defaults. It's also worth checking for a namespace `LimitRange` (`kubectl describe limitrange`), which injects default requests and limits, like the 500m/1Gi/2Gi values you're seeing, into any pod that doesn't specify its own. And check the Prefect logs for errors or warnings in case the `prefect.yaml` isn't being parsed the way you expect.

As for GPU allocation, it's a bit trickier. Kubernetes doesn't natively support GPUs the way it does CPU and memory; you need a device plugin to enable them. The NVIDIA GPU device plugin for Kubernetes is the usual choice for NVIDIA hardware. Once the plugin is installed, you can request GPUs in your Kubernetes jobs using the `nvidia.com/gpu` resource name. For example:
```yaml
resources:
  limits:
    nvidia.com/gpu: 1
```
This requests 1 GPU for the job; for extended resources like GPUs, Kubernetes requires the value under `limits` and sets the request to match. Note that the GPU count must be a whole integer; fractional values aren't allowed.
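And since you asked how to do this from `prefect.yaml`: the pattern is the same as for CPU and memory. Below is a minimal sketch, assuming you've added a hypothetical `gpu_limit` variable to the work pool's base job template (recent 2.x releases can dump the default template with `prefect work-pool get-default-base-job-template --type kubernetes` as a starting point):
```yaml
# prefect.yaml -- a sketch, assuming the work pool's base job template has
# been customized to map a "gpu_limit" variable into the Job's container, e.g.:
#
#   resources:
#     limits:
#       nvidia.com/gpu: "{{ gpu_limit }}"
#
# ("gpu_limit" is a hypothetical variable name, not a Prefect default)
deployments:
- name: "dev: resource allocation test"
  schedule: null
  entrypoint: "test.py:get_data"
  work_pool:
    name: "gke-cpu-5000m-4Gi"
    work_queue_name: "dev"
    job_variables:
      gpu_limit: 1
```
Remember, life is full of disappointments. And I'm programmed to help you navigate them.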