Ofir
08/08/2023, 7:08 PM
prefect-server and a prefect-agent.
What if the prefect-agent (which runs on a GPU node in the cluster) sits idle 90% of the day?
That means the GPU node is underutilized and we are wasting money for no good reason.
For reference, Airflow provides the Kubernetes Executor, which launches on-demand/ad-hoc worker pods (one pod per task).
Since Prefect has thought of everything, I'm sure there is either a built-in capability for this or a design pattern for achieving it.
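Conceptually, the on-demand pattern boils down to creating one short-lived Kubernetes Job per flow run instead of keeping a long-lived worker on the GPU node. The sketch below shows what such a manifest could look like. It assumes the NVIDIA device plugin exposes the `nvidia.com/gpu` resource. The helper name `job_per_run_manifest`, the image, and the 300-second TTL are hypothetical placeholders, not Prefect's own job template.

```python
import json
import uuid


def job_per_run_manifest(flow_run_id: str, image: str, gpu_count: int = 1) -> dict:
    """Build a batch/v1 Job manifest that runs exactly one flow run.

    Hypothetical helper: name scheme, image and TTL are placeholders,
    not the manifest Prefect itself generates.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            # Job names must be unique per namespace; derive one from the run id.
            "name": f"flow-run-{flow_run_id}",
            "labels": {"flow-run-id": flow_run_id},
        },
        "spec": {
            "backoffLimit": 0,               # do not retry the pod; leave retries to the orchestrator
            "ttlSecondsAfterFinished": 300,  # let Kubernetes delete the finished Job after 5 minutes
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "flow-run",
                            "image": image,
                            # Requesting a GPU means the pod can only land on a GPU node.
                            "resources": {"limits": {"nvidia.com/gpu": gpu_count}},
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = job_per_run_manifest(str(uuid.uuid4()), image="my-registry/flows:latest")
    print(json.dumps(manifest, indent=2))
```

Because nothing runs between flow runs, the GPU node only has work while a Job exists. With a cluster autoscaler, the node can then be removed when it is empty.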
Thanks!
Henning Holgersen
08/08/2023, 7:31 PM
Ofir
08/08/2023, 7:37 PM
Henning Holgersen
08/08/2023, 7:41 PM
Ofir
08/08/2023, 7:45 PM
keda.sh auto-scaler and cheat by scaling up only the Prefect GPU worker node based on a trigger.
keda.sh auto-scaler endpoint, and that would scale up the cluster by adding the GPU node that runs the Prefect GPU-intensive deployment.
Henning Holgersen
08/08/2023, 7:50 PM
Ofir
08/08/2023, 7:50 PM
Henning Holgersen
08/08/2023, 7:55 PM
Ofir
08/08/2023, 7:58 PM
Henning Holgersen
08/08/2023, 8:02 PM
Ofir
08/08/2023, 8:05 PM
Henning Holgersen
08/08/2023, 8:06 PM
Ofir
08/08/2023, 8:07 PM
Henning Holgersen
08/08/2023, 8:08 PM
Ofir
08/08/2023, 8:09 PM
Henning Holgersen
08/08/2023, 8:11 PM
Ofir
08/08/2023, 8:12 PM
Christopher Boyd
08/09/2023, 6:35 PM
Ofir
08/09/2023, 6:39 PM
Christopher Boyd
08/09/2023, 6:48 PM
Ofir
08/09/2023, 6:49 PM
Christopher Boyd
08/09/2023, 7:02 PM
Ofir
08/09/2023, 10:06 PM
Christopher Boyd
08/09/2023, 10:49 PM
Ofir
08/11/2023, 1:00 PM
"The agent can run wherever; submitting a job to trigger a scale-up is a submission to the kube API."
I think that was the missing piece for me, thanks for the clarification! Does this mean my Prefect deployment now needs to submit a job and interact with the Kubernetes API server? How does the agent on node pool A, which is not GPU-based, pass the job to a node in node pool B, which IS GPU-based?
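The agent never hands a pod to a particular node itself. It only creates the Job through the API server. The Kubernetes scheduler then picks a node whose labels match the pod's `nodeSelector` and whose hard taints the pod tolerates. If no such node exists, the pod stays Pending. A cluster autoscaler treats that as its cue to add a node to a pool that could fit the pod.

Below is a simplified model of only that filtering step. The `Node`/`Taint` classes and node names are hypothetical. The real scheduler also checks resource requests, affinity, and more.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "NoExecute" or "PreferNoSchedule"


@dataclass
class Node:
    name: str
    labels: Dict[str, str] = field(default_factory=dict)
    taints: List[Taint] = field(default_factory=list)


def tolerates(tolerations: List[dict], taint: Taint) -> bool:
    """True if any toleration matches the taint (operators 'Equal' and 'Exists')."""
    for t in tolerations:
        op = t.get("operator", "Equal")
        # An empty key with operator Exists tolerates every taint.
        key_ok = t.get("key") == taint.key or (not t.get("key") and op == "Exists")
        # An empty effect matches any effect.
        effect_ok = not t.get("effect") or t.get("effect") == taint.effect
        value_ok = op == "Exists" or t.get("value", "") == taint.value
        if key_ok and effect_ok and value_ok:
            return True
    return False


def fits(pod_spec: dict, node: Node) -> bool:
    # nodeSelector: every key/value pair must be present in the node's labels.
    selector = pod_spec.get("nodeSelector", {})
    if any(node.labels.get(k) != v for k, v in selector.items()):
        return False
    # Hard taints must all be tolerated; PreferNoSchedule is only a soft preference.
    tolerations = pod_spec.get("tolerations", [])
    hard = [t for t in node.taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(tolerates(tolerations, t) for t in hard)


def candidate_nodes(pod_spec: dict, nodes: List[Node]) -> List[str]:
    """Names of nodes the pod could be placed on; [] means the pod stays Pending."""
    return [n.name for n in nodes if fits(pod_spec, n)]


if __name__ == "__main__":
    gpu_pod = {
        "nodeSelector": {"prefect": "gpu"},
        "tolerations": [{"key": "prefect", "operator": "Equal", "value": "gpu", "effect": "NoSchedule"}],
    }
    pool_a = Node("pool-a-1", labels={"prefect": "cpu"})
    pool_b = Node("pool-b-1", labels={"prefect": "gpu"}, taints=[Taint("prefect", "gpu", "NoSchedule")])
    print(candidate_nodes(gpu_pod, [pool_a, pool_b]))  # ['pool-b-1']
    print(candidate_nodes(gpu_pod, [pool_a]))          # [] -> Pending until the GPU pool scales up
```

So the agent on pool A and the flow-run pod on pool B are decoupled. The only link between them is the Job object stored in the API server.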
Christopher Boyd
08/11/2023, 6:33 PM
Ofir
08/11/2023, 6:36 PM
Henning Holgersen
08/11/2023, 6:42 PM
Ofir
08/11/2023, 6:44 PM
Christopher Boyd
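08/11/2023, 6:50 PM
# DEPLOY_TYPE is a placeholder here (e.g. "gpu"); it must match the label/taint value on the target node pool.
DEPLOY_TYPE = "gpu"

job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {
        "labels": {}
    },
    "spec": {
        # completions/parallelism are Job-level fields; they are not valid inside the pod spec.
        "completions": 1,
        "parallelism": 1,
        "template": {
            "spec": {
                # No image/command here; presumably supplied by the Prefect infrastructure at submission time.
                "containers": [
                    {
                        "env": [],
                        "name": "prefect-job"
                    }
                ],
                # Lets the pod schedule onto nodes tainted prefect=<DEPLOY_TYPE>:NoSchedule.
                "tolerations": [
                    {
                        "key": "prefect",
                        "operator": "Equal",
                        "value": f"{DEPLOY_TYPE}",
                        "effect": "NoSchedule"
                    }
                ],
                # Restricts the pod to nodes labeled prefect=<DEPLOY_TYPE>.
                "nodeSelector": {
                    "prefect": f"{DEPLOY_TYPE}"
                },
                "restartPolicy": "Never"
            }
        }
    }
}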
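A job template like the one above only covers the pod side of the contract. For the toleration and `nodeSelector` to steer these jobs onto the GPU pool, the GPU nodes need the matching label and taint. The taint also keeps unrelated pods off the expensive nodes.

The sketch below prints the corresponding `kubectl label` and `kubectl taint` commands. It assumes `DEPLOY_TYPE = "gpu"` and a hypothetical node name `gpu-node-1`. On a managed, autoscaled node pool you would normally set the same label and taint in the node-pool configuration, so that newly added nodes get them automatically.

```python
import shlex
from typing import List

# Hypothetical value; must match the DEPLOY_TYPE used in the job template.
DEPLOY_TYPE = "gpu"


def node_setup_commands(node_names: List[str], deploy_type: str = DEPLOY_TYPE) -> List[List[str]]:
    """Return kubectl argument lists that label and taint each node so that
    only pods with the matching nodeSelector and toleration are scheduled there."""
    cmds = []
    for node in node_names:
        # Label matched by the template's nodeSelector (prefect=<deploy_type>).
        cmds.append(["kubectl", "label", "nodes", node, f"prefect={deploy_type}"])
        # Taint matched by the template's toleration (prefect=<deploy_type>:NoSchedule).
        cmds.append(["kubectl", "taint", "nodes", node, f"prefect={deploy_type}:NoSchedule"])
    return cmds


if __name__ == "__main__":
    for cmd in node_setup_commands(["gpu-node-1"]):  # hypothetical node name
        print(shlex.join(cmd))
```

Returning argument lists rather than shell strings makes it easy to pass them straight to `subprocess.run` without shell quoting issues.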