Ofir
08/08/2023, 7:08 PM
prefect-server and a prefect-agent.
What if 90% of the day the prefect-agent (which is running on a GPU node on the cluster) is idle?
This means it’s underutilized and we waste money for no good reason.
For reference, Airflow provides the Kubernetes Executor, which launches on-demand/ad-hoc worker pods.
Since Prefect thought of everything - I’m sure there is either a built-in capability for that or a design pattern for achieving that.
Thanks!
Henning Holgersen
08/08/2023, 7:31 PM
Ofir
08/08/2023, 7:37 PM
Henning Holgersen
08/08/2023, 7:41 PM
Ofir
08/08/2023, 7:45 PM
keda.sh auto-scaler and cheat by scaling up only the Prefect GPU worker node based on a trigger.
Ofir
08/08/2023, 7:46 PM
keda.sh auto-scaler endpoint and that would scale up the cluster, adding the GPU node that runs the Prefect GPU-intensive deployment.
Ofir
08/08/2023, 7:47 PM
Henning Holgersen
08/08/2023, 7:50 PM
Ofir
08/08/2023, 7:50 PM
Ofir
08/08/2023, 7:51 PM
Ofir
08/08/2023, 7:52 PM
Ofir
08/08/2023, 7:52 PM
Henning Holgersen
08/08/2023, 7:55 PM
Ofir
08/08/2023, 7:58 PM
Ofir
08/08/2023, 7:59 PM
Ofir
08/08/2023, 7:59 PM
Henning Holgersen
08/08/2023, 8:02 PM
Ofir
08/08/2023, 8:05 PM
Ofir
08/08/2023, 8:05 PM
Ofir
08/08/2023, 8:06 PM
Henning Holgersen
08/08/2023, 8:06 PM
Ofir
08/08/2023, 8:07 PM
Ofir
08/08/2023, 8:07 PM
Ofir
08/08/2023, 8:08 PM
Henning Holgersen
08/08/2023, 8:08 PM
Ofir
08/08/2023, 8:09 PM
Ofir
08/08/2023, 8:09 PM
Ofir
08/08/2023, 8:09 PM
Henning Holgersen
08/08/2023, 8:11 PM
Ofir
08/08/2023, 8:12 PM
Ofir
08/08/2023, 8:20 PM
Ofir
08/08/2023, 8:21 PM
Ofir
08/08/2023, 8:22 PM
Christopher Boyd
08/09/2023, 6:35 PM
Christopher Boyd
08/09/2023, 6:36 PM
Ofir
08/09/2023, 6:39 PM
Christopher Boyd
08/09/2023, 6:48 PM
Ofir
08/09/2023, 6:49 PM
Ofir
08/09/2023, 6:49 PM
Ofir
08/09/2023, 6:51 PM
Christopher Boyd
08/09/2023, 7:02 PM
Christopher Boyd
08/09/2023, 7:03 PM
Christopher Boyd
08/09/2023, 7:03 PM
Ofir
08/09/2023, 10:06 PM
Ofir
08/09/2023, 10:08 PM
Ofir
08/09/2023, 10:08 PM
Christopher Boyd
08/09/2023, 10:49 PM
Christopher Boyd
08/09/2023, 10:49 PM
Christopher Boyd
08/09/2023, 10:54 PM
Christopher Boyd
08/09/2023, 10:55 PM
Ofir
08/11/2023, 1:00 PM
"The agent can run wherever, submitting a job to trigger a scale up is a submission to the kube api"
I think that was missing for me, thanks for the clarification! Does it mean that my Prefect deployment now needs to submit a job and interact with the Kubernetes API server? How does the agent on nodepool A, which is not GPU-based, pass the job to a node within nodepool B, which IS GPU-based?
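One way to picture the nodepool question: the agent does not hand work to a node itself. It submits a Job to the Kubernetes API, and the scheduler places the resulting pod on a node whose labels match the pod's nodeSelector and whose taints the pod tolerates. Below is a minimal Python sketch of that matching rule. It is a simplified model, not the real scheduler; the node names, the `prefect` label key, and the `gpu`/`cpu` values are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    labels: dict = field(default_factory=dict)
    # Each taint is (key, value, effect), e.g. ("prefect", "gpu", "NoSchedule").
    taints: list = field(default_factory=list)


def tolerates(toleration: dict, taint: tuple) -> bool:
    """Simplified "Equal" toleration check: key, value and effect must all match."""
    key, value, effect = taint
    return (
        toleration.get("operator", "Equal") == "Equal"
        and toleration.get("key") == key
        and toleration.get("value") == value
        and toleration.get("effect") == effect
    )


def schedulable_nodes(pod_spec: dict, nodes: list) -> list:
    """Names of nodes matching the nodeSelector whose NoSchedule taints are all tolerated."""
    selector = pod_spec.get("nodeSelector", {})
    tolerations = pod_spec.get("tolerations", [])
    result = []
    for node in nodes:
        labels_ok = all(node.labels.get(k) == v for k, v in selector.items())
        taints_ok = all(
            any(tolerates(t, taint) for t in tolerations)
            for taint in node.taints
            if taint[2] == "NoSchedule"
        )
        if labels_ok and taints_ok:
            result.append(node.name)
    return result


# Hypothetical cluster: nodepool A (CPU, runs the agent) and nodepool B (GPU, tainted).
nodes = [
    Node("pool-a-cpu-1", labels={"prefect": "cpu"}),
    Node("pool-b-gpu-1", labels={"prefect": "gpu"},
         taints=[("prefect", "gpu", "NoSchedule")]),
]
gpu_pod = {
    "nodeSelector": {"prefect": "gpu"},
    "tolerations": [
        {"key": "prefect", "operator": "Equal", "value": "gpu", "effect": "NoSchedule"}
    ],
}
plain_pod = {}  # no selector, no tolerations

print(schedulable_nodes(gpu_pod, nodes))    # ['pool-b-gpu-1']
print(schedulable_nodes(plain_pod, nodes))  # ['pool-a-cpu-1']
```

In this model the agent's own location never enters the decision; only the pod spec and the node labels and taints do.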
Ofir
08/11/2023, 1:07 PM
Ofir
08/11/2023, 1:13 PM
Christopher Boyd
08/11/2023, 6:33 PM
Christopher Boyd
08/11/2023, 6:34 PM
Christopher Boyd
08/11/2023, 6:35 PM
Christopher Boyd
08/11/2023, 6:35 PM
Ofir
08/11/2023, 6:36 PM
Ofir
08/11/2023, 6:38 PM
Ofir
08/11/2023, 6:38 PM
Henning Holgersen
08/11/2023, 6:42 PM
Henning Holgersen
08/11/2023, 6:44 PM
Ofir
08/11/2023, 6:44 PM
Ofir
08/11/2023, 6:44 PM
Ofir
08/11/2023, 6:45 PM
Christopher Boyd
08/11/2023, 6:50 PM
{
  "kind": "Job",
  "spec": {
    "template": {
      "spec": {
        "containers": [
          {
            "env": [],
            "name": "prefect-job"
          }
        ],
        "completions": 1,
        "parallelism": 1,
        "tolerations": [
          {
            "key": "prefect",
            "value": f"{DEPLOY_TYPE}",
            "effect": "NoSchedule",
            "operator": "Equal"
          }
        ],
        "nodeSelector": {
          "prefect": f"{DEPLOY_TYPE}"
        },
        "restartPolicy": "Never"
      }
    }
  },
  "metadata": {
    "labels": {}
  },
  "apiVersion": "batch/v1"
}
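The `f"{DEPLOY_TYPE}"` placeholders suggest the manifest is a Python dict filled in per deployment type. A minimal sketch of that idea follows, assuming a hypothetical helper `build_job_manifest` and the example value `"gpu"` (neither appears in the thread). It only builds the dict and does not call Prefect or Kubernetes.

```python
import json


def build_job_manifest(deploy_type: str) -> dict:
    """Build a base Job manifest pinned to nodes labelled prefect=<deploy_type>.

    Hypothetical helper: it mirrors the structure of the manifest above.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"labels": {}},
        "spec": {
            "template": {
                "spec": {
                    "containers": [{"name": "prefect-job", "env": []}],
                    "completions": 1,
                    "parallelism": 1,
                    # Allow the pod onto nodes tainted prefect=<deploy_type>:NoSchedule.
                    "tolerations": [
                        {
                            "key": "prefect",
                            "operator": "Equal",
                            "value": deploy_type,
                            "effect": "NoSchedule",
                        }
                    ],
                    # Require nodes labelled prefect=<deploy_type>.
                    "nodeSelector": {"prefect": deploy_type},
                    "restartPolicy": "Never",
                }
            }
        },
    }


manifest = build_job_manifest("gpu")  # "gpu" is an assumed label/taint value
pod_spec = manifest["spec"]["template"]["spec"]
print(json.dumps(pod_spec["nodeSelector"]))  # {"prefect": "gpu"}
```

The toleration lets the pod onto the tainted pool, while the nodeSelector keeps it from landing anywhere else. If the GPU nodepool has a cluster autoscaler, a pending pod with this selector is typically what prompts it to add a node.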
Christopher Boyd
08/11/2023, 6:50 PM
Christopher Boyd
08/11/2023, 6:51 PM
Christopher Boyd
08/11/2023, 6:52 PM