@josh @Joe Schmid thanks a lot for sharing the advice.
No, we’re not using k8s; we use SLURM as our resource manager and job scheduler.
Context: the company I work for is trying to integrate Prefect with SLURM (our HPC cluster) by writing a custom SLURMExecutor class. The use case is machine learning, and the idea of per-task containerization is largely inspired by Kubeflow (essentially letting k8s handle the task-level containerization, as you suggested, except that we don’t have k8s…).
Use case:
task A -> needs only CPU; needs X, Y, Z Python dependencies.
task B -> needs N GPUs; needs X, Y, Z Python dependencies.
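To make the per-task idea more concrete, here is a minimal sketch of how an executor might turn each task's requirements into its own SLURM job. It assumes each task's Python dependencies are baked into a separate Singularity/Apptainer image; the `TaskSpec` class, `build_sbatch_command` helper, image names and commands are all hypothetical and not part of Prefect's API. Only the `sbatch` flags (`--job-name`, `--cpus-per-task`, `--gres=gpu:N`, `--wrap`) and `singularity exec --nv` are real options.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass
class TaskSpec:
    """Hypothetical per-task description: resources plus the container holding its deps."""
    name: str
    image: str            # assumed container image (e.g. a .sif file) with the task's dependencies
    command: List[str]    # command to run inside that container
    cpus: int = 1
    gpus: int = 0         # 0 means a CPU-only task


def build_sbatch_command(spec: TaskSpec) -> List[str]:
    """Translate a TaskSpec into an sbatch argv list (does not submit anything)."""
    argv = ["sbatch", f"--job-name={spec.name}", f"--cpus-per-task={spec.cpus}"]
    container = ["singularity", "exec"]
    if spec.gpus > 0:
        # Ask SLURM for GPUs via generic resources and expose them inside the container.
        argv.append(f"--gres=gpu:{spec.gpus}")
        container.append("--nv")
    container += [spec.image, *spec.command]
    # --wrap lets sbatch run a one-line command without writing a batch script.
    argv.append("--wrap=" + shlex.join(container))
    return argv


if __name__ == "__main__":
    task_a = TaskSpec(name="task-a", image="task_a.sif", command=["python", "a.py"], cpus=4)
    task_b = TaskSpec(name="task-b", image="task_b.sif", command=["python", "b.py"], gpus=2)
    for spec in (task_a, task_b):
        print(build_sbatch_command(spec))
```

In a real executor the returned list would presumably be handed to `subprocess.run` and the job ID parsed from sbatch's output; keeping command construction separate makes it easy to test without a cluster.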
I am aware that one path forward is to define a single environment that’s a superset of all of those, but for various reasons we don’t think that approach scales well.