Alex Bussan
11/18/2020, 2:10 PM
KubernetesRun: https://github.com/PrefectHQ/prefect/blob/master/src/prefect/run_configs/kubernetes.py
My goal is to be able to set a different job template on a "per flow run" basis. It seems this is currently set up on a "per flow" basis, since you have to do:
flow.run_config = KubernetesRun(...)
flow.register(project_name=project_name)
So it appears you have to re-register if you want different resource limits per flow run. Any ideas on how to best achieve this?
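For anyone skimming later, here is a minimal sketch of what that per-registration setup looks like, assuming the resource limits are baked into a custom job_template passed to KubernetesRun; the flow, project name, container name, and resource values are illustrative, not from the thread:

```python
from prefect import Flow, task
from prefect.run_configs import KubernetesRun

@task
def process():
    ...

with Flow("example-flow") as flow:
    process()

# Resource limits live in the job template, which is attached at registration
# time, so changing them currently means registering a new flow version.
flow.run_config = KubernetesRun(
    job_template={
        "apiVersion": "batch/v1",
        "kind": "Job",
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": "flow",
                            "resources": {
                                "requests": {"cpu": "500m", "memory": "1Gi"},
                                "limits": {"cpu": "2", "memory": "4Gi"},
                            },
                        }
                    ]
                }
            }
        },
    }
)
flow.register(project_name="my-project")
```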
Jim Crist-Harif
11/18/2020, 2:13 PM

Alex Bussan
11/18/2020, 2:37 PM

Jim Crist-Harif
11/18/2020, 3:10 PM
Would the resource requirements be based on Parameters, or set manually when kicking off the flow run? Note that you could also use a DaskExecutor to scale up once the flow starts.
We might add something like what you're looking for in the future - just trying to better understand how you'd use this feature if we were to add it.

Alex Bussan
11/18/2020, 3:56 PM
Manually when kicking off the flow run (e.g. looking at the data, then entering some new numbers before kicking off a run).
Let's say a Parameter is an s3 path to input data for the flow's processing task. This could point to data of widely varying size. So it'd be nice if the user could pass in that path along with how much mem/CPU they want to limit that run of the flow to.
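Roughly the shape of flow being described, as a sketch (the task and parameter names are hypothetical); the s3 path can vary per run, but the resource limits currently cannot:

```python
from prefect import Flow, Parameter, task

@task
def process(s3_path: str):
    # Download and process whatever lives at s3_path; its size can vary
    # widely from one run to the next.
    ...

with Flow("variable-size-input") as flow:
    s3_path = Parameter("s3_path")  # e.g. an s3:// URI supplied per run
    process(s3_path)

# The Parameter changes per flow run, but the KubernetesRun resource limits
# attached at registration time do not.
```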
Currently, creating a new flow for each different mem/CPU resource limit level is an OK solution, I think.
The reason we care about controlling resource limits is that we're going to have a shared underlying k8s cluster with a good number of people submitting Flows/Flow runs, and we want people to be able to give their Flow runs enough muscle (at their discretion) without eating up all the shared resources.
I need to learn more about the DaskExecutor and dask in general, but it seems to me a similar problem would arise: how can we prevent a teammate's flow run from eating up a huge amount of compute without specifying some limits at the flow run level? If someone submits a big input dataset on a flow and Dask lets it scale up uninhibited, and a lot of people do that, our k8s cluster may get overwhelmed. At the same time, if we set a low limit on Dask, then some of the less frequent, more intensive flow runs won't work.
An interesting thought a colleague pointed out: since it's called RunConfig, you might initially think you could pass a RunConfig at the flow run level, whereas really it's almost more of a FlowConfig, because it sets the config for all runs of a flow.

Jim Crist-Harif
11/18/2020, 7:35 PM
You could register the flow once with a default KubernetesRun(), then:
• start e.g. 3 agents, each with increasing default resource capacities
• at runtime, specify a label to select which agent you want the flow to be picked up by (say small, medium, or large)
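A rough sketch of that labeled-agent setup, assuming the Python KubernetesAgent API; the labels, namespaces, and the small/medium/large split are illustrative, and the exact way each agent gets its default resource settings depends on the Prefect version:

```python
from prefect.agent.kubernetes import KubernetesAgent

# One agent per "size"; each only picks up flow runs whose labels match.
# In practice each agent runs as its own long-lived deployment, configured
# with its own default resource settings for small/medium/large workloads.
small_agent = KubernetesAgent(labels=["small"], namespace="prefect-small")
medium_agent = KubernetesAgent(labels=["medium"], namespace="prefect-medium")
large_agent = KubernetesAgent(labels=["large"], namespace="prefect-large")

# e.g. small_agent.start()  # blocks; run each agent in its own process

# The flow itself is registered once with a plain KubernetesRun(); the
# "small"/"medium"/"large" label chosen when triggering the run decides
# which agent (and therefore which resource profile) picks it up.
```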
Alex Bussan
11/18/2020, 7:39 PM

Jim Crist-Harif
11/18/2020, 7:43 PM
> I need to learn more about the DaskExecutor and dask in general, but it seems to me a similar problem would arise: how can we prevent a teammate's flow run from eating up a huge amount of compute without specifying some limits at the flow run level? If someone submits a big input dataset on a flow and Dask lets it scale up uninhibited, and a lot of people do that, our k8s cluster may get overwhelmed. At the same time, if we set a low limit on Dask, then some of the less frequent, more intensive flow runs won't work.
We generally recommend users run with adaptive scaling for dask, which has an optional upper bound. Dask is then free to scale up and down as needed; generally it should do a good job releasing unused resources. K8s also does a pretty good job sharing resources among a team. If you want to get fancy, you can also configure things so the dask workers run in an autoscaling node pool on preemptible nodes, so you can get extra capacity as needed but use cheaper nodes for that (assuming you're using some kind of cloud-provided k8s).
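A minimal sketch of that adaptive-scaling idea, assuming dask-kubernetes for the workers and the DaskExecutor's cluster_class/adapt_kwargs arguments; the image, worker sizing, and upper bound of 10 are illustrative, and attaching the executor directly to the flow assumes a newer Prefect release (older ones set it on the flow's environment):

```python
from dask_kubernetes import KubeCluster, make_pod_spec
from prefect.executors import DaskExecutor  # prefect.engine.executors on older releases

# Adaptive scaling with an upper bound: dask can grow to at most 10 workers
# for a run and releases them when idle, so a single flow run can't
# monopolize the shared cluster.
flow.executor = DaskExecutor(
    cluster_class=KubeCluster,
    cluster_kwargs={
        "pod_template": make_pod_spec(
            image="prefecthq/prefect:latest",
            memory_limit="4G",
            memory_request="2G",
            cpu_limit="1",
            cpu_request="0.5",
        )
    },
    adapt_kwargs={"minimum": 1, "maximum": 10},
)
```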
Alex Bussan
11/18/2020, 7:56 PM