Chu Lục Ninh
03/29/2022, 1:09 PM
RunKubernetesJob. But every time I run the flow using KubernetesAgent, the agent always spins up a new job, which in my case is useless and a waste of resources. Since the flow is mainly about spinning up new k8s jobs, I want KubernetesAgent to run it directly. Please advise me on a way to do that.

Kevin Kho
RunNamespacedJob is really intended to spin up a new pod. If you have jobs in other languages, you can try calling them directly inside your Flow with ShellTask:
from prefect import Flow
from prefect.tasks.shell import ShellTask

shell = ShellTask()
with Flow(...) as flow:
    shell(...)
so you can call them through the command line.
I think running them all on the agent is less efficient, right? Because that means your agent pod needs all of the resources to run the batch jobs. If you leave the agent lightweight, then you can just create a pod to run the Flow. And then, instead of RunNamespacedJob, just use ShellTask to invoke all of those programs inside the container.

Chu Lục Ninh
03/29/2022, 1:53 PM
ShellTask doesn't help much. 100% of the tasks in my flow are RunKubernetesJob; that's why I believe the agent will have no workload at all, and spinning up a Kubernetes job just to run that kind of flow is a waste of resources.

Kevin Kho
By RunKubernetesJob, are you referring to KubernetesRun or RunNamespacedJob? Could you tell me why ShellTask doesn't help?

Chu Lục Ninh
03/29/2022, 1:56 PM
flow = Flow("test", tasks=[RunKubernetesJob(), RunKubernetesJob(), RunKubernetesJob()], run_config=KubernetesRun())
Kevin Kho
Chu Lục Ninh
03/29/2022, 2:04 PM
Kevin Kho
Chu Lục Ninh
03/29/2022, 2:15 PM
Anna Geller
deploy_flow. This method decides how the compute infrastructure for the flow run should be deployed.
Then, when it comes to where and how your task runs get executed, that is what the executor decides. If you use the default LocalExecutor, then all your task runs execute within the same environment as the flow run, here the flow run pod deployed as a Kubernetes job. If you used e.g. a DaskExecutor, then your task runs would be shipped to Dask workers for execution. When running Dask on Kubernetes, you could e.g. use the KubeCluster class to spin up a Dask cluster on Kubernetes.
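As an analogy only (this is not Prefect's API), the split between the two executors can be sketched with the standard library: LocalExecutor is like running tasks in the flow run's own process, while DaskExecutor is like handing them to a pool of workers.

```python
from concurrent.futures import ThreadPoolExecutor

def task(x):
    return x * x

# LocalExecutor-style: task runs execute inside the flow run's
# own process (here, the flow run pod).
local_results = [task(x) for x in range(4)]

# DaskExecutor-style: task runs are shipped to workers for
# execution, analogous to Dask workers in a KubeCluster.
with ThreadPoolExecutor(max_workers=2) as pool:
    remote_results = list(pool.map(task, range(4)))

print(local_results, remote_results)
```

Either way the results are the same; what changes is where the work happens, which is exactly the executor's job.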
In your use case, each of your flow runs gets deployed as a Kubernetes job, and since you designed your tasks to run as separate Kubernetes jobs via a Kubernetes task, each of your tasks also gets deployed into a separate pod. But there is no way around still having a flow run pod; this is an entirely separate concern from what your tasks are doing. Your task could run on Databricks if you wish, or could execute some in-warehouse SQL transformation, but the Prefect flow run needs its own process (a subprocess, a Docker container, a Kubernetes job, an ECS task).

Chu Lục Ninh
03/29/2022, 2:50 PM
KubernetesAgent, but can it popen to execute the flow in a subprocess, like LocalAgent? I just want to customize KubernetesAgent so it runs the flow directly in a subprocess instead of spawning a new pod for the flow.

Anna Geller
> I just want to customize KubernetesAgent so it runs the flow directly in a subprocess instead of spawning a new pod for the flow
If that's the case, you should use ShellTask rather than RunKubernetesJob. ShellTask creates a subprocess and runs some custom Linux shell command within that subprocess.
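What ShellTask does can be sketched with the standard library alone; this is an illustration of the idea, not Prefect's actual implementation:

```python
import subprocess

def run_shell(command: str) -> str:
    """Run a shell command in a subprocess and return its stdout,
    roughly what a ShellTask task run does."""
    result = subprocess.run(
        command,
        shell=True,            # interpret the string with the shell
        capture_output=True,
        text=True,
        check=True,            # non-zero exit raises, i.e. a failed task
    )
    return result.stdout.strip()

print(run_shell("echo hello"))
```

The flow run pod stays the only pod; each command runs as a child process inside it rather than as a separate Kubernetes job.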
Chu Lục Ninh
03/29/2022, 3:00 PM
Agent, right?

Anna Geller
Chu Lục Ninh
03/29/2022, 3:08 PM
> Can you explain what is the problem with that? Do you have some budget or resource constraints?
We have a DevOps team to manage and monitor the system, and we just don't want to pollute the logging system with so many pods spawning only to spawn other pods. Another thing: we have many other Kubernetes batch jobs in the same namespace, and we control those jobs separately from Prefect, so we want to minimize the effort of monitoring unnecessary jobs.
Anna Geller
kubectl get pods -l environment=prefect
Chu Lục Ninh
03/29/2022, 4:45 PM
Anna Geller
Chu Lục Ninh
03/30/2022, 2:01 PM