Chu Lục Ninh 03/29/2022, 1:09 PM
But every time I run the flow using the Kubernetes agent, the agent always spins up a new job, which in my case is useless and a waste of resources. Since the flow is mainly about spinning up new k8s jobs, I want the agent to run it directly. Please advise me on the way to do that.
KubernetesRun is really intended to spin up a new pod. If you have jobs in other languages, you can try calling them directly inside your Flow with the ShellTask
so you can call them through the command line. I think running them all on the agent is less efficient, right? Because that means your agent pod needs all of the resources to run the batch jobs. If you leave the agent lightweight, then you can just create a pod to run the Flow. And then, instead of a separate Kubernetes job per task, just use the ShellTask:

shell = ShellTask()

with Flow(...) as flow:
    shell(...)

to invoke all those programs inside the container.
Chu Lục Ninh 03/29/2022, 1:53 PM
ShellTask doesn't help much. 100% of the tasks in my flow are RunKubernetesJob; that's why I believe the agent will have no workload at all, and spinning up a Kubernetes job just to run that kind of flow is wasting resources.
Are you referring to RunKubernetesJob? Could you tell me why the
Chu Lục Ninh 03/29/2022, 1:56 PM
flow = Flow(
    "test",
    tasks=[RunKubernetesJob(), RunKubernetesJob(), RunKubernetesJob()],
    run_config=KubernetesRun(),
)
Chu Lục Ninh 03/29/2022, 2:04 PM
Chu Lục Ninh 03/29/2022, 2:15 PM
This is decided by the run config: it determines how the compute infrastructure for the flow run gets deployed. Then, when it comes to where and how your task runs get executed, that is what the executor decides. If you use the default LocalExecutor, then all your task runs execute within the same environment as the flow run, here the flow-run pod deployed as a Kubernetes job. If you were to use e.g. a DaskExecutor, then your task runs would be shipped to Dask workers for execution. When running Dask on Kubernetes, you could e.g. use the KubeCluster class to spin up a Dask cluster on Kubernetes. In your use case, each of your flow runs gets deployed as a Kubernetes job, and since you designed your tasks to run as separate Kubernetes jobs via a Kubernetes task, each of your tasks also gets deployed into a separate pod. But there is no way around still having a flow-run pod - this is an entirely separate concern from what your tasks are doing. Your task could run on Databricks if you wish, or could execute some in-warehouse SQL transformation, but the Prefect flow run needs its own process (a subprocess, a Docker container, a Kubernetes job, an ECS task).
Chu Lục Ninh 03/29/2022, 2:50 PM
Can I customize the agent to execute the flow in a subprocess, like the LocalAgent does?
I just want to customize it so it runs the flow directly in a subprocess instead of spawning a new pod for the flow.
If that's the case, you should use the LocalAgent.
It creates a subprocess and runs some custom Linux shell command within that subprocess.
Chu Lục Ninh 03/29/2022, 3:00 PM
Chu Lục Ninh 03/29/2022, 3:08 PM
Can you explain what the problem with that is? Do you have some budget or resource constraints?
We have a DevOps team to manage and monitor the system, and we just don't want to pollute the logging system with so many pods spawning just to spawn other pods. Another thing is that we have many other Kubernetes batch jobs in the same namespace, and we control those jobs separately from Prefect, so we want to minimize the effort of monitoring unnecessary jobs.
Chu Lục Ninh 03/29/2022, 4:45 PM
Chu Lục Ninh 03/30/2022, 2:01 PM