We have batch jobs written in other languages and packed int Prefect Community #prefect-server


Chu Lục Ninh

03/29/2022, 1:09 PM

We have batch jobs written in other languages and packed into containers, and we write our tasks mainly using RunKubernetesJob. But every time I run the flow using KubernetesAgent, the agent always spins up a new job, which in my case is useless and a waste of resources. Since the flow is mainly about spinning up new k8s jobs, I want KubernetesAgent to run it directly. Please advise me on how to do that.

Kevin Kho

03/29/2022, 1:50 PM

Agents never run things directly, except for the Local Agent. You could try running a Local Agent inside a Kubernetes pod and letting the flow run there, but I don’t think that setup is right. RunNamespacedJob is really intended to spin up a new pod. If you have jobs in other languages, you can try calling them directly inside your Flow with the ShellTask


from prefect import Flow
from prefect.tasks.shell import ShellTask

shell = ShellTask()
with Flow(...) as flow:
    shell(...)

so you can call them through the command line. I think running them all on the agent is less efficient, right? That would mean your agent pod needs all of the resources to run the batch jobs. If you leave the agent lightweight, you can just create a pod to run the Flow. Then, instead of RunNamespacedJob, just use the ShellTask to invoke all those programs inside the container.

Chu Lục Ninh

03/29/2022, 1:53 PM

ShellTask doesn't help much. 100% of the tasks in my flow are RunKubernetesJob, which is why I believe the agent will have no workload at all, and spinning up a Kubernetes job just to run that kind of flow wastes resources.

Kevin Kho

03/29/2022, 1:55 PM

When you say RunKubernetesJob, are you referring to KubernetesRun or RunNamespacedJob? Could you tell me why the ShellTask doesn't help?

Chu Lục Ninh

03/29/2022, 1:56 PM

My code is as follows:

flow = Flow("test", tasks=[RunKubernetesJob(), RunKubernetesJob(), RunKubernetesJob()], run_config=KubernetesRun())

Chu Lục Ninh

03/29/2022, 1:57 PM

As you can see, my flow is all about spinning up other Kubernetes jobs; there is no computation in it. I cannot pack every batch job into a single container because that is inefficient and may cause conflicts.

Kevin Kho

03/29/2022, 1:59 PM

Ah OK, I think I understand what you are saying. You are saying that each task needs its own job, but why even have the Flow pod if the Agent can just kick off jobs directly? It’s just the Flow pod that you are saying is not efficient, right?

Chu Lục Ninh

03/29/2022, 2:04 PM

yup, absolutely

Kevin Kho

03/29/2022, 2:15 PM

I understand what you are saying, but the Agent and the Flow just have different concerns. The Agent is programmed to kick off Flow Runs, while the Flow is made to submit tasks. So in order for the agent to kick off these processes, you need them to be Flows with one task (maybe ShellTask) to start off the job.

🙌 1

Chu Lục Ninh

03/29/2022, 2:15 PM

I believe I can build on LocalAgent and KubernetesAgent to create a new Agent class that does the job as I described. Can @Anna Geller advise me more about this?

Anna Geller

03/29/2022, 2:44 PM

@Chu Lục Ninh Kevin provided an excellent explanation, but I can try to clarify more. Prefect has a separation of concerns here: each agent has a method called deploy_flow. This method decides how the compute infrastructure for the flow run should be deployed. Where and how your task runs get executed is what the executor decides. If you use the default LocalExecutor, all your task runs execute within the same environment as the flow run, here the flow run pod deployed as a Kubernetes job. If you used e.g. a DaskExecutor, your task runs would be shipped to Dask workers for execution. When running Dask on Kubernetes, you could e.g. use the KubeCluster class to spin up a Dask cluster on Kubernetes. In your use case, each of your flow runs gets deployed as a Kubernetes job, and since you designed your tasks to run as separate Kubernetes jobs via a Kubernetes task, each of your tasks also gets deployed into a separate pod. But there is no way around still having a flow run pod; this is an entirely separate concern from what your tasks are doing. Your task could run on Databricks if you wish, or could execute some in-warehouse SQL transformation, but the Prefect flow run needs its own process (a subprocess, a Docker container, a Kubernetes job, an ECS task).

🙌 1
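The separation of concerns described above can be illustrated with a toy model. This is only a sketch in plain Python, not Prefect's implementation: `AgentSketch`, `FlowSketch`, `LocalExecutorSketch`, and `submit_k8s_job` are hypothetical names invented for illustration, and the "Kubernetes job" is just a string. It shows that the agent only deploys the flow run, while the flow run (through its executor) is what walks through the tasks.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class LocalExecutorSketch:
    """Hypothetical stand-in: runs every task inside the flow-run process."""

    def run_tasks(self, tasks: List[Callable[[], str]]) -> List[str]:
        return [task() for task in tasks]


@dataclass
class FlowSketch:
    """Hypothetical stand-in for a flow: a name, tasks, and an executor."""

    name: str
    tasks: List[Callable[[], str]]
    executor: LocalExecutorSketch = field(default_factory=LocalExecutorSketch)

    def run(self) -> List[str]:
        # The flow-run process orchestrates; the executor decides where tasks run.
        return self.executor.run_tasks(self.tasks)


class AgentSketch:
    """Hypothetical stand-in for an agent: it only deploys flow runs."""

    def __init__(self) -> None:
        self.deployed: List[str] = []

    def deploy_flow(self, flow: FlowSketch) -> List[str]:
        # A real Kubernetes agent would create a Kubernetes job at this point.
        # This toy version records the deployment and runs the flow in-process.
        self.deployed.append(f"flow-run:{flow.name}")
        return flow.run()


def submit_k8s_job(job_name: str) -> Callable[[], str]:
    # Placeholder for a task that would submit a separate Kubernetes job.
    return lambda: f"submitted:{job_name}"


agent = AgentSketch()
flow = FlowSketch("test", [submit_k8s_job("job-a"), submit_k8s_job("job-b")])
results = agent.deploy_flow(flow)
```

Even when every task merely submits work elsewhere, something still has to run `flow.run()` and track task order, which is the role the flow run pod plays in the discussion.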

Chu Lục Ninh

03/29/2022, 2:50 PM

I got it. Should I try to be creative and create a new Agent that can talk directly to Kubernetes like KubernetesAgent, but can popen to execute the flow in a subprocess like LocalAgent?

Chu Lục Ninh

03/29/2022, 2:54 PM

And I don't mean to remove the flow. I need that flow to orchestrate tasks, since my tasks still depend on each other. I just want to customize KubernetesAgent so it runs the flow directly in a subprocess instead of spawning a new pod for the flow.

Anna Geller

03/29/2022, 2:57 PM

Well, if you do that, you are kind of creating your own Prefect, right? 🙂 This is the kind of negative engineering we try to eliminate.

Anna Geller

03/29/2022, 2:59 PM

"I just want to customize KubernetesAgent so it runs the flow directly in a subprocess instead of spawning a new pod for the flow."

If that’s the case, you should use ShellTask rather than RunKubernetesJob. ShellTask creates a subprocess and runs some custom Linux shell command within that subprocess.
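As a rough illustration of the "subprocess that runs a shell command" idea, here is a minimal sketch using only Python's standard library. It is not the ShellTask implementation; `run_shell_command` is a hypothetical helper, and the `echo` command stands in for whatever batch program would be invoked.

```python
import subprocess


def run_shell_command(command: str) -> str:
    """Run a shell command in a subprocess and return its standard output."""
    completed = subprocess.run(
        command,
        shell=True,           # let the shell interpret the command string
        check=True,           # raise CalledProcessError on a non-zero exit code
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the output as str rather than bytes
    )
    return completed.stdout.strip()


# Placeholder command; a real flow would call the batch program here.
output = run_shell_command("echo batch-job-finished")
```

The command runs in the same container as the calling process, so this approach only works when the program and its dependencies are installed in that image.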

Chu Lục Ninh

03/29/2022, 3:00 PM

Not really. I just think I can customize it a little to support my new use case. And I think that doesn't break the Prefect model at all, since we can have many types of Agent, right?

Anna Geller

03/29/2022, 3:05 PM

We can, but I'm not sure what you would accomplish this way. Let’s take a step back. I know you somehow don’t like the fact that a flow run needs its own pod. Can you explain what is the problem with that? Do you have some budget or resource constraints? The flow run itself shouldn’t consume that many resources, and your solution with a separate Kubernetes job per task to manage custom non-Python dependencies sounds like the right approach. I'm not sure what’s not working as you want it to. So far it looks like you implemented it the right way; I would do the same in a use case where each task requires different libraries/dependencies.

Anna Geller

03/29/2022, 3:05 PM

The only alternative would be a Docker agent instead of a Kubernetes agent, doing it this way: https://discourse.prefect.io/t/can-prefect-run-each-task-in-a-different-docker-container/434

✅ 1

Chu Lục Ninh

03/29/2022, 3:08 PM

thanks, I will look into that

Chu Lục Ninh

03/29/2022, 3:16 PM

Can you explain what is the problem with that? Do you have some budget or resource constraints?

We have a DevOps team that manages and monitors the system, and we just don't want to pollute the logging system with so many pods spawned just to spawn other pods. Another thing is that we have many other Kubernetes batch jobs in the same namespace that we control separately from Prefect, so we want to minimize the effort of monitoring unnecessary jobs.

Anna Geller

03/29/2022, 4:18 PM

Hmm, maybe a label selector can help you organize those jobs?


kubectl get pods -l environment=prefect

Chu Lục Ninh

03/29/2022, 4:45 PM

Sure, but that only helps a little, since some Prefect jobs have to run in a specific namespace, and I have to deal with the Kubernetes team, who manage and monitor my jobs alongside other teams' non-Prefect jobs too.

Anna Geller

03/29/2022, 5:54 PM

Not sure I understand. You can use both namespaces AND labels; they are not mutually exclusive.
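To make the "namespaces AND labels" point concrete, here is a small sketch that builds a kubectl argument list combining both filters. `kubectl_get_pods_args` is a hypothetical helper, and the namespace name `batch` is an assumption for illustration; `--namespace` and `--selector` are the long forms of kubectl's `-n` and `-l` flags. The list is only constructed here, not executed.

```python
from typing import Dict, List


def kubectl_get_pods_args(namespace: str, labels: Dict[str, str]) -> List[str]:
    """Build a kubectl argument list that filters pods by namespace and labels."""
    # Comma-separated key=value pairs must all match (logical AND).
    selector = ",".join(f"{key}={value}" for key, value in sorted(labels.items()))
    return ["kubectl", "get", "pods", "--namespace", namespace, "--selector", selector]


# Hypothetical namespace; the label comes from the command shown in the thread.
args = kubectl_get_pods_args("batch", {"environment": "prefect"})
```

Passing `args` to `subprocess.run` on a machine with cluster access would list only the pods in that namespace that carry the given label.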

Chu Lục Ninh

03/30/2022, 2:01 PM

After discussing this with the team today, we decided to accept the fact that Prefect will launch a Kubernetes job to orchestrate other Kubernetes jobs.

👍 1
