# ask-community
m
Hey folks, I was wondering if anyone had some docs or advice around the best way to set up running agents on an AKS cluster? I am wondering, do I need to package my jobs up to ACR or some other container registry? I'm struggling to identify from the docs how the DockerStorage and the KubernetesRun config are used. I would find a diagram useful here to show the relationship between all of these things. I don't suppose anyone can ELI5 it for me or has a diagram at hand which details the architecture of it all? Thanks again.
k
He’s offline. But I’m sure @ciaran would be willing to chime in here. He’s done this.
🦜 1
z
@Michael Law
• `prefect agent kubernetes install` will generate YAML to install an agent on your cluster
• If you want to use `DockerStorage`, you'll need to store your images somewhere. You may want to use ACR. Generally, we'd recommend not using `DockerStorage` because it can be slow to build your flows. It's easier to store the flow on GitHub or S3 and configure your execution image using the `KubernetesRun`
• If using `DockerStorage` with a `KubernetesRun`, you do not need to set an image on the `KubernetesRun` -- it will be inferred as the image built by `DockerStorage`
• If using another type of storage with a `KubernetesRun`, the default image will just be the `prefect` image. You can set your own base image if you'd like.
• You do not need to configure a `KubernetesRun` to run your flows on Kubernetes. If you're not doing any customizations, you can leave the run config empty and it will be given a `UniversalRun` run config. This means that a `DockerStorage` flow will just work on your Kubernetes agent in the image it's stored in, and a flow with another storage type will just run on the default image as I described above.
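In code, that storage-plus-run-config combination looks roughly like this -- a sketch against a legacy (0.14/0.15-era) Prefect API, where the repo, path, and image names are all made up:
```python
from prefect import Flow, task
from prefect.storage import GitHub
from prefect.run_configs import KubernetesRun

@task
def say_hello():
    print("hello from AKS")

with Flow("aks-example") as flow:
    say_hello()

# Keep the flow code out of the image: store it on GitHub (or S3/Azure)...
flow.storage = GitHub(
    repo="my-org/my-flows",       # hypothetical repo
    path="flows/aks_example.py",  # hypothetical path within it
)

# ...and pick the image the Kubernetes job runs in. Omit `image` and the
# default prefect image is used; with DockerStorage you'd omit it and the
# image built by the storage would be inferred instead.
flow.run_config = KubernetesRun(image="myregistry.azurecr.io/my-base:latest")
```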
m
Cool, thanks for this @Zanie. I have a couple of things to pitch in with here. You mention an 'execution image'; what do you mean by that? We are using databricks and databricks-connect too, so I would imagine that will need to be added to my container image too? Am I right in thinking that I would need to build a container which contains my python flows and also my databricks-connect config, so when I trigger a flow it will connect to a specific cluster? This is what will be deployed to Kubernetes, or am I getting the wrong end of the stick?
So flows will be installed on specific containers depending on the cluster I need them to speak to?
c
Hey @Michael Law, we just did this on the `pangeo-forge` project: https://github.com/pangeo-forge/pangeo-forge-azure-bakery
We're using Terraform to spin it all up and then `kubectl` to apply the Prefect config
m
Awesome man
Yeah we are doing the same with Terraform and AKS / Databricks etc
Just setting up Prefect has me scratching my head on the best practice
c
You might find the `prefect_agent_conf.yaml` useful. I found that the prefect command didn't quite create everything, so with some help from @Tyler Wanner we managed to get it working
I don't claim we've done it via best practices, I'll put that out there 😅
m
Haha
c
(We're not using DockerStorage)
m
Cool, so if I understood correctly, on Kubernetes I thought these ran as jobs? It seems from your config that isn't the case, is that correct?
c
What do you mean by 'these'?
m
So when we configured a flow, I thought the flow would run as a job
I could easily have misunderstood the docs
c
Yep it still does
m
Are you packaging all your flows together and pushing them to a container via that YAML
and your Dockerfile?
c
No, that YAML is spinning up the Prefect Agent on AKS. Then we're registering flows via the Prefect CLI (storing them on Azure Blob Storage)
We're not running Prefect Server
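For illustration, that registration step looks something like this -- a sketch assuming the legacy Prefect Python API (the CLI `prefect register` command does the equivalent), with a made-up container and project name, and `AZURE_STORAGE_CONNECTION_STRING` set in the environment:
```python
from prefect import Flow, task
from prefect.storage import Azure

@task
def bake():
    print("baking...")

with Flow("bakery-example") as flow:
    bake()

# Store the serialized flow in a blob container; the connection string is
# read from AZURE_STORAGE_CONNECTION_STRING if not passed explicitly.
flow.storage = Azure(container="flows")

# Registration pushes the flow's metadata to Prefect Cloud and the flow
# itself to Blob Storage; the agent's Kubernetes job pulls it back at run time.
flow.register(project_name="pangeo-forge")
```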
m
Right, I think this is where I am getting lost
So you place your python and flows on Blob Storage?
How and where are you registering your prefect flows?
c
Yep! If you'd like, I can run you through this on a call? I'll have to get our Azure deployment up, but I'm happy to run you through it
m
That would be unreal man
Up for it any time
c
Cool, I'll get it spun up, make sure I can demo it live without breaking something, then I'll sling a meeting link over!
m
#hero
I'm nipping out for 10 minutes dude, brb in case you bang a link out and I don't get back right away 👍
c
No worries! let me know when you're back and I'll ping the link
m
Thanks man, that's me back
m
Hey @ciaran & @Kevin Kho, this is a stab at what I think a pipeline could look like here for packaging up a container of our dependencies to ACR, then registering our flows with Prefect. Subsequently we could run the agent with labels, pushing jobs from Prefect Cloud and, based on the label, sending them to the related Databricks cluster via databricks-connect. Be interested to hear your thoughts on whether you think this is mental or not?
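On the flow side, that label-based routing might look like this -- a sketch where the image and label names are hypothetical:
```python
from prefect import Flow
from prefect.run_configs import KubernetesRun

with Flow("databricks-example") as flow:
    ...  # Spark-via-databricks-connect tasks would go here

# Image = the dependencies container pushed to ACR; label = which agent
# (and therefore which Databricks cluster config) picks the run up.
flow.run_config = KubernetesRun(
    image="myregistry.azurecr.io/flows-databricks:latest",  # hypothetical
    labels=["databricks-large"],                            # hypothetical
)
```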
Thanks again for your time earlier @ciaran, it was super helpful
z
Sweet!
k
I'm not quite sure how the label connects to the Databricks cluster. Is it that you have databricks-connect set up on the agent and it's configured to a cluster? This means when you run Spark code on an agent, it hits the appropriate Databricks cluster? If that was your thought, I think this looks good! That's clever. How did you draw this diagram? It looks nice!
m
That's exactly it Kevin, the agent jobs are configured with databricks-connect and, depending on labels, will connect to different sized clusters
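The agent side of that pairing, as a sketch: you'd usually start it with `prefect agent kubernetes start` and `-l`/`--label` flags, but the legacy Python `KubernetesAgent` class (assumed below) shows the same idea:
```python
from prefect.agent.kubernetes import KubernetesAgent

# This agent only picks up flow runs carrying the matching label; the job
# template/image it launches has databricks-connect configured against
# the corresponding (large) cluster.
KubernetesAgent(labels=["databricks-large"]).start()
```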
I drew it using the draw.io (diagrams.net) vscode extension. It’s awesome.
👍 1
c
Glad you got what you were aiming for @Michael Law! The diagram looks great too!