Prefect Community #ask-community: More of a technical question for Prefect developers (OSS)

Deceivious

10/12/2023, 12:55 PM

More of a technical question for Prefect developers (OSS): How is the connection setup managed from the worker to the server to the database? Does the server open a new connection to the database for each worker, or does it use an existing connection pool to the database? To simplify or rephrase: what is the impact on the database if I have 2k workers connected to a server backed by a single database?

Jake Kaplan

10/12/2023, 2:01 PM

If you're talking about Prefect Workers, they don't connect directly to the database. They connect through HTTP to the API server. It is not a single long-lived persistent connection. The Worker is using an HTTP connection pool though. On the API server side there is a database connection pool. So every time a request comes in, the server grabs a DB connection and uses it for the duration of that request. Does that answer your question?
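The pooling described above can be pictured with a toy pool in Python. This is only a sketch under stated assumptions, not Prefect's server code: `ConnectionPool`, `handle_request`, the pool size of 5, and the sequential request loop are all hypothetical and chosen for illustration.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Toy stand-in for a server-side database connection pool (hypothetical)."""

    def __init__(self, size):
        self.size = size
        self._idle = queue.Queue()
        for i in range(size):
            # Strings stand in for real database connections.
            self._idle.put(f"conn-{i}")

    @contextmanager
    def connection(self, timeout=1.0):
        # Borrow an idle connection; raises queue.Empty if none frees up in time.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Return the connection once the request is finished.
            self._idle.put(conn)

    def idle_count(self):
        return self._idle.qsize()


def handle_request(pool, worker_id):
    # One API request: hold a connection only for the duration of the request.
    with pool.connection() as conn:
        return f"worker {worker_id} served via {conn}"


pool = ConnectionPool(size=5)
# 2000 polling requests, handled one after another in this simplified model.
results = [handle_request(pool, worker_id) for worker_id in range(2000)]
distinct = {line.split()[-1] for line in results}
print(len(results), "requests used", len(distinct), "connections")
```

In this model the number of database connections is bounded by the pool size, not by the number of workers. Under real concurrent load, requests that cannot get a connection would wait or time out instead.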

Deceivious

10/12/2023, 2:09 PM

How does the polling for tasks on the worker side work? Is it cached on the server?

Deceivious

10/12/2023, 2:09 PM

By task I mean flow run

Jake Kaplan

10/12/2023, 2:12 PM

The server pulls a batch of flow runs that have hit their scheduled time. It's not something that is cached, as you'd expect the information to be more or less different every time

Deceivious

10/12/2023, 3:00 PM

Does the number of workers connected to a server increase the load on the database?

Deceivious

10/12/2023, 3:00 PM

Assuming the worker isn't writing logs or creating new tasks

Jake Kaplan

10/12/2023, 3:02 PM

Generally speaking, yes? Although it's not doing something that should be particularly stressful for your DB. More workers -> more polling API requests -> more API requests on the server -> more queries on your DB
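The chain above (workers -> polling requests -> DB queries) can be put into rough numbers. A back-of-the-envelope sketch in Python, assuming each worker polls once per fixed interval and each poll costs a fixed number of queries; the 10-second interval and the one-query-per-poll figure are illustrative assumptions, not measured values, and `polling_load` is a hypothetical helper.

```python
def polling_load(workers, poll_interval_s, queries_per_poll=1):
    """Return (API requests per second, DB queries per second) from polling alone."""
    requests_per_s = workers / poll_interval_s
    queries_per_s = requests_per_s * queries_per_poll
    return requests_per_s, queries_per_s


# Assumed values: 10 s between polls, one DB query per poll.
for workers in (3, 1000, 2000):
    requests_per_s, queries_per_s = polling_load(workers, poll_interval_s=10.0)
    print(f"{workers} workers: {requests_per_s} req/s, {queries_per_s} queries/s")
```

Under these assumptions the load grows linearly with the number of workers, so the practical question is whether the database and API server have headroom for that steady request rate.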

Deceivious

10/12/2023, 3:03 PM

So in some sense, if I have 1k+ Kubernetes pods each executing 1 flow run, the UI might get slower because the database has too many queries being run?

Jake Kaplan

10/12/2023, 3:06 PM

Just to make sure I'm on the same page, are you talking about having 1k+ workers that each deploy 1 flow run in a k8s job?

Deceivious

10/12/2023, 3:06 PM

we have 3 agents - each spawns a pod for a task run

Deceivious

10/12/2023, 3:07 PM

task run may be 8-30 ish

Deceivious

10/12/2023, 3:07 PM

flow run*

Deceivious

10/12/2023, 3:07 PM

but 1k is just for an exaggerated view

Jake Kaplan

10/12/2023, 3:08 PM

ah got it. Was going to say Workers are capable of managing/deploying multiple flow runs, and you'd have a long way to go before maybe needing 1000 of them 😅

Deceivious

10/12/2023, 3:09 PM

The issue I have now is that the UI is very slow: on page changes it might take up to 10 seconds for elements to start populating

Jake Kaplan

10/12/2023, 3:10 PM

To your question: I would not expect the extra load from a lot of workers to impact your DB; the queries they make are intended to be fast and small. But ultimately that can depend on the size/configuration of your DB

Jake Kaplan

10/12/2023, 3:10 PM

do you know the source of the slowness?

Deceivious

10/12/2023, 3:11 PM

DB memory and CPU usage haven't reached 50%

Deceivious

10/12/2023, 3:11 PM

That is what I am trying to find out

Jake Kaplan

10/12/2023, 3:14 PM

If your database seems to be operating okay and query times seem to be fast, the next link in the chain would be the API server. Does it have enough resources? Is it overwhelmed by requests? etc.

Deceivious

10/12/2023, 3:15 PM

We are getting these issues. Our server API is secured by Azure AD over Azure App Service. I am unsure if this is coming from Azure or Prefect.

Deceivious

10/12/2023, 3:16 PM

Network tab on the browser

Deceivious

10/12/2023, 3:16 PM

{
  "exception_message": "Invalid request received.",
  "exception_detail": [
    {
      "loc": [
        "path",
        "id"
      ],
      "msg": "value is not a valid uuid",
      "type": "type_error.uuid"
    }
  ],
  "request_body": null
}
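The `type_error.uuid` entry above says the `id` path parameter could not be parsed as a UUID. A minimal sketch of the same kind of check using Python's standard `uuid` module; `is_valid_uuid` is a hypothetical helper, the string `"undefined"` is an invented example value (the actual value sent by the browser is not shown in this thread), and `uuid.UUID` accepts a few more input formats than a strict validator might.

```python
import uuid


def is_valid_uuid(value):
    """Return True if value parses as a UUID, False otherwise."""
    try:
        uuid.UUID(value)
    except (ValueError, TypeError, AttributeError):
        # Malformed strings raise ValueError; None and non-strings raise
        # TypeError or AttributeError.
        return False
    return True


# Hypothetical path parameters a client might send.
candidates = ["undefined", "", str(uuid.uuid4())]
for candidate in candidates:
    print(repr(candidate), "->", is_valid_uuid(candidate))
```

If a request like this shows up in the browser's network tab, comparing the `id` segment of the URL against a check of this kind can show whether the client sent a placeholder instead of a real flow run ID.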

Deceivious

10/12/2023, 3:16 PM

Looks like an error from Prefect

Deceivious

10/12/2023, 3:17 PM

Also, this bunch of requests seems like they could have been parallelized but aren't

Deceivious

10/12/2023, 3:17 PM

I'm loading the flow run page btw, in those examples

Jake Kaplan

10/12/2023, 3:20 PM

do you have access to the server logs? 503s are probably (but not 100%) requests that timed out server side trying to read from the db

Deceivious

10/12/2023, 3:23 PM

Yes, I do. We have 3 servers connected to the same database, deployed with Helm, so we have HA. Finding logs across the 3 servers might take some time. Is there any filter I can check?

Deceivious

10/12/2023, 3:24 PM

Helm with replica set of 3 😄

Deceivious

10/12/2023, 3:24 PM

Seems like the easiest fix would be to increase the DB connection timeout setting?

Jake Kaplan

10/12/2023, 3:25 PM

I would take a look at these things:
• are queries executing quickly?
  ◦ increase the DB timeout setting
  ◦ if not, your database might not be sized correctly or have its settings tuned correctly
  ◦ your database tables might be getting too big (OSS doesn't have an automatic retention feature)
• if the above is true, is the API server overwhelmed? is it getting more requests than it can handle?
  ◦ you can look to scale your API server out more
  ◦ you can try vertically scaling your server's CPU/mem
  ◦ you can try hosting a dedicated server just for the UI

Deceivious

10/12/2023, 3:31 PM

• are queries executing quickly?
  ◦ increase the DB timeout setting
    ▪︎ Will try this.
  ◦ if not your database might not be sized/have its settings tuned correctly
    ▪︎ Azure monitoring dashboard so far shows no stress on the DB. I am guessing we are running out of Postgres worker nodes. Will check connections as well.
  ◦ your database tables might be getting too big (OSS doesn't have an automatic retention feature)
    ▪︎ We have flows in place that delete old data from the database tables directly: logs, cache expiration, and other stuff that Prefect OSS doesn't do automatically. Azure-hosted Postgres automatically vacuums and analyzes tables based on table stats. I don't think there are issues here.
• if the above is true, is the api server overwhelmed? is it getting more requests than it can handle?
  ◦ you can look to scale your api server out more
    ▪︎ We use Azure Web Apps. Pretty sure the setup is good enough. Nothing flagged by the monitors so far.
  ◦ you can try vertically scaling your servers cpu/mem
    ▪︎ Web app hosted on a very large VM should be fine.
  ◦ you can try hosting a dedicated server just for the UI
    ▪︎ It is dedicated.
I will start with the DB and let you know if I find issues. Will get back to you in a few weeks' time 😄 Thanks @Jake Kaplan

Jake Kaplan

10/12/2023, 3:57 PM

np, sounds good let me know!
