More of a technical question for prefect developer...
# ask-community
d
More of a technical question for prefect developers: OSS How is the connection setup managed between the worker to server to database? Does each the server open a new connection to the database for each of the worker OR does the server use existing connection pool to the database? to simplify or rephrase, what are the database impacts on the database if i have 2k workers connected to a server backed by a single database?
j
If you're talking about Prefect Workers, they don't connect directly to the database. They connect through HTTP to the API server. It is not a single long lived persisted connection. The Worker is using an http connection pool though. On the API server side there is a database connection pool. So every time a request comes in the server grabs a db connection and uses it for the duration of that request Does that answer your question?
d
How does the polling for tasks on worker side work? Is it cached on the server ?
By task i mean flow run
j
The server pulls a batch of flow runs that have hit their scheduled time, it's not something that is cached as you'd expect the information to be more or less different every time
d
Does the number of workers connected to a server increase the load on the database ?
Assumming the worker isnt writing logs OR creating new tasks
j
Generally speaking yes? although it's not doing something that should be particularly stressful for your db more workers -> more polling api requests -> more api requests on the server -> more queries on your db
d
So in some sense If i have 1k++ kubernetes pods each executing 1 flow run. The UI might get slower as the database has too many queries being run?
j
Just to make sure I'm on the same page, are you talking about having 1k+ workers that each deploy 1 flow run in a k8s job?
d
we have 3 agents - each spawns a pod for a task run
task run may be 8-30 ish
flow run*
but 1k is just for an exaggerated view
j
ah got it. Was going to say Workers are capable of managing/deploying multiple flow runs, and you'd have a long way to go before maybe needing 1000 of them 😅
d
The issue I have now is UI is very slow- on page changes might take upto 10 seconds for elements to start populating
‼️ 1
j
to your question: I would not expect the extra load from a lot of workers to impact your db, the queries it makes are intended to be fast and small. but ultimately that can be up to the size/configuration of your db
do you know the source of the slowness?
d
Db mem and cpu usage havnt reached 50%
That is what I am trying to find out
j
If your database seems to be operating okay and query time seem to be fast, next link in the chain would be the api server. Does it have enough resources? Is it overwhelmed by requests? etc.
d
We are getting these issues. Our server API is secured by Azure AD over Azure App service. I am unsure if this is coming over from azure or prefect.
Network tab on the browser
Copy code
{
  "exception_message": "Invalid request received.",
  "exception_detail": [
    {
      "loc": [
        "path",
        "id"
      ],
      "msg": "value is not a valid uuid",
      "type": "type_error.uuid"
    }
  ],
  "request_body": null
}
Looks like an error from prefect
Also these bunch of request seems like they could have been parallized but arent
M loading flow run page btw, in those examples
j
do you have access to the server logs? 503s are probably (but not 100%) requests that timed out server side trying to read from the db
d
Yes we I do. we have 3 servers connected to the same database deployed with helm so we have HA. Finding logs between the 3 server might take some time. Is there any filter I can check?
helm with repica set of 3 😄
Seems like the easiest fix would be to increase the DB connection Timeout setting?
j
I would take a look at these things: • are queries executing quickly? ◦ increase the DB timeout setting ◦ if not your database might not be sized/have it's settings tuned correctly ◦ your database tables might be getting too big (OSS doesn't have an automatic retention feature) • if the above is true, is the api server overwhelmed? is it getting more requests than it can handle? ◦ you can look to scale your api server out more ◦ you can try vertically scaling your severs cpu/mem ◦ you can try hosting a dedicated server just for the UI
d
• are queries executing quickly? ◦ increase the DB timeout setting ▪︎ will try this ◦ if not your database might not be sized/have it's settings tuned correctly ▪︎ Azure monitoring dashboard so far shows no stress on DB. I am guessing we are running out of postgres worker nodes. will check connections as well. ◦ your database tables might be getting too big (OSS doesn't have an automatic retention feature) ▪︎ We have flows in place that deletes old data from the database table directly. Logs , cache expiration and other stuff that prefect OSS doesnt automatically does. Azure hosted postgres automatically vacccums and analayzes tables based on table stat. I dont think there are issues here. • if the above is true, is the api server overwhelmed? is it getting more requests than it can handle? ◦ you can look to scale your api server out more ▪︎ We use azure web apps. Pretty sure the setup is good enough. Nothing flagged by the monitors so far. ◦ you can try vertically scaling your severs cpu/mem ▪︎ web app hosted on very large vm should be fine. ◦ you can try hosting a dedicated server just for the UI ▪︎ It is dedicated. I will start with DB and let you know if i find issues. Will get back to u in some weeks time 😄 thanks @Jake Kaplan
j
np, sounds good let me know!