
Andrew Dowrick

07/31/2023, 5:13 AM
Hi all - does anyone have any insight into this kind of error? I'm coming up empty.
prefect[2218]: sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30.00 (Background on this error at: <https://sqlalche.me/e/20/3o7r>)
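
For readers unfamiliar with the message above: it says that all 5 pooled connections plus all 10 overflow connections were checked out, and that another request then waited 30 seconds for a free one before giving up. The following is a toy model of that behaviour using only the Python standard library. It is not SQLAlchemy's implementation; `ToyQueuePool` and `PoolTimeout` are hypothetical names invented for illustration, and the timeout is shortened so the example finishes quickly.

```python
import threading


class PoolTimeout(Exception):
    """Raised when no connection slot frees up in time (hypothetical name)."""


class ToyQueuePool:
    """Toy stand-in for a bounded connection pool; not SQLAlchemy's code."""

    def __init__(self, pool_size=5, max_overflow=10, timeout=30.0):
        self.pool_size = pool_size
        self.max_overflow = max_overflow
        self.timeout = timeout
        # At most pool_size + max_overflow connections may be out at once.
        self._slots = threading.BoundedSemaphore(pool_size + max_overflow)

    def checkout(self):
        # Block for up to `timeout` seconds waiting for a free slot.
        if not self._slots.acquire(timeout=self.timeout):
            raise PoolTimeout(
                f"QueuePool limit of size {self.pool_size} "
                f"overflow {self.max_overflow} reached, "
                f"connection timed out, timeout {self.timeout:.2f}"
            )
        return object()  # placeholder for a real connection

    def checkin(self, conn):
        # Returning a connection frees one slot for the next caller.
        self._slots.release()


pool = ToyQueuePool(pool_size=5, max_overflow=10, timeout=0.1)
held = [pool.checkout() for _ in range(15)]  # exhaust all 5 + 10 slots

try:
    pool.checkout()  # the 16th request has to wait, then fails
    timed_out = False
    message = ""
except PoolTimeout as exc:
    timed_out = True
    message = str(exc)

pool.checkin(held.pop())  # once something is returned...
recovered = pool.checkout()  # ...the next checkout succeeds
```

In this model the error is a symptom of connections being held longer than new requests are willing to wait, which is consistent with either too many concurrent clients or a slow database.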

Tim Galvin

07/31/2023, 8:25 AM
I have been seeing this on my own self-hosted Prefect server running alongside a postgres database in a container. In short, I believe postgres is not keeping up with the incoming stream of transactions to process. In my case, I am hitting the logging API hard with dozens of processes attempting to record thousands of lines of logs at once. The postgres server is running on a very slim VM with a pretty slow NFS-mounted disk. I don't know postgres well enough to be sure, but I gather that while it is writing checkpoints to the slow disk, the Prefect transactions end up timing out. I was able to solve this, at least temporarily, by setting postgres to use the in-memory /dev/shm temporary file system. The moment I did that I stopped getting those errors. Of course, since this is an in-memory file system, the moment I restart the VM I am going to lose my database.

Andrew Dowrick

07/31/2023, 8:26 AM
I was hoping the SQLAlchemy pool could be adjusted, but I can't seem to work out how I would do that. I'm assuming that would be a source code change.
I'll try your suggestion. We're running 1000+ tasks in parallel, so I'm not sure if it'll be enough.

Tim Galvin

07/31/2023, 8:52 AM
I have had success editing the SQLAlchemy code and bumping up those defaults directly. I believe I grepped the source code to find where they lived, and just increased them. It helped in my use case, but long term I don't think it was really all that sustainable.
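
As a pointer for anyone looking for "those defaults": the numbers in the error (size 5, overflow 10, timeout 30) match SQLAlchemy's documented defaults for the `pool_size`, `max_overflow`, and `pool_timeout` arguments to `create_engine`. The sketch below shows how those arguments are passed in plain SQLAlchemy. It is not the edit described above and does not show how Prefect builds its engine; the in-memory SQLite URL and the larger values are assumptions chosen only so the example runs without a database server.

```python
from sqlalchemy import create_engine, text
from sqlalchemy.pool import QueuePool

# Illustrative values only; appropriate sizes depend on the workload and on
# how many connections the database server itself is configured to accept.
engine = create_engine(
    "sqlite://",          # assumption: in-memory SQLite stands in for postgres
    poolclass=QueuePool,  # the pool type named in the error message
    pool_size=20,         # default is 5
    max_overflow=40,      # default is 10
    pool_timeout=60,      # seconds to wait for a connection; default is 30
)

# Check out a connection from the pool and run a trivial query.
with engine.connect() as conn:
    value = conn.execute(text("SELECT 1")).scalar()

configured_size = engine.pool.size()
```

Raising these limits only helps if the database can actually serve the extra connections; a larger pool in front of a slow disk may just move the bottleneck.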

Andrew Dowrick

07/31/2023, 10:04 PM
Any chance you could point me in the right direction of where you made that edit? I'm a sysadmin, not a developer, so this is all foreign to me.

Tim Galvin

08/01/2023, 1:03 AM
I can't, sorry. It was a while ago, in an environment I have long since blown away.

Matt Klein

08/01/2023, 6:58 PM
@Andrew Dowrick @Tim Galvin I’ve just opened this PR which I believe addresses this issue. Would love to see this merged into the main branch and released. Please upvote this if you agree so that hopefully it’ll get attention.

Andrew Dowrick

08/01/2023, 11:12 PM
Looks good. I made this change manually, locally, and it fixed the issue. I will upvote now.