I've been trying to figure out how to run a flow consisting of ~3000 parallel flow runs using ACI as a backend. I tried a basic Prefect server in a VM, then switched to the Helm chart, then added a postgresql-ha chart. All of these setups work up to ~500 active flow runs and then crash with various kinds of server errors (too many active connections, timeouts, etc.). In my case the server, the ACI workers and the k8s cluster all have autoscaling enabled and seem to scale to 10-20 pods each during runs.
I was wondering whether Prefect is designed to handle such use cases at all, and whether anyone has tried doing this with a self-hosted server. I've got no experience setting up infrastructure like this, so any tips, even the obvious ones, would be welcome.