Jean Du
09/23/2025, 9:38 AM
sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/20/3o7r)
even though I have set server.database.sqlalchemy.pool_size = 50 and server.database.sqlalchemy.max_overflow = 100
Marvin
09/23/2025, 9:38 AM
Marvin
09/23/2025, 9:42 AM
1) Make sure the settings reach the server process
- Set them as environment variables before starting the server:
export PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_SIZE=50
export PREFECT_SERVER_DATABASE_SQLALCHEMY_MAX_OVERFLOW=100
export PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_TIMEOUT=30
export PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_RECYCLE=3600
prefect server start
- Or set them in your active profile and restart:
prefect config set server.database.sqlalchemy.pool_size=50 server.database.sqlalchemy.max_overflow=100
prefect config set server.database.sqlalchemy.pool_timeout=30 server.database.sqlalchemy.pool_recycle=3600
prefect config validate
prefect config view
- Docker/Compose or Kubernetes:
- These must be set as environment variables on the API and any Prefect server service containers that make DB connections. Updating your local profile won’t affect containers. After updating, restart the containers/pods.
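For a quick check that these variables actually reached a container's process environment (as opposed to whether Prefect resolved them, which step 2 below verifies), a stdlib-only sketch you could run with the container's Python:
# Print the raw environment variables as the container process sees them
import os
for key in (
    "PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_SIZE",
    "PREFECT_SERVER_DATABASE_SQLALCHEMY_MAX_OVERFLOW",
    "PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_TIMEOUT",
    "PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_RECYCLE",
):
    print(key, "=", os.environ.get(key, "<not set>"))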
2) Verify the settings inside the running Server process
- If you can exec into the server container (or host process environment), run:
python -c "from prefect.settings import get_current_settings; s=get_current_settings(); print(s.server.database.sqlalchemy.pool_size, s.server.database.sqlalchemy.max_overflow, s.server.database.sqlalchemy.pool_timeout, s.server.database.sqlalchemy.pool_recycle)"
You should see 50 100 30 3600 (or whatever you’ve set). If you still see 5/10/30, the values aren’t being applied to the server process.
3) Restart required
- Any change to environment or profile settings requires a restart of the server processes to take effect.
4) Consider pool_timeout too
- If you see bursts of connections, you might also need to increase PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_TIMEOUT beyond 30 seconds.
If you can share how you’re starting Prefect Server (host vs Docker/Compose vs Helm/K8s) and exactly where you set server.database.sqlalchemy.pool_size and max_overflow, I can give step-by-step instructions tailored to your setup.
Jean Du
09/23/2025, 9:44 AM
Jean Du
09/23/2025, 9:45 AM
Marvin
09/23/2025, 9:45 AM
Marvin
09/23/2025, 9:46 AM
1) If your own code creates SQLAlchemy engines, raise their pool limits too
- Sync engine
from sqlalchemy import create_engine
engine = create_engine(
    DATABASE_URL,
    pool_size=50,
    max_overflow=100,
    pool_timeout=30,
    pool_recycle=3600,
    pool_pre_ping=True,
)
- Async engine
from sqlalchemy.ext.asyncio import create_async_engine
engine = create_async_engine(
    ASYNC_DATABASE_URL,
    pool_size=50,
    max_overflow=100,
    pool_timeout=30,
    pool_recycle=3600,
    pool_pre_ping=True,
)
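To confirm at runtime which limits an engine actually ended up with, here is a small sketch against the sync engine created above; pool.status(), size(), overflow() and checkedout() are standard SQLAlchemy QueuePool accessors (for an async engine, the same pool is reachable via engine.sync_engine.pool):
# Inspect the pool the engine is actually using (assumes the `engine` object from the sync example above)
pool = engine.pool
print(pool.status())                   # human-readable summary of the pool's current state
print("configured size:", pool.size())
print("overflow counter:", pool.overflow())
print("checked out now:", pool.checkedout())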
2) If you are running Prefect Server in Docker/Compose/Kubernetes
- Ensure the env vars are set on every Prefect server process/container that touches the DB (API and any service/migration containers), not just on your shell or local profile. After updating, restart pods/containers.
3) Double-check which process is throwing the error
- Look at the log source (container name, process name). If it’s your flow container logs, update your flow’s engine as above. If it’s the server API container logs, ensure that container got:
- PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_SIZE=50
- PREFECT_SERVER_DATABASE_SQLALCHEMY_MAX_OVERFLOW=100
- PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_TIMEOUT=30
- PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_RECYCLE=3600
References
- Prefect 3 settings reference: Settings reference
- SQLAlchemy QueuePool error background: QueuePool timeout
If you can share:
- Where you see the error (server API logs vs. your flow logs)
- How you run Prefect Server (host vs Docker/Compose vs K8s)
- Whether your flows use SQLAlchemy to a separate database
…I can give you concrete, step-by-step changes for your setup.
Jean Du
09/23/2025, 9:50 AM
Marvin
09/23/2025, 9:50 AM
Marvin
09/23/2025, 9:52 AM
To confirm what the running server process actually resolves, run this from the same environment and working directory as that process:
python -c "from prefect.settings import get_current_settings as g; s=g(); print('pool', s.server.database.sqlalchemy.pool_size, s.server.database.sqlalchemy.max_overflow, s.server.database.sqlalchemy.pool_timeout, s.server.database.sqlalchemy.pool_recycle); print('url', s.server.database.connection_url)"
You should see the same 50/100/30/3600 and your server DB URL. If you see 5/10/30 or a different DB URL there, that’s the culprit.
Common causes and fixes:
- Multi-container setups (Docker/Compose/K8s): the env vars must be present on every Prefect server container that touches the DB (API and any services/migration containers). Local profiles do not affect containers. Restart those pods/containers after updating env.
- System service/process managers: ensure the service environment includes the vars and restart all Prefect server processes.
- Multiple server processes: if you run API and services separately, verify both picked up the settings.
References
- Settings reference: Prefect 3 settings
- Server CLI: Server CLI guide
- SQLAlchemy QueuePool: Background on the error
If you share:
- The exact log source (which container/process)
- How you run Server (host vs Docker/Compose vs K8s)
- The value of server.database.connection_url in that process
…I’ll map out the exact env/config changes to apply.
Jean Du
09/23/2025, 10:08 AM
I run prefect server start in the background within a container, and I am getting the errors in the .err logs from that process.
Marvin
09/23/2025, 10:08 AM
Marvin
09/23/2025, 10:10 AM
Running prefect config set after the process starts won’t affect the existing background server. It may also be using a different profile than your shell.
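If it is unclear which profile a given shell resolves (the background process may resolve a different one), here is a stdlib-only sketch; it assumes the conventional layout of PREFECT_HOME defaulting to ~/.prefect, an optional PREFECT_PROFILE override, and Python 3.11+ for tomllib:
# Show the profile this shell would use: PREFECT_PROFILE wins, otherwise the
# "active" entry in $PREFECT_HOME/profiles.toml
import os
import pathlib
import tomllib
home = pathlib.Path(os.environ.get("PREFECT_HOME", "~/.prefect")).expanduser()
profiles_path = home / "profiles.toml"
active = os.environ.get("PREFECT_PROFILE")
if not active and profiles_path.exists():
    active = tomllib.loads(profiles_path.read_text()).get("active")
print("profiles file:", profiles_path)
print("active profile:", active or "default (no explicit selection found)")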
Fix steps to ensure the background server process picks up the settings
1) Stop the currently running server in the container
prefect server stop || true
# If needed as a fallback:
pkill -f "prefect server" || true
2) Ensure the env vars are present in the container at process start
- For Docker Compose:
services:
  prefect:
    image: prefecthq/prefect:3-latest
    environment:
      - PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_SIZE=50
      - PREFECT_SERVER_DATABASE_SQLALCHEMY_MAX_OVERFLOW=100
      - PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_TIMEOUT=30
      - PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_RECYCLE=3600
    command: >
      bash -lc "prefect server start"
- For docker run:
docker run -d --name prefect \
-e PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_SIZE=50 \
-e PREFECT_SERVER_DATABASE_SQLALCHEMY_MAX_OVERFLOW=100 \
-e PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_TIMEOUT=30 \
-e PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_RECYCLE=3600 \
prefecthq/prefect:3-latest bash -lc "prefect server start"
- If you’re using a Dockerfile entrypoint, export env first and then start the server in the same process:
ENV PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_SIZE=50 \
PREFECT_SERVER_DATABASE_SQLALCHEMY_MAX_OVERFLOW=100 \
PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_TIMEOUT=30 \
PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_RECYCLE=3600
CMD ["bash", "-lc", "prefect server start"]
3) Restart the container so the server starts with those env vars
4) Verify the background process has them
- Check the background process environment directly:
PID=$(pgrep -f "prefect server")
tr '\0' '\n' < /proc/$PID/environ | grep PREFECT_SERVER_DATABASE_SQLALCHEMY
You should see your values.
- Or fresh-start and confirm the effective settings in the container before starting:
prefect config view | grep -E "PREFECT_SERVER_DATABASE_SQLALCHEMY"
5) Clear/rotate the old .err log before testing again
Old errors in an appended log can be misleading:
:> path/to/your/server.err
Why your flow’s sqlite doesn’t matter here
- Your flow using sqlite3 is separate. Prefect Server uses SQLAlchemy for its own DB (commonly Postgres). The “QueuePool limit of size 5 overflow 10” is from a Server process hitting its DB with default pool settings, not from your flow.
References
- Prefect 3 settings reference: Settings reference
- Server CLI: Server CLI guide
- SQLAlchemy error background: QueuePool timeout
If this still persists after a clean restart with env vars set in the container spec, please share:
- Your container run/compose snippet (env + command)
- The DB URL used by the server (prefect config view | grep PREFECT_SERVER_DATABASE_CONNECTION_URL)
- The first 30 lines of a fresh .err after restart
I’ll help pinpoint the last missing piece.
Jean Du
09/23/2025, 10:12 AM
I set these in the pyproject.toml file, not interactively.
Marvin
09/23/2025, 10:12 AM
Marvin
09/23/2025, 10:13 AM
Option A: set them as environment variables on the container
- Add these to the container environment so the background server sees them regardless of its working directory:
PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_SIZE=50
PREFECT_SERVER_DATABASE_SQLALCHEMY_MAX_OVERFLOW=100
PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_TIMEOUT=30
PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_RECYCLE=3600
Then restart the container so the background server starts with these.
Option B: bake the settings into a Prefect profile in the container
- During image build or entrypoint, set the profile and use it before starting the server:
prefect profile create server || true
prefect profile use server
prefect config set \
server.database.sqlalchemy.pool_size=50 \
server.database.sqlalchemy.max_overflow=100 \
server.database.sqlalchemy.pool_timeout=30 \
server.database.sqlalchemy.pool_recycle=3600
prefect config validate
prefect server start
Option C: ensure the server starts in the directory that contains pyproject.toml
- If your pyproject lives at /app/pyproject.toml, start the server with that as the cwd:
bash -lc "cd /app && prefect server start"
Make sure your pyproject has the correct structure. Either dotted keys under [tool.prefect]…
[tool.prefect]
server.database.sqlalchemy.pool_size = 50
server.database.sqlalchemy.max_overflow = 100
server.database.sqlalchemy.pool_timeout = 30
server.database.sqlalchemy.pool_recycle = 3600
…or a nested table:
[tool.prefect.server.database.sqlalchemy]
pool_size = 50
max_overflow = 100
pool_timeout = 30
pool_recycle = 3600
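As a quick sanity check that the file parses and the keys land under [tool.prefect] where Prefect will look, a stdlib-only sketch (assumes Python 3.11+ for tomllib; the /app/pyproject.toml path is only an example):
# Parse pyproject.toml and print whatever sits under [tool.prefect]
import tomllib
from pathlib import Path
data = tomllib.loads(Path("/app/pyproject.toml").read_text())  # adjust the path to your layout
print(data.get("tool", {}).get("prefect", {}))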
Verify the background process is picking them up
- Check the server process’s cwd to confirm whether pyproject is visible:
PID=$(pgrep -f "prefect server")
ls -l /proc/$PID/cwd
- Check effective settings from that environment:
python -c "from prefect.settings import get_current_settings as g; s=g(); print(s.server.database.sqlalchemy.pool_size, s.server.database.sqlalchemy.max_overflow, s.server.database.sqlalchemy.pool_timeout, s.server.database.sqlalchemy.pool_recycle)"
- If you still see 5/10/30, the process is not reading your pyproject; use Option A or B.
References
- Settings reference: Prefect 3 settings
- Server CLI: Server CLI guide
- SQLAlchemy QueuePool: Background on the error
If you can share where pyproject.toml sits in your container and how you start the server (Dockerfile/CMD/entrypoint), I’ll suggest the exact change to ensure the background server sees it.
Jean Du
09/23/2025, 10:22 AM
How come the api.url setting from the same pyproject.toml file was applied without issues?
Marvin
09/23/2025, 10:22 AM
Marvin
09/23/2025, 10:23 AM
api.url is a client setting and you’re likely seeing it applied by the CLI/flows (in the shell/CWD where your pyproject lives), while the SQLAlchemy pool settings only matter for the Prefect Server process, which is running in the background and may not be reading that same pyproject.
Key points:
- api.url controls where the Prefect client (CLI, flows) talks to. It’s not used by the server itself. So when you run commands like prefect config view or your flows, they read your pyproject.toml in that shell’s current working directory and you see api.url “working”.
- The server’s DB pool settings (server.database.sqlalchemy.*) are only consumed by the Prefect Server process. When you start the server in the background inside a container, that process may have a different working directory or start before your pyproject existed. In that case, it won’t see your pyproject.toml and will fall back to defaults (pool_size=5, max_overflow=10).
How to reconcile:
- Best in containers: set environment variables on the container so CWD doesn’t matter:
- PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_SIZE=50
- PREFECT_SERVER_DATABASE_SQLALCHEMY_MAX_OVERFLOW=100
- PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_TIMEOUT=30
- PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_RECYCLE=3600
Then restart the container.
- Or, put the settings in a Prefect profile (profiles.toml under PREFECT_HOME) that the server process always loads, regardless of CWD.
- Or, ensure the server background process starts in the directory that actually contains your pyproject.toml (e.g., cd /app && prefect server start).
If you want to prove this is the cause, compare settings from the server process’s CWD:
- Find the server process CWD:
PID=$(pgrep -f "prefect server")
readlink -f /proc/$PID/cwd
- In that directory, run:
python -c "from prefect.settings import get_current_settings as g; s=g(); print(s.server.database.sqlalchemy.pool_size, s.server.database.sqlalchemy.max_overflow, s.server.database.sqlalchemy.pool_timeout, s.server.database.sqlalchemy.pool_recycle)"
If you don’t see 50/100/30/3600 there, the background process isn’t reading your pyproject.
Refs:
- Prefect 3 settings reference
- Server CLI guide
- SQLAlchemy QueuePool timeout background
If you share your Dockerfile/compose snippet (env + workdir + command), I can suggest the minimal change to make the server pick up these settings reliably.
Jean Du
09/23/2025, 10:44 AM
Marvin
09/23/2025, 10:44 AM
Marvin
09/23/2025, 10:52 AM
First, confirm the server’s background services (the ones that power automations and cancellation cleanup) are enabled and running in that container:
prefect server services ls
# If not running, start them:
prefect server services start
If you start with prefect server start, do not pass --no-services.
Common reasons your automation didn’t cancel the run
- Automation scope/match didn’t include your run (e.g., filters on name/tags/workspace). Re-open the automation in the UI and confirm its filters match the flow.
- Triggers/Actions services disabled or not running (see above).
- No events/heartbeats for the run so “zombie” detection doesn’t kick in. Check the flow run’s details for “last heartbeat” to confirm it stopped.
- Using an older container/image where services default to off. Ensure you’re on a recent 3.x image and that env vars enabling services are set as needed.
Why “Cancelling” can hang forever
- The server sets state=“Cancelling”, then asks the underlying infrastructure to stop the process. If the process is already dead or the worker/infrastructure is gone/offline, no final acknowledgement comes back and the run can sit in “Cancelling”.
- The CancellationCleanup service finalizes these runs after a timeout. If it’s disabled, they’ll never be finalized. Verify it’s enabled in your server and that the service loop is running.
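If you want to enumerate the runs currently parked in Cancelling before cleaning them up, here is a minimal sketch with the Prefect 3 Python client; the filter imports are assumptions about the 3.x client schema, so adjust them to your installed version, and PREFECT_API_URL must point at your server:
# List flow runs whose current state type is CANCELLING
import asyncio
from prefect.client.orchestration import get_client
from prefect.client.schemas.filters import (
    FlowRunFilter,
    FlowRunFilterState,
    FlowRunFilterStateType,
)
from prefect.client.schemas.objects import StateType

async def main() -> None:
    async with get_client() as client:  # reads PREFECT_API_URL from the environment
        runs = await client.read_flow_runs(
            flow_run_filter=FlowRunFilter(
                state=FlowRunFilterState(
                    type=FlowRunFilterStateType(any_=[StateType.CANCELLING])
                )
            ),
            limit=50,
        )
    for run in runs:
        print(run.id, run.name, run.state_name)

asyncio.run(main())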
Quick checks and actions
1) Verify services in your container
prefect server services ls
You should see ProactiveTriggers, ReactiveTriggers, Actions, EventPersister, and CancellationCleanup enabled and running. If not, enable/start them and restart the server container.
2) Confirm the server process is actually running services
- If you start the API in the background, make sure it was not started with --no-services.
- If you have multiple server processes/containers, all must have matching configuration.
3) Inspect the stuck run
prefect flow-run inspect <FLOW_RUN_ID>
- Confirm state history, last heartbeat, any infrastructure PID/identifier.
- If it’s truly orphaned, try:
prefect flow-run cancel <FLOW_RUN_ID>
Then give CancellationCleanup a couple of minutes to finalize. If cleanup is disabled, enable it and restart the server services.
4) As a last resort for cleanup
- If the process is gone and you need to unclog the UI, you can delete the run:
prefect flow-run delete <FLOW_RUN_ID>
If you can share:
- How you start the server in the container (command/entrypoint; are you using --no-services?)
- The output of prefect server services ls inside that container
- One example prefect flow-run inspect <id> for a stuck run
…I’ll give you precise steps to fix automations and ensure stuck runs are auto-cancelled.
References
- Server CLI and services: Prefect Server CLI guide
- Cancelling workflows: Cancel workflows
- Detecting zombies overview: Detect zombie flows