Jean Du
09/23/2025, 9:38 AM
sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/20/3o7r)
even though I have set server.database.sqlalchemy.pool_size = 50 and server.database.sqlalchemy.max_overflow = 100
Marvin
09/23/2025, 9:38 AM
Marvin
09/23/2025, 9:42 AM
1) Make sure the settings reach the server process
- Set them as environment variables before starting the server:
export PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_SIZE=50
export PREFECT_SERVER_DATABASE_SQLALCHEMY_MAX_OVERFLOW=100
export PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_TIMEOUT=30
export PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_RECYCLE=3600
prefect server start
- Or set them in your active profile and restart:
prefect config set server.database.sqlalchemy.pool_size=50 server.database.sqlalchemy.max_overflow=100
prefect config set server.database.sqlalchemy.pool_timeout=30 server.database.sqlalchemy.pool_recycle=3600
prefect config validate
prefect config view
- Docker/Compose or Kubernetes:
- These must be set as environment variables on the API and any Prefect server service containers that make DB connections. Updating your local profile won’t affect containers. After updating, restart the containers/pods.
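For a quick check that these variables actually reached a container's process environment (as opposed to whether Prefect resolved them, which step 2 below verifies), a stdlib-only sketch you could run with the container's Python:
# Print the raw environment variables as the container process sees them
import os
for key in (
    "PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_SIZE",
    "PREFECT_SERVER_DATABASE_SQLALCHEMY_MAX_OVERFLOW",
    "PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_TIMEOUT",
    "PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_RECYCLE",
):
    print(key, "=", os.environ.get(key, "<not set>"))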
2) Verify the settings inside the running Server process
- If you can exec into the server container (or host process environment), run:
python -c "from prefect.settings import get_current_settings; s=get_current_settings(); print(s.server.database.sqlalchemy.pool_size, s.server.database.sqlalchemy.max_overflow, s.server.database.sqlalchemy.pool_timeout, s.server.database.sqlalchemy.pool_recycle)"
You should see 50 100 30 3600 (or whatever you’ve set). If you still see 5/10/30, the values aren’t being applied to the server process.
3) Restart required
- Any change to environment or profile settings requires a restart of the server processes to take effect.
4) Consider pool_timeout too
- If you see bursts of connections, you might also need to increase PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_TIMEOUT beyond 30 seconds.
If you can share how you’re starting Prefect Server (host vs Docker/Compose vs Helm/K8s) and exactly where you set server.database.sqlalchemy.pool_size and max_overflow, I can give step-by-step instructions tailored to your setup.
Jean Du
09/23/2025, 9:44 AM
Jean Du
09/23/2025, 9:45 AM
Marvin
09/23/2025, 9:45 AM
Marvin
09/23/2025, 9:46 AM
1) If your own code creates SQLAlchemy engines, raise their pool limits too
- Sync engine
from sqlalchemy import create_engine
engine = create_engine(
    DATABASE_URL,
    pool_size=50,
    max_overflow=100,
    pool_timeout=30,
    pool_recycle=3600,
    pool_pre_ping=True,
)
- Async engine
from sqlalchemy.ext.asyncio import create_async_engine
engine = create_async_engine(
    ASYNC_DATABASE_URL,
    pool_size=50,
    max_overflow=100,
    pool_timeout=30,
    pool_recycle=3600,
    pool_pre_ping=True,
)
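To confirm at runtime which limits an engine actually ended up with, here is a small sketch against the sync engine created above; pool.status(), size(), overflow() and checkedout() are standard SQLAlchemy QueuePool accessors (for an async engine, the same pool is reachable via engine.sync_engine.pool):
# Inspect the pool the engine is actually using (assumes the `engine` object from the sync example above)
pool = engine.pool
print(pool.status())                   # human-readable summary of the pool's current state
print("configured size:", pool.size())
print("overflow counter:", pool.overflow())
print("checked out now:", pool.checkedout())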
2) If you are running Prefect Server in Docker/Compose/Kubernetes
- Ensure the env vars are set on every Prefect server process/container that touches the DB (API and any service/migration containers), not just on your shell or local profile. After updating, restart pods/containers.
3) Double-check which process is throwing the error
- Look at the log source (container name, process name). If it’s your flow container logs, update your flow’s engine as above. If it’s the server API container logs, ensure that container got:
- PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_SIZE=50
- PREFECT_SERVER_DATABASE_SQLALCHEMY_MAX_OVERFLOW=100
- PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_TIMEOUT=30
- PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_RECYCLE=3600
References
- Prefect 3 settings reference: Settings reference
- SQLAlchemy QueuePool error background: QueuePool timeout
If you can share:
- Where you see the error (server API logs vs. your flow logs)
- How you run Prefect Server (host vs Docker/Compose vs K8s)
- Whether your flows use SQLAlchemy to a separate database
…I can give you concrete, step-by-step changes for your setup.
Jean Du
09/23/2025, 9:50 AM
Marvin
09/23/2025, 9:50 AM
Marvin
09/23/2025, 9:52 AM
To confirm what the running server process actually resolves, run this from the same environment and working directory as that process:
python -c "from prefect.settings import get_current_settings as g; s=g(); print('pool', s.server.database.sqlalchemy.pool_size, s.server.database.sqlalchemy.max_overflow, s.server.database.sqlalchemy.pool_timeout, s.server.database.sqlalchemy.pool_recycle); print('url', s.server.database.connection_url)"
You should see the same 50/100/30/3600 and your server DB URL. If you see 5/10/30 or a different DB URL there, that’s the culprit.
Common causes and fixes:
- Multi-container setups (Docker/Compose/K8s): the env vars must be present on every Prefect server container that touches the DB (API and any services/migration containers). Local profiles do not affect containers. Restart those pods/containers after updating env.
- System service/process managers: ensure the service environment includes the vars and restart all Prefect server processes.
- Multiple server processes: if you run API and services separately, verify both picked up the settings.
References
- Settings reference: Prefect 3 settings
- Server CLI: Server CLI guide
- SQLAlchemy QueuePool: Background on the error
If you share:
- The exact log source (which container/process)
- How you run Server (host vs Docker/Compose vs K8s)
- The value of server.database.connection_url in that process
…I’ll map out the exact env/config changes to apply.
Jean Du
09/23/2025, 10:08 AM
I run prefect server start in the background within a container, and I am getting the errors in the .err logs from that process.
Marvin
09/23/2025, 10:08 AM
Marvin
09/23/2025, 10:10 AM
Running prefect config set after the process starts won’t affect the existing background server. It may also be using a different profile than your shell.
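If it is unclear which profile a given shell resolves (the background process may resolve a different one), here is a stdlib-only sketch; it assumes the conventional layout of PREFECT_HOME defaulting to ~/.prefect, an optional PREFECT_PROFILE override, and Python 3.11+ for tomllib:
# Show the profile this shell would use: PREFECT_PROFILE wins, otherwise the
# "active" entry in $PREFECT_HOME/profiles.toml
import os
import pathlib
import tomllib
home = pathlib.Path(os.environ.get("PREFECT_HOME", "~/.prefect")).expanduser()
profiles_path = home / "profiles.toml"
active = os.environ.get("PREFECT_PROFILE")
if not active and profiles_path.exists():
    active = tomllib.loads(profiles_path.read_text()).get("active")
print("profiles file:", profiles_path)
print("active profile:", active or "default (no explicit selection found)")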
Fix steps to ensure the background server process picks up the settings
1) Stop the currently running server in the container
prefect server stop || true
# If needed as a fallback:
pkill -f "prefect server" || true
2) Ensure the env vars are present in the container at process start
- For Docker Compose:
services:
  prefect:
    image: prefecthq/prefect:3-latest
    environment:
      - PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_SIZE=50
      - PREFECT_SERVER_DATABASE_SQLALCHEMY_MAX_OVERFLOW=100
      - PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_TIMEOUT=30
      - PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_RECYCLE=3600
    command: >
      bash -lc "prefect server start"
- For docker run:
docker run -d --name prefect \
-e PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_SIZE=50 \
-e PREFECT_SERVER_DATABASE_SQLALCHEMY_MAX_OVERFLOW=100 \
-e PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_TIMEOUT=30 \
-e PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_RECYCLE=3600 \
prefecthq/prefect:3-latest bash -lc "prefect server start"
- If you’re using a Dockerfile entrypoint, export env first and then start the server in the same process:
ENV PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_SIZE=50 \
PREFECT_SERVER_DATABASE_SQLALCHEMY_MAX_OVERFLOW=100 \
PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_TIMEOUT=30 \
PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_RECYCLE=3600
CMD ["bash", "-lc", "prefect server start"]
3) Restart the container so the server starts with those env vars
4) Verify the background process has them
- Check the background process environment directly:
PID=$(pgrep -f "prefect server")
tr '\0' '\n' < /proc/$PID/environ | grep PREFECT_SERVER_DATABASE_SQLALCHEMY
You should see your values.
- Or fresh-start and confirm the effective settings in the container before starting:
prefect config view | grep -E "PREFECT_SERVER_DATABASE_SQLALCHEMY"
5) Clear/rotate the old .err log before testing again
Old errors in an appended log can be misleading:
:> path/to/your/server.err
Why your flow’s sqlite doesn’t matter here
- Your flow using sqlite3 is separate. Prefect Server uses SQLAlchemy for its own DB (commonly Postgres). The “QueuePool limit of size 5 overflow 10” is from a Server process hitting its DB with default pool settings, not from your flow.
References
- Prefect 3 settings reference: Settings reference
- Server CLI: Server CLI guide
- SQLAlchemy error background: QueuePool timeout
If this still persists after a clean restart with env vars set in the container spec, please share:
- Your container run/compose snippet (env + command)
- The DB URL used by the server (prefect config view | grep PREFECT_SERVER_DATABASE_CONNECTION_URL)
- The first 30 lines of a fresh .err after restart
I’ll help pinpoint the last missing piece.
Jean Du
09/23/2025, 10:12 AM
I set these in the pyproject.toml file, not interactively.
Marvin
09/23/2025, 10:12 AM
Marvin
09/23/2025, 10:13 AM
Option A: set them as environment variables on the container
- Add these to the container environment so the background server sees them regardless of its working directory:
PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_SIZE=50
PREFECT_SERVER_DATABASE_SQLALCHEMY_MAX_OVERFLOW=100
PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_TIMEOUT=30
PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_RECYCLE=3600
Then restart the container so the background server starts with these.
Option B: bake the settings into a Prefect profile in the container
- During image build or entrypoint, set the profile and use it before starting the server:
prefect profile create server || true
prefect profile use server
prefect config set \
server.database.sqlalchemy.pool_size=50 \
server.database.sqlalchemy.max_overflow=100 \
server.database.sqlalchemy.pool_timeout=30 \
server.database.sqlalchemy.pool_recycle=3600
prefect config validate
prefect server start
Option C: ensure the server starts in the directory that contains pyproject.toml
- If your pyproject lives at /app/pyproject.toml, start the server with that as the cwd:
bash -lc "cd /app && prefect server start"
Make sure your pyproject has the correct structure. Either dotted keys under [tool.prefect]…
[tool.prefect]
server.database.sqlalchemy.pool_size = 50
server.database.sqlalchemy.max_overflow = 100
server.database.sqlalchemy.pool_timeout = 30
server.database.sqlalchemy.pool_recycle = 3600
…or a nested table:
[tool.prefect.server.database.sqlalchemy]
pool_size = 50
max_overflow = 100
pool_timeout = 30
pool_recycle = 3600
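As a quick sanity check that the file parses and the keys land under [tool.prefect] where Prefect will look, a stdlib-only sketch (assumes Python 3.11+ for tomllib; the /app/pyproject.toml path is only an example):
# Parse pyproject.toml and print whatever sits under [tool.prefect]
import tomllib
from pathlib import Path
data = tomllib.loads(Path("/app/pyproject.toml").read_text())  # adjust the path to your layout
print(data.get("tool", {}).get("prefect", {}))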
Verify the background process is picking them up
- Check the server process’s cwd to confirm whether pyproject is visible:
PID=$(pgrep -f "prefect server")
ls -l /proc/$PID/cwd
- Check effective settings from that environment:
python -c "from prefect.settings import get_current_settings as g; s=g(); print(s.server.database.sqlalchemy.pool_size, s.server.database.sqlalchemy.max_overflow, s.server.database.sqlalchemy.pool_timeout, s.server.database.sqlalchemy.pool_recycle)"
- If you still see 5/10/30, the process is not reading your pyproject; use Option A or B.
References
- Settings reference: Prefect 3 settings
- Server CLI: Server CLI guide
- SQLAlchemy QueuePool: Background on the error
If you can share where pyproject.toml sits in your container and how you start the server (Dockerfile/CMD/entrypoint), I’ll suggest the exact change to ensure the background server sees it.
Jean Du
09/23/2025, 10:22 AM
How come the api.url setting from the same pyproject.toml file was applied without issues?
Marvin
09/23/2025, 10:22 AM
Marvin
09/23/2025, 10:23 AM
api.url is a client setting and you’re likely seeing it applied by the CLI/flows (in the shell/CWD where your pyproject lives), while the SQLAlchemy pool settings only matter for the Prefect Server process, which is running in the background and may not be reading that same pyproject.
Key points:
- api.url controls where the Prefect client (CLI, flows) talks to. It’s not used by the server itself. So when you run commands like prefect config view or your flows, they read your pyproject.toml in that shell’s current working directory and you see api.url “working”.
- The server’s DB pool settings (server.database.sqlalchemy.*) are only consumed by the Prefect Server process. When you start the server in the background inside a container, that process may have a different working directory or start before your pyproject existed. In that case, it won’t see your pyproject.toml and will fall back to defaults (pool_size=5, max_overflow=10).
How to reconcile:
- Best in containers: set environment variables on the container so CWD doesn’t matter:
- PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_SIZE=50
- PREFECT_SERVER_DATABASE_SQLALCHEMY_MAX_OVERFLOW=100
- PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_TIMEOUT=30
- PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_RECYCLE=3600
Then restart the container.
- Or, put the settings in a Prefect profile (profiles.toml under PREFECT_HOME) that the server process always loads, regardless of CWD.
- Or, ensure the server background process starts in the directory that actually contains your pyproject.toml (e.g., cd /app && prefect server start).
If you want to prove this is the cause, compare settings from the server process’s CWD:
- Find the server process CWD:
PID=$(pgrep -f "prefect server")
readlink -f /proc/$PID/cwd
- In that directory, run:
python -c "from prefect.settings import get_current_settings as g; s=g(); print(s.server.database.sqlalchemy.pool_size, s.server.database.sqlalchemy.max_overflow, s.server.database.sqlalchemy.pool_timeout, s.server.database.sqlalchemy.pool_recycle)"
If you don’t see 50/100/30/3600 there, the background process isn’t reading your pyproject.
Refs:
- Prefect 3 settings reference
- Server CLI guide
- SQLAlchemy QueuePool timeout background
If you share your Dockerfile/compose snippet (env + workdir + command), I can suggest the minimal change to make the server pick up these settings reliably.
Jean Du
09/23/2025, 10:44 AM
Marvin
09/23/2025, 10:44 AM
Marvin
09/23/2025, 10:52 AM
First, confirm the server’s background services (the ones that power automations and cancellation cleanup) are enabled and running in that container:
prefect server services ls
# If not running, start them:
prefect server services start
If you start with prefect server start, do not pass --no-services.
Common reasons your automation didn’t cancel the run
- Automation scope/match didn’t include your run (e.g., filters on name/tags/workspace). Re-open the automation in the UI and confirm its filters match the flow.
- Triggers/Actions services disabled or not running (see above).
- No events/heartbeats for the run so “zombie” detection doesn’t kick in. Check the flow run’s details for “last heartbeat” to confirm it stopped.
- Using an older container/image where services default to off. Ensure you’re on a recent 3.x image and that env vars enabling services are set as needed.
Why “Cancelling” can hang forever
- The server sets state=“Cancelling”, then asks the underlying infrastructure to stop the process. If the process is already dead or the worker/infrastructure is gone/offline, no final acknowledgement comes back and the run can sit in “Cancelling”.
- The CancellationCleanup service finalizes these runs after a timeout. If it’s disabled, they’ll never be finalized. Verify it’s enabled in your server and that the service loop is running.
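If you want to enumerate the runs currently parked in Cancelling before cleaning them up, here is a minimal sketch with the Prefect 3 Python client; the filter imports are assumptions about the 3.x client schema, so adjust them to your installed version, and PREFECT_API_URL must point at your server:
# List flow runs whose current state type is CANCELLING
import asyncio
from prefect.client.orchestration import get_client
from prefect.client.schemas.filters import (
    FlowRunFilter,
    FlowRunFilterState,
    FlowRunFilterStateType,
)
from prefect.client.schemas.objects import StateType

async def main() -> None:
    async with get_client() as client:  # reads PREFECT_API_URL from the environment
        runs = await client.read_flow_runs(
            flow_run_filter=FlowRunFilter(
                state=FlowRunFilterState(
                    type=FlowRunFilterStateType(any_=[StateType.CANCELLING])
                )
            ),
            limit=50,
        )
    for run in runs:
        print(run.id, run.name, run.state_name)

asyncio.run(main())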
Quick checks and actions
1) Verify services in your container
prefect server services ls
You should see ProactiveTriggers, ReactiveTriggers, Actions, EventPersister, and CancellationCleanup enabled and running. If not, enable/start them and restart the server container.
2) Confirm the server process is actually running services
- If you start the API in the background, make sure it was not started with --no-services.
- If you have multiple server processes/containers, all must have matching configuration.
3) Inspect the stuck run
prefect flow-run inspect <FLOW_RUN_ID>
- Confirm state history, last heartbeat, any infrastructure PID/identifier.
- If it’s truly orphaned, try:
prefect flow-run cancel <FLOW_RUN_ID>
Then give CancellationCleanup a couple of minutes to finalize. If cleanup is disabled, enable it and restart the server services.
4) As a last resort for cleanup
- If the process is gone and you need to unclog the UI, you can delete the run:
prefect flow-run delete <FLOW_RUN_ID>
If you can share:
- How you start the server in the container (command/entrypoint; are you using --no-services?)
- The output of prefect server services ls inside that container
- One example prefect flow-run inspect <id> for a stuck run
…I’ll give you precise steps to fix automations and ensure stuck runs are auto-cancelled.
References
- Server CLI and services: Prefect Server CLI guide
- Cancelling workflows: Cancel workflows
- Detecting zombies overview: Detect zombie flows