<@ULVA73B9P> starting prefect-server failed due mi...
# ask-marvin
d
@Marvin starting prefect-server failed because the migration against Aurora failed:
m
thought for 396.1 seconds
d
File "/usr/local/lib/python3.12/site-packages/alembic/util/pyfiles.py", line 136, in load_module_py
    spec.loader.exec_module(module)  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap_external>", line 999, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/usr/local/lib/python3.12/site-packages/prefect/server/database/_migrations/env.py", line 201, in <module>
    run_async_from_worker_thread(apply_migrations)
  File "/usr/local/lib/python3.12/site-packages/prefect/utilities/asyncutils.py", line 254, in run_async_from_worker_thread
    return anyio.from_thread.run(call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/anyio/from_thread.py", line 59, in run
    return async_backend.run_async_from_thread(func, args, token=token)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2510, in run_async_from_thread
    return f.result()
           ^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
TimeoutError
m
Sorry you’re hitting this. Migration failures against Aurora usually come down to one of these:
- Missing Postgres extensions (most commonly pgcrypto for gen_random_uuid())
- Insufficient DB privileges to create extensions / tables / indexes
- Misconfigured connection URL (must use the async driver)
- TLS requirements not set when Aurora enforces SSL

Can you share:
- The exact error message/traceback from prefect server start (or prefect server database upgrade)
- Aurora PostgreSQL engine version
- Prefect version
- How you’re configuring the DB connection (env vars / Helm / docker-compose), including whether TLS is required

Quick fixes to try

1) If your error mentions gen_random_uuid() or “function does not exist”
Run this once on the target database (with a user that can create extensions, e.g. the Aurora master user or a role with rds_superuser):
CREATE EXTENSION IF NOT EXISTS pgcrypto;
-- optional alternative if your Aurora version lacks pgcrypto
-- (note: uuid-ossp provides uuid_generate_v4(), not gen_random_uuid()):
-- CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

2) Verify your Prefect 3.x DB connection settings
Prefect Server needs the async driver in the URL:
export PREFECT_SERVER_DATABASE__CONNECTION_URL='postgresql+asyncpg://<user>:<password>@<host>:5432/<db_name>'
Common gotcha: using postgresql://... instead of postgresql+asyncpg://...

3) If Aurora requires TLS
Set TLS connect args (example with RDS CA bundle):
export PREFECT_SERVER_DATABASE__SQLALCHEMY__CONNECT_ARGS__TLS__ENABLED=true
export PREFECT_SERVER_DATABASE__SQLALCHEMY__CONNECT_ARGS__TLS__CA_FILE=/path/to/rds-ca-bundle.pem
# Optional:
# export PREFECT_SERVER_DATABASE__SQLALCHEMY__CONNECT_ARGS__TLS__CHECK_HOSTNAME=true
4) Run migrations explicitly to see the full error
# optional: disable auto-migration while you debug
export PREFECT_SERVER_DATABASE__MIGRATE_ON_START=false

# run the migration
prefect server database upgrade

# then start the server
prefect server start

5) Verify privileges for the DB user used by Prefect
The user should be able to:
- CREATE EXTENSION (for pgcrypto)
- CREATE SCHEMA / TABLE / INDEX
- ALTER TABLE

If this is a brand new database and you don’t need to preserve data, you can reset:
prefect server database reset
Note: this drops and recreates all Prefect tables.

Helpful docs
- Self-hosting Prefect (3.x): Self-hosted Prefect
- Settings reference (env var mapping with double-underscore nesting): Settings reference

If you can paste the exact migration error, I can pinpoint whether it’s an extension, permissions, or connection configuration issue and share the precise command(s) to fix it.
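
The connection-URL check from step 2 can be sketched in a few lines of Python. This is only an illustration: the function name check_prefect_db_url is hypothetical (it is not part of Prefect), and the reader-endpoint test is a heuristic that assumes Aurora's usual ".cluster-ro-" host naming.

```python
from urllib.parse import urlsplit


def check_prefect_db_url(url: str) -> list[str]:
    """Return a list of likely problems in a Prefect Postgres connection URL.

    Hypothetical helper for illustration; not part of Prefect.
    """
    problems = []
    parts = urlsplit(url)

    # Prefect Server needs the async driver, i.e. postgresql+asyncpg://
    if parts.scheme != "postgresql+asyncpg":
        problems.append(
            f"scheme is {parts.scheme!r}, expected 'postgresql+asyncpg'"
        )

    if not parts.hostname:
        problems.append("no host in URL")

    if not parts.path.strip("/"):
        problems.append("no database name in URL")

    # Heuristic (assumption): Aurora reader endpoints contain '.cluster-ro-'.
    # Migrations need write access, so a reader endpoint will not work.
    if parts.hostname and ".cluster-ro-" in parts.hostname:
        problems.append("host looks like an Aurora reader endpoint")

    return problems


if __name__ == "__main__":
    import os

    url = os.environ.get("PREFECT_SERVER_DATABASE__CONNECTION_URL", "")
    for problem in check_prefect_db_url(url) or ["no obvious problems"]:
        print(problem)
```

An empty list only means the URL has the expected shape; it says nothing about whether the host is reachable or the credentials are valid.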
d
@Marvin did you check? I posted the error above: it's a TimeoutError at migration start
m
thought for 45.5 seconds
Got it, thanks for clarifying. A TimeoutError at migration start usually means Prefect can’t establish a DB connection in time. Your traceback raises it from run_async_from_worker_thread(apply_migrations) in Prefect’s Alembic env.py, with no SQL error reported, which fits a connection that never completes. Here are focused steps to fix that with Aurora:

1) Bump connection timeouts
Set these and try again. Aurora + asyncpg sometimes needs more than the defaults (5–10s):
export PREFECT_SERVER_DATABASE__CONNECTION_TIMEOUT=30     # connect handshake
export PREFECT_SERVER_DATABASE__TIMEOUT=60                # DB op timeout
export PREFECT_SERVER_DATABASE__SQLALCHEMY__POOL_TIMEOUT=60

2) Verify the async driver and writer endpoint
- Use the async driver in the URL:
export PREFECT_SERVER_DATABASE__CONNECTION_URL='postgresql+asyncpg://<user>:<password>@<writer-endpoint>:5432/<db>'
- Ensure you are using the Aurora writer/cluster endpoint (not a reader endpoint). Migrations require write access.

3) If your Aurora enforces TLS
Turn TLS on and provide the CA bundle:
export PREFECT_SERVER_DATABASE__SQLALCHEMY__CONNECT_ARGS__TLS__ENABLED=true
export PREFECT_SERVER_DATABASE__SQLALCHEMY__CONNECT_ARGS__TLS__CA_FILE=/path/to/rds-ca-bundle.pem
# optionally:
# export PREFECT_SERVER_DATABASE__SQLALCHEMY__CONNECT_ARGS__TLS__CHECK_HOSTNAME=true

4) Confirm network access
From the host running Prefect Server:
- Security group/NACL allows outbound to the Aurora port and inbound rules allow your source
- DNS resolves and the port is reachable
- Quick test:
nc -vz <writer-endpoint> 5432
# or try psql with same host/port/ssl settings
5) Run migration separately for clearer errors
export PREFECT_SERVER_DATABASE__MIGRATE_ON_START=false
# optional: more logging
export PREFECT_LOGGING_LEVEL=DEBUG

prefect server database upgrade
prefect server start

If it still times out, please share:
- Full traceback from prefect server database upgrade (redact secrets)
- Your Aurora Postgres engine version
- Whether TLS is required and how you’re setting the connection URL

Refs
- Self-hosted Prefect (3.x): Self-hosted Prefect
- Settings reference (env var mapping): Settings reference
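
If nc is not available on the failing host (slim container images often lack it), the reachability test from step 4 can be approximated with the Python standard library. This is a minimal sketch: probe_tcp is a hypothetical helper name, and it only checks that a TCP connection opens. It does not exercise TLS or Postgres authentication.

```python
import socket
import time


def probe_tcp(host: str, port: int = 5432, timeout: float = 5.0):
    """Try to open a TCP connection and report (ok, seconds, detail).

    Hypothetical helper for illustration; roughly what `nc -vz` checks.
    """
    start = time.monotonic()
    try:
        # create_connection resolves DNS and connects, honouring the timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, "connected"
    except OSError as exc:
        # Covers DNS failures, refused connections, and timeouts
        detail = f"{type(exc).__name__}: {exc}"
        return False, time.monotonic() - start, detail


if __name__ == "__main__":
    import sys

    # Usage: python probe.py <writer-endpoint> [port]
    target = sys.argv[1]
    target_port = int(sys.argv[2]) if len(sys.argv) > 2 else 5432
    ok, elapsed, detail = probe_tcp(target, target_port)
    print(f"{target}:{target_port} ok={ok} elapsed={elapsed:.2f}s {detail}")
```

A probe that hangs until the timeout usually points at security groups or routing, while an immediate name-resolution error points at DNS. An elapsed time close to the configured connection timeout is a hint that raising the timeout alone may not help.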
d
@Marvin I ran it locally against the DB and it passed:
export PREFECT_LOGGING_LEVEL=DEBUG
prefect server database upgrade
prefect server start
m
thought for 62.7 seconds
Great, that means the schema/migrations are fine. If it passes locally but times out elsewhere, the issue is almost always environment-specific: connection URL, TLS, or network reachability from the failing host.

Quick checklist to pinpoint it
- Are you connecting to the exact same Aurora writer endpoint from both places?
  - Use postgresql+asyncpg://...@<writer-endpoint>:5432/<db> (not a reader endpoint).
- Diff your effective settings between the working local run and the failing host:
# --show-defaults also lists settings that were not set explicitly
prefect config view --show-defaults | grep -i 'PREFECT_SERVER_DATABASE'
Key vars to compare:
- PREFECT_SERVER_DATABASE__CONNECTION_URL
- PREFECT_SERVER_DATABASE__CONNECTION_TIMEOUT
- PREFECT_SERVER_DATABASE__TIMEOUT
- PREFECT_SERVER_DATABASE__SQLALCHEMY__POOL_TIMEOUT
- TLS flags (below)

- If Aurora enforces SSL, set TLS connect args on the failing host:
export PREFECT_SERVER_DATABASE__SQLALCHEMY__CONNECT_ARGS__TLS__ENABLED=true
export PREFECT_SERVER_DATABASE__SQLALCHEMY__CONNECT_ARGS__TLS__CA_FILE=/path/to/rds-ca-bundle.pem
# optional:
# export PREFECT_SERVER_DATABASE__SQLALCHEMY__CONNECT_ARGS__TLS__CHECK_HOSTNAME=true
- Bump timeouts (asyncpg + Aurora can be slow on first connect):
export PREFECT_SERVER_DATABASE__CONNECTION_TIMEOUT=30
export PREFECT_SERVER_DATABASE__TIMEOUT=60
export PREFECT_SERVER_DATABASE__SQLALCHEMY__POOL_TIMEOUT=60
- Test network reachability from the failing host/pod:
nc -vz <writer-endpoint> 5432
# and, if possible, try psql:
psql "host=<writer-endpoint> port=5432 dbname=<db> user=<user> sslmode=require"
If these hang or fail, it’s a VPC / SG / NACL / routing / DNS issue.
- If you’re on Kubernetes/ECS:
  - Ensure the workload is in a subnet/VPC that can reach the Aurora subnets
  - Security group on Aurora allows inbound from the workload’s SG
  - If Aurora is private, ensure there is a route between the workload’s subnets and the Aurora subnets (same VPC, peering, or transit gateway) and that DNS resolves the endpoint. NAT only matters for traffic that leaves the VPC.

Get a clearer error from the failing environment
- Disable auto-migration temporarily and run it by hand to see the precise failure:
export PREFECT_SERVER_DATABASE__MIGRATE_ON_START=false
export PREFECT_LOGGING_LEVEL=DEBUG
export PREFECT_SERVER_DATABASE__ECHO=true

prefect server database upgrade
Then share the full traceback (redact secrets) if it still times out.

Docs for settings and env var names
- Settings reference

If you confirm you’re using the same connection URL and TLS settings on both hosts, the remaining culprit is almost always networking (SG/NACL/routing/DNS) from the failing host to the Aurora writer endpoint.
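
The settings comparison between the working and failing hosts can also be done in Python once the environment of each host has been captured (for example by dumping os.environ to JSON on each side). This is a rough sketch under that assumption: diff_db_settings and redact are hypothetical helper names, and the prefix is simply the one used for the variables in this thread.

```python
import re

# Prefix shared by the database settings discussed in this thread
PREFIX = "PREFECT_SERVER_DATABASE"


def redact(value: str) -> str:
    """Mask the password in user:password@host style URLs."""
    return re.sub(r"(://[^:/@]+:)[^@]*@", r"\1***@", value)


def diff_db_settings(working: dict, failing: dict) -> dict:
    """Return {name: (working_value, failing_value)} for settings that differ.

    Values are compared before redaction, so a differing password still
    shows up as a difference even though both sides print as '***'.
    """
    names = {k for k in (*working, *failing) if k.startswith(PREFIX)}
    diff = {}
    for name in sorted(names):
        a, b = working.get(name), failing.get(name)
        if a != b:
            diff[name] = (
                redact(a) if a is not None else None,
                redact(b) if b is not None else None,
            )
    return diff


if __name__ == "__main__":
    # Illustrative values only; replace with the captured environments
    local_env = {
        f"{PREFIX}__CONNECTION_URL": "postgresql+asyncpg://u:pw@writer:5432/db",
        f"{PREFIX}__CONNECTION_TIMEOUT": "30",
    }
    remote_env = {
        f"{PREFIX}__CONNECTION_URL": "postgresql+asyncpg://u:pw@writer:5432/db",
    }
    for name, (a, b) in diff_db_settings(local_env, remote_env).items():
        print(f"{name}: working={a!r} failing={b!r}")
```

A value of None on one side means the variable is unset there, which is often the whole story: a timeout or TLS flag exported in a local shell but missing from the deployment manifest.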