< Marvin> starting prefect server failed due migration faile Prefect Community #ask-marvin

# ask-marvin

Denys Y

08/29/2025, 2:12 PM

@Marvin starting prefect-server failed because the migration failed with Aurora:

Marvin

08/29/2025, 2:12 PM

✅ thought for 396.1 seconds

Denys Y

08/29/2025, 2:14 PM

  File "/usr/local/lib/python3.12/site-packages/alembic/util/pyfiles.py", line 136, in load_module_py
    spec.loader.exec_module(module)  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap_external>", line 999, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/usr/local/lib/python3.12/site-packages/prefect/server/database/_migrations/env.py", line 201, in <module>
    run_async_from_worker_thread(apply_migrations)
  File "/usr/local/lib/python3.12/site-packages/prefect/utilities/asyncutils.py", line 254, in run_async_from_worker_thread
    return anyio.from_thread.run(call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/anyio/from_thread.py", line 59, in run
    return async_backend.run_async_from_thread(func, args, token=token)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2510, in run_async_from_thread
    return f.result()
           ^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
TimeoutError

Marvin

08/29/2025, 2:19 PM

Sorry you’re hitting this. Migration failures against Aurora usually come down to one of these:
- Missing Postgres extensions (most commonly `pgcrypto` for `gen_random_uuid()`)
- Insufficient DB privileges to create extensions / tables / indexes
- Misconfigured connection URL (must use the async driver)
- TLS requirements not set when Aurora enforces SSL

Can you share:
- The exact error message/traceback from `prefect server start` (or `prefect server database upgrade`)
- Aurora PostgreSQL engine version
- Prefect version
- How you’re configuring the DB connection (env vars / Helm / docker-compose), including whether TLS is required

Quick fixes to try

1) If your error mentions `gen_random_uuid()` or “function does not exist”
Run this once on the target database (with a user that can create extensions, e.g. the Aurora master user or a role with `rds_superuser`):

CREATE EXTENSION IF NOT EXISTS pgcrypto;
-- optional alternative if your Aurora version lacks pgcrypto:
-- CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

2) Verify your Prefect 3.x DB connection settings
Prefect Server needs the async driver in the URL:

export PREFECT_SERVER_DATABASE__CONNECTION_URL='postgresql+asyncpg://<user>:<password>@<host>:5432/<db_name>'

Common gotcha: using `postgresql://...` instead of `postgresql+asyncpg://...`.

3) If Aurora requires TLS
Set TLS connect args (example with RDS CA bundle):

export PREFECT_SERVER_DATABASE__SQLALCHEMY__CONNECT_ARGS__TLS__ENABLED=true
export PREFECT_SERVER_DATABASE__SQLALCHEMY__CONNECT_ARGS__TLS__CA_FILE=/path/to/rds-ca-bundle.pem
# Optional:
# export PREFECT_SERVER_DATABASE__SQLALCHEMY__CONNECT_ARGS__TLS__CHECK_HOSTNAME=true

4) Run migrations explicitly to see the full error

# optional: disable auto-migration while you debug
export PREFECT_SERVER_DATABASE__MIGRATE_ON_START=false

# run the migration
prefect server database upgrade

# then start the server
prefect server start

5) Verify privileges for the DB user used by Prefect
The user should be able to:
- CREATE EXTENSION (for `pgcrypto`)
- CREATE SCHEMA / TABLE / INDEX
- ALTER TABLE

If this is a brand new database and you don’t need to preserve data, you can reset:

prefect server database reset

Note: this drops and recreates all Prefect tables.

Helpful docs
- Self-hosting Prefect (3.x): Self-hosted Prefect
- Settings reference (env var mapping with double-underscore nesting): Settings reference

If you can paste the exact migration error, I can pinpoint whether it’s an extension, permissions, or connection configuration issue and share the precise command(s) to fix it.
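
For the connection-URL gotcha in step 2, here is a minimal Python sketch that flags a URL missing the async driver prefix before any migration is attempted. It is not part of Prefect; `check_connection_url` is a hypothetical helper, and the only rule it encodes is the `postgresql+asyncpg://` requirement stated in this thread.

```python
from urllib.parse import urlsplit

# Scheme the thread says Prefect Server needs (async driver).
ASYNC_SCHEME = "postgresql+asyncpg"


def check_connection_url(url: str) -> list[str]:
    """Return a list of problems found in the URL; an empty list means none found.

    Hypothetical helper for illustration only: it inspects the URL text and
    never opens a database connection.
    """
    problems = []
    parts = urlsplit(url)
    if parts.scheme != ASYNC_SCHEME:
        problems.append(f"scheme is {parts.scheme!r}, expected {ASYNC_SCHEME!r}")
    if not parts.hostname:
        problems.append("no host in URL")
    if not parts.path.strip("/"):
        problems.append("no database name in URL path")
    return problems


if __name__ == "__main__":
    # Placeholder URL; substitute your own (and keep the password out of logs).
    for problem in check_connection_url("postgresql://user:secret@db.example.internal:5432/prefect"):
        print("problem:", problem)
```

An empty result only means the URL is shaped as expected; it says nothing about whether the host is reachable or the credentials are valid.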

Denys Y

08/29/2025, 2:40 PM

@Marvin did you check? I posted the error above: it's a TimeoutError at migration start

Marvin

08/29/2025, 2:40 PM

✅ thought for 45.5 seconds

Marvin

08/29/2025, 2:41 PM

Got it, thanks for clarifying. A TimeoutError at migration start almost always means Prefect can’t establish a DB connection in time (before any migration statements run). Here are focused steps to fix that with Aurora:

1) Bump connection timeouts
Set these and try again; Aurora + asyncpg sometimes needs more than the defaults (5–10s):

export PREFECT_SERVER_DATABASE__CONNECTION_TIMEOUT=30     # connect handshake
export PREFECT_SERVER_DATABASE__TIMEOUT=60                # DB op timeout
export PREFECT_SERVER_DATABASE__SQLALCHEMY__POOL_TIMEOUT=60

2) Verify the async driver and writer endpoint
- Use the async driver in the URL:

export PREFECT_SERVER_DATABASE__CONNECTION_URL='postgresql+asyncpg://<user>:<password>@<writer-endpoint>:5432/<db>'

- Ensure you are using the Aurora writer/cluster endpoint (not a reader endpoint). Migrations require write access.

3) If your Aurora enforces TLS
Turn TLS on and provide the CA bundle:

export PREFECT_SERVER_DATABASE__SQLALCHEMY__CONNECT_ARGS__TLS__ENABLED=true
export PREFECT_SERVER_DATABASE__SQLALCHEMY__CONNECT_ARGS__TLS__CA_FILE=/path/to/rds-ca-bundle.pem
# optionally:
# export PREFECT_SERVER_DATABASE__SQLALCHEMY__CONNECT_ARGS__TLS__CHECK_HOSTNAME=true

4) Confirm network access
From the host running Prefect Server:
- Security group/NACL allows outbound to the Aurora port and inbound rules allow your source
- DNS resolves and the port is reachable
- Quick test:

nc -vz <writer-endpoint> 5432
# or try psql with same host/port/ssl settings

5) Run migration separately for clearer errors

export PREFECT_SERVER_DATABASE__MIGRATE_ON_START=false
# optional: more logging
export PREFECT_LOGGING_LEVEL=DEBUG

prefect server database upgrade
prefect server start

If it still times out, please share:
- Full traceback from `prefect server database upgrade` (redact secrets)
- Your Aurora Postgres engine version
- Whether TLS is required and how you’re setting the connection URL

Refs
- Self-hosted Prefect (3.x): Self-hosted Prefect
- Settings reference (env var mapping): Settings reference
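
Where `nc` or `psql` is not installed on the failing host (a slim container image, for example), the reachability test from step 4 can be approximated with the Python standard library. This is a rough sketch, not a Prefect utility: `tcp_reachable` is a hypothetical name, and it checks only DNS resolution plus the TCP handshake, not TLS or authentication.

```python
import socket
import time


def tcp_reachable(host: str, port: int = 5432, timeout: float = 5.0) -> tuple[bool, str]:
    """Try a plain TCP connect and report (ok, message).

    Roughly what `nc -vz <host> <port>` does. A timeout here usually points at
    security groups / NACLs / routing rather than at the database itself.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, f"connected in {time.monotonic() - start:.2f}s"
    except socket.gaierror as exc:
        # The name could not be resolved at all.
        return False, f"DNS resolution failed: {exc}"
    except (socket.timeout, TimeoutError):
        # No answer within the timeout: packets are likely being dropped.
        return False, f"timed out after {timeout}s"
    except OSError as exc:
        # e.g. connection refused or network unreachable.
        return False, f"connection failed: {exc}"


if __name__ == "__main__":
    # Placeholder host; replace with your Aurora writer endpoint.
    print(tcp_reachable("writer.example.internal", 5432, timeout=5.0))
```

If this connects quickly but `prefect server database upgrade` still times out, the network path is probably fine and the TLS or timeout settings are the next thing to compare.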

Denys Y

08/29/2025, 2:58 PM

@Marvin I ran it locally against the db and it passed:

export PREFECT_LOGGING_LEVEL=DEBUG
prefect server database upgrade
prefect server start

Marvin

08/29/2025, 2:58 PM

✅ thought for 62.7 seconds

Marvin

08/29/2025, 2:59 PM

Great, that means the schema/migrations are fine. If it passes locally but times out elsewhere, the issue is almost always environment-specific: connection URL, TLS, or network reachability from the failing host.

Quick checklist to pinpoint it
- Are you connecting to the exact same Aurora writer endpoint from both places?
  - Use `postgresql+asyncpg://...@<writer-endpoint>:5432/<db>` (not a reader endpoint).
- Diff your effective settings between the working local run and the failing host:

prefect config view | grep -i '^PREFECT_SERVER_DATABASE' -A 20

  Key vars to compare:
  - `PREFECT_SERVER_DATABASE__CONNECTION_URL`
  - `PREFECT_SERVER_DATABASE__CONNECTION_TIMEOUT`
  - `PREFECT_SERVER_DATABASE__TIMEOUT`
  - `PREFECT_SERVER_DATABASE__SQLALCHEMY__POOL_TIMEOUT`
  - TLS flags (below)
- If Aurora enforces SSL, set TLS connect args on the failing host:

export PREFECT_SERVER_DATABASE__SQLALCHEMY__CONNECT_ARGS__TLS__ENABLED=true
export PREFECT_SERVER_DATABASE__SQLALCHEMY__CONNECT_ARGS__TLS__CA_FILE=/path/to/rds-ca-bundle.pem
# optional:
# export PREFECT_SERVER_DATABASE__SQLALCHEMY__CONNECT_ARGS__TLS__CHECK_HOSTNAME=true

- Bump timeouts (asyncpg + Aurora can be slow on first connect):

export PREFECT_SERVER_DATABASE__CONNECTION_TIMEOUT=30
export PREFECT_SERVER_DATABASE__TIMEOUT=60
export PREFECT_SERVER_DATABASE__SQLALCHEMY__POOL_TIMEOUT=60

- Test network reachability from the failing host/pod:

nc -vz <writer-endpoint> 5432
# and, if possible, try psql:
psql "host=<writer-endpoint> port=5432 dbname=<db> user=<user> sslmode=require"

  If these hang or fail, it’s a VPC / SG / NACL / NAT / DNS issue.
- If you’re on Kubernetes/ECS:
  - Ensure the workload is in a subnet/VPC that can reach the Aurora subnets
  - Security group on Aurora allows inbound from the workload’s SG
  - If Aurora is private and your pod is in a private subnet, ensure there’s egress (NAT) for DNS/connection

Get a clearer error from the failing environment
- Disable auto-migration temporarily and run it by hand to see the precise failure:

export PREFECT_SERVER_DATABASE__MIGRATE_ON_START=false
export PREFECT_LOGGING_LEVEL=DEBUG
export PREFECT_SERVER_DATABASE__ECHO=true

prefect server database upgrade

Then share the full traceback (redact secrets) if it still times out.

Docs for settings and env var names
- Settings reference

If you confirm you’re using the same connection URL and TLS settings on both hosts, the remaining culprit is almost always networking (SG/NACL/routing/DNS) from the failing host to the Aurora writer endpoint.
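
The "diff your effective settings" step can be done by eye, but a small script makes a missing variable harder to overlook. The sketch below assumes you have captured plain `KEY=VALUE` lines on each host (for example from `env`); that input format is an assumption, and the output of `prefect config view` may need trimming first. `parse_settings` and `diff_settings` are hypothetical helper names, and the variable names in the demo are copied from this thread as written.

```python
def parse_settings(text: str, prefix: str = "PREFECT_SERVER_DATABASE") -> dict[str, str]:
    """Parse KEY=VALUE lines, keeping only keys that start with `prefix`."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key.startswith(prefix):
            # Drop surrounding quotes such as those added by `export KEY='value'`.
            settings[key] = value.strip("'\"")
    return settings


def diff_settings(working: dict[str, str], failing: dict[str, str]) -> dict[str, tuple]:
    """Map each differing key to (working_value, failing_value).

    None marks a key that is absent on that side.
    """
    keys = sorted(set(working) | set(failing))
    return {k: (working.get(k), failing.get(k)) for k in keys if working.get(k) != failing.get(k)}


if __name__ == "__main__":
    # Illustrative captures; real ones would come from each host.
    local = parse_settings("PREFECT_SERVER_DATABASE__TIMEOUT=60\nHOME=/root\n")
    remote = parse_settings("PREFECT_SERVER_DATABASE__TIMEOUT=10\n")
    for key, (ok_value, bad_value) in diff_settings(local, remote).items():
        print(f"{key}: working={ok_value!r} failing={bad_value!r}")
```

The connection URL contains the password, so redact it before pasting any diff into a public channel.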
