Pavel Pustovoyt
09/29/2025, 1:47 PM
asyncio.exceptions.TimeoutError: timed out during opening handshake
when starting multiple deployments from a flow:
2025-09-29T15:56:43.043+03:00 12:56:43.042 | DEBUG | prefect.events.clients - Reconnecting...
12:56:43.042 | DEBUG | prefect.events.clients - Reconnecting...
2025-09-29T15:56:53.056+03:00 Traceback (most recent call last):
2025-09-29T15:56:53.056+03:00 File "/app/venv_deps/lib/python3.9/site-packages/websockets/asyncio/client.py", line 541, in __await_impl__
2025-09-29T15:56:53.056+03:00 self.connection = await self.create_connection()
2025-09-29T15:56:53.056+03:00 File "/app/venv_deps/lib/python3.9/site-packages/websockets/asyncio/client.py", line 467, in create_connection
2025-09-29T15:56:53.056+03:00 _, connection = await loop.create_connection(factory, **kwargs)
2025-09-29T15:56:53.056+03:00 File "/usr/local/lib/python3.9/asyncio/base_events.py", line 1050, in create_connection
2025-09-29T15:56:53.056+03:00 sock = await self._connect_sock(
2025-09-29T15:56:53.056+03:00 File "/usr/local/lib/python3.9/asyncio/base_events.py", line 961, in _connect_sock
2025-09-29T15:56:53.056+03:00 await self.sock_connect(sock, address)
2025-09-29T15:56:53.056+03:00 File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 500, in sock_connect
2025-09-29T15:56:53.056+03:00 return await fut
2025-09-29T15:56:53.056+03:00 asyncio.exceptions.CancelledError
2025-09-29T15:56:53.056+03:00 During handling of the above exception, another exception occurred:
2025-09-29T15:56:53.056+03:00 Traceback (most recent call last):
2025-09-29T15:56:53.056+03:00 File "/app/venv_deps/lib/python3.9/site-packages/websockets/asyncio/client.py", line 574, in __await_impl__
2025-09-29T15:56:53.056+03:00 raise SecurityError(f"more than {MAX_REDIRECTS} redirects")
2025-09-29T15:56:53.056+03:00 File "/app/venv_deps/lib/python3.9/site-packages/websockets/asyncio/async_timeout.py", line 181, in __aexit__
2025-09-29T15:56:53.056+03:00 self._do_exit(exc_type)
2025-09-29T15:56:53.056+03:00 File "/app/venv_deps/lib/python3.9/site-packages/websockets/asyncio/async_timeout.py", line 268, in _do_exit
2025-09-29T15:56:53.056+03:00 raise asyncio.TimeoutError
Pavel Pustovoyt
09/29/2025, 1:48 PM
2025-09-29T15:56:53.056+03:00 asyncio.exceptions.TimeoutError
2025-09-29T15:56:53.056+03:00 The above exception was the direct cause of the following exception:
2025-09-29T15:56:53.056+03:00 Traceback (most recent call last):
2025-09-29T15:56:53.056+03:00 File "/app/venv_deps/lib/python3.9/site-packages/prefect/cli/_utilities.py", line 44, in wrapper
2025-09-29T15:56:53.056+03:00 return fn(*args, **kwargs)
2025-09-29T15:56:53.056+03:00 File "/app/venv_deps/lib/python3.9/site-packages/prefect/cli/_types.py", line 156, in sync_fn
2025-09-29T15:56:53.056+03:00 return asyncio.run(async_fn(*args, **kwargs))
2025-09-29T15:56:53.056+03:00 File "/usr/local/lib/python3.9/asyncio/runners.py", line 44, in run
2025-09-29T15:56:53.056+03:00 return loop.run_until_complete(main)
2025-09-29T15:56:53.056+03:00 File "/usr/local/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
2025-09-29T15:56:53.056+03:00 return future.result()
2025-09-29T15:56:53.056+03:00 File "/app/venv_deps/lib/python3.9/site-packages/prefect/cli/flow_run.py", line 412, in execute
2025-09-29T15:56:53.056+03:00 await runner.execute_flow_run(id)
2025-09-29T15:56:53.056+03:00 File "/app/venv_deps/lib/python3.9/site-packages/prefect/runner/runner.py", line 571, in execute_flow_run
2025-09-29T15:56:53.056+03:00 async with context:
2025-09-29T15:56:53.056+03:00 File "/app/venv_deps/lib/python3.9/site-packages/prefect/runner/runner.py", line 1499, in __aenter__
2025-09-29T15:56:53.056+03:00 await self._exit_stack.enter_async_context(
2025-09-29T15:56:53.056+03:00 File "/usr/local/lib/python3.9/contextlib.py", line 575, in enter_async_context
2025-09-29T15:56:53.056+03:00 result = await _cm_type.__aenter__(cm)
2025-09-29T15:56:53.056+03:00 File "/app/venv_deps/lib/python3.9/site-packages/prefect/runner/_observers.py", line 42, in __aenter__
2025-09-29T15:56:53.056+03:00 self._events_subscriber = await self._exit_stack.enter_async_context(
2025-09-29T15:56:53.056+03:00 File "/usr/local/lib/python3.9/contextlib.py", line 575, in enter_async_context
2025-09-29T15:56:53.056+03:00 result = await _cm_type.__aenter__(cm)
2025-09-29T15:56:53.056+03:00 File "/app/venv_deps/lib/python3.9/site-packages/prefect/events/clients.py", line 555, in __aenter__
2025-09-29T15:56:53.056+03:00 await self._reconnect()
2025-09-29T15:56:53.056+03:00 File "/app/venv_deps/lib/python3.9/site-packages/prefect/events/clients.py", line 566, in _reconnect
2025-09-29T15:56:53.056+03:00 self._websocket = await self._connect.__aenter__()
2025-09-29T15:56:53.056+03:00 File "/app/venv_deps/lib/python3.9/site-packages/websockets/asyncio/client.py", line 587, in __aenter__
2025-09-29T15:56:53.056+03:00 return await self
2025-09-29T15:56:53.056+03:00 File "/app/venv_deps/lib/python3.9/site-packages/prefect/_internal/websockets.py", line 135, in _proxy_connect
2025-09-29T15:56:53.056+03:00 proto = await self.__await_impl__()
2025-09-29T15:56:53.056+03:00 File "/app/venv_deps/lib/python3.9/site-packages/websockets/asyncio/client.py", line 578, in __await_impl__
2025-09-29T15:56:53.056+03:00 raise TimeoutError("timed out during opening handshake") from exc
2025-09-29T15:56:53.056+03:00 asyncio.exceptions.TimeoutError: timed out during opening handshake
2025-09-29T15:56:53.057+03:00 An exception occurred.
Marvin
09/29/2025, 1:53 PM
1) Verify your API URL and network path
- Make sure `PREFECT_API_URL` includes the scheme and is correct (e.g. `https://api.prefect.cloud/api/accounts/<account_id>/workspaces/<ws_id>`); a quick connectivity check is sketched at the end of this section.
- If you’re behind a proxy or egress filter, verify that outbound WSS (port 443) to Prefect’s API is allowed. Some corporate proxies/firewalls block websocket upgrades which will cause opening-handshake timeouts.
- If you have `HTTPS_PROXY`/`HTTP_PROXY` env vars set, try a run without them to see if the proxy is interfering with WSS.
- Docs: Troubleshoot Prefect Cloud, Settings reference
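A quick way to sanity-check the configured API URL over plain HTTP(S), separate from the WebSocket path (a sketch; it assumes `client.api_healthcheck()` is available as in recent Prefect 3.x releases):
import asyncio
from prefect.client.orchestration import get_client

async def check_api() -> None:
    # get_client() picks up PREFECT_API_URL / PREFECT_API_KEY from the environment
    async with get_client() as client:
        error = await client.api_healthcheck()  # returns None when the API answered
        print("API reachable" if error is None else f"API check failed: {error!r}")

asyncio.run(check_api())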
2) Reduce the number of simultaneous connections
If you’re starting many deployments at once from a parent flow, bursty fan-out can cause lots of concurrent WebSocket dials. Try throttling to a small concurrency (e.g., 5) or running sequentially.
Example — launch multiple deployments with a small concurrency:
from prefect import flow
from prefect.deployments import run_deployment
import asyncio

@flow
async def launch_many(names: list[str]):
    sem = asyncio.Semaphore(5)  # limit concurrency

    async def launch_one(name: str):
        async with sem:
            # If you don't need to wait for each child run, omit waiting logic
            return await run_deployment(name=name)

    # schedule with throttling
    return await asyncio.gather(*[launch_one(n) for n in names])

# names = ["flow-a/dev", "flow-b/dev", ...]
# asyncio.run(launch_many(names))
API ref for `run_deployment`: run_deployment
3) Turn down the noise (if runs are otherwise fine)
- Set `PREFECT_LOGGING_LEVEL=INFO` (or WARNING) to hide the debug “Reconnecting…” spam.
- You can inspect your current settings with:
prefect config view --show-defaults
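For example (a sketch; pick whichever level suits you):
# set for the processes that run your flows
export PREFECT_LOGGING_LEVEL=WARNING
# or persist it in the active Prefect profile
prefect config set PREFECT_LOGGING_LEVEL=WARNING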
4) Environment specifics that commonly trigger this
- Running inside containers behind a corporate proxy that does not support WebSocket upgrade
- Tight egress rules that only allow plain HTTPS but not WS/WSS
- Very large fan-out (dozens/hundreds of concurrent child deployments) causing transient connection pressure
A few questions to help pinpoint the root cause
- Are you on Prefect Cloud or self-hosted Prefect Server?
- Which Prefect version are you running?
- Roughly how many deployments do you start concurrently from the flow?
- Are you behind a proxy or strict firewall/NAT?
- Do the deployments themselves succeed and just log these timeouts, or are any runs failing?
If you share those details (and a minimal snippet of how you’re launching the deployments), I can suggest the most targeted fix.
Pavel Pustovoyt
09/29/2025, 2:06 PM
Marvin
09/29/2025, 2:10 PM
A) Configure the ingress/proxy for WebSockets
- NGINX: ensure the upgrade headers are set and the timeouts are generous (a fuller location-block sketch follows this section):
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_read_timeout 600s;
proxy_send_timeout 600s;
- Traefik/ALB/Istio
- Confirm WS upgrade is allowed and idle/read timeouts are >= 60–300s
- For ALB, ensure HTTP/1.1 to the target and increase idle timeout (> 60s)
- If you terminate TLS upstream, ensure the upstream hop is still HTTP/1.1 (pure HTTP/2 upstream can break WS upgrades)
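For reference, a minimal sketch of where those NGINX directives sit in a complete location block (assuming the Prefect API listens on 127.0.0.1:4200; adjust host, port, and TLS to your setup):
location /api/ {
    proxy_pass http://127.0.0.1:4200;   # Prefect API service
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
    proxy_read_timeout 600s;
    proxy_send_timeout 600s;
}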
B) Scale API capacity and connection limits
- Increase API replicas (if running the API behind a load balancer)
- Ensure sufficient CPU/memory for the API pod/container
- Raise proxy/server connection limits (e.g., NGINX worker_connections, ulimit -n on hosts)
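For example, an illustrative snippet for nginx.conf (values are placeholders; size them to your host and expected concurrency):
# main context
worker_rlimit_nofile 65536;
events {
    worker_connections 8192;
}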
C) Reduce client‑side burstiness and add retries
Throttle the number of simultaneous `run_deployment` calls and add a retry around them. This both eases the WS handshake burst and makes the orchestration robust if one WS dial transiently fails.
Example limiting concurrency to 5 and turning off subflow semantics (to reduce coupling) with retries:
from prefect import flow
from prefect.deployments import run_deployment
import asyncio
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(5), wait=wait_exponential(multiplier=0.5, max=10), reraise=True)
async def run_one(name, params=None):
    # as_subflow=False reduces “child as subflow” linkage; often lowers extra coupling
    return await run_deployment(name=name, parameters=params, as_subflow=False)

@flow
async def launch_many(names: list[str]):
    sem = asyncio.Semaphore(5)  # tune this upward once ingress is fixed
    async def launch(name: str):
        async with sem:
            try:
                return await run_one(name)
            except Exception:
                # log and re-raise if you want the parent flow to fail, or swallow if you prefer partial success
                raise
    return await asyncio.gather(*[launch(n) for n in names])
API reference: run_deployment
D) If you don’t need event streaming, disable it (server‑side)
- On the server, set `PREFECT_API_EVENTS_STREAM_OUT_ENABLED=false` to stop serving the events stream (a sketch of applying this follows the list). This disables live event streaming/automations from the server; the UI will still show state via polling, but live streams won’t be available.
- Settings reference: Settings reference
- Note: do this only if you’re comfortable without live event stream/automations during high-concurrency bursts.
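A minimal sketch of applying that setting, assuming the API runs via `prefect server start` (for Docker/Kubernetes, set the same variable in the container environment instead):
# on the machine/container that runs the Prefect API, then restart it
export PREFECT_API_EVENTS_STREAM_OUT_ENABLED=false
prefect server start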
Marvin
09/29/2025, 2:10 PM
- `PREFECT_API_URL` should include the scheme and be reachable (http/https)
- Avoid proxies that block WS upgrades; unset `HTTP(S)_PROXY` for the Prefect processes if necessary (a sketch follows this list)
- You can inspect effective settings with:
prefect config view --show-defaults
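If proxy variables must stay set globally, one option is to exempt the API host (a sketch; the hostname is a placeholder, and NO_PROXY handling varies by client library, so verify in your environment):
# exempt the Prefect API host from proxying for Prefect processes
export NO_PROXY="prefect-api.internal.example.com,localhost,127.0.0.1"
# or run a one-off test without the proxy variables at all
env -u HTTP_PROXY -u HTTPS_PROXY -u http_proxy -u https_proxy python my_flow.py  # my_flow.py is hypothetical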
Next step — a bit of context will let me be more specific
- How many deployments are you launching at peak (and from how many processes/pods)?
- What ingress/proxy are you using (NGINX, Traefik, ALB, Istio, etc.)?
- Are the failures clustered at high concurrency or seemingly random?
- If you have NGINX/ingress config, can you share the WS-related snippet?
With those details, I can give exact ingress settings and a safe concurrency target to start with, then we can tune up from there.
Pavel Pustovoyt
09/29/2025, 8:20 PM
Marvin
09/29/2025, 8:21 PM
With that many concurrent `run_deployment` calls and ~10% of them failing, this almost always points to your self-hosted API/ingress not handling bursts of WebSocket upgrades for the events channel.
What’s likely the bottleneck
- Your ingress/proxy (NGINX/Traefik/ALB/etc.) is either:
- Not fully configured for WebSocket upgrades (Upgrade/Connection headers)
- Using short read/connect/idle timeouts
- Hitting connection/worker limits under burst
- The Prefect API instance(s) may not have enough capacity for the spike.
- The client opens/maintains an events WebSocket; under load, some handshakes time out and bubble up, failing the flow.
How to raise concurrency without the timeouts
Server/ingress hardening (most important)
- Ensure WS upgrade is enabled and timeouts are high enough. For NGINX, add/verify:
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_connect_timeout 60s;
proxy_read_timeout 600s;
proxy_send_timeout 600s;
keepalive_timeout 75s;
- If behind a load balancer/service mesh:
- ALB/NLB: idle timeout ≥ 60s; ensure HTTP/1.1 on the upstream target group.
- Traefik/Istio: enable WS, increase read/idle timeouts.
- Increase capacity:
- Scale API replicas (multiple server pods/containers behind LB).
- Raise proxy connection limits (e.g., NGINX `worker_connections`), and ensure sufficient CPU/memory on API pods.
- Optional but helpful: enable sticky sessions (some proxies handle a flood of WS better when the connection stays on one upstream).
Client-side throttling and resiliency
- Throttle the fan-out and add retries. Start with concurrency 10–20 while you adjust ingress, then scale up.
- Set `as_subflow=False` to decouple child runs from the parent flow’s state.
- Wrap `run_deployment` in retries with exponential backoff.
Example:
from prefect import flow
from prefect.deployments import run_deployment
import asyncio
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(5), wait=wait_exponential(multiplier=0.5, max=10), reraise=True)
async def run_one(name, params=None):
    return await run_deployment(name=name, parameters=params, as_subflow=False)

@flow
async def launch_many(names: list[str]):
    sem = asyncio.Semaphore(15)  # tune up after ingress fixes (e.g., 20, 30, ...)
    async def launch(n):
        async with sem:
            return await run_one(n)
    return await asyncio.gather(*[launch(n) for n in names])
Upgrade to latest 3.4.x
- You’re on 3.4.8. We’ve shipped stability improvements across 3.4.x. It’s worth upgrading to the latest 3.4.x to pick up resiliency fixes.
- Release notes: 3.4.20 changelog
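For example (a sketch; pin the version however your images or virtualenvs are built):
pip install -U "prefect==3.4.*"
prefect version  # confirm what is installed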
Optional: temporarily disable events streaming (server)
- If you need an immediate stopgap and can live without live event streams, set this on the server to reduce WS traffic:
PREFECT_API_EVENTS_STREAM_OUT_ENABLED=false
- Trade-off: live event streaming/automations are affected; UI continues to work via polling. See settings: Settings reference.
Sanity test for your ingress
- From a pod/host in the same network, try a direct WS connect to your API:
# events out stream (read)
wscat -c wss://<your-api-host>/events/out
# events in stream (emit)
wscat -c wss://<your-api-host>/events/in
If these hang or fail intermittently under load, it confirms the ingress/LB path.
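If wscat isn’t available, a rough Python equivalent (a sketch, using the same placeholder URL as above and the `websockets` library Prefect already depends on):
import asyncio
import websockets

async def check(url: str) -> None:
    # open_timeout mirrors the opening-handshake timeout seen in the traceback
    async with websockets.connect(url, open_timeout=10):
        print("handshake ok:", url)

# replace with your actual API host/path
asyncio.run(check("wss://<your-api-host>/events/out"))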
Helpful references
- `run_deployment` API: run_deployment
- Settings reference: Prefect 3 settings
If you can share:
- Which ingress/LB you’re using (NGINX, Traefik, ALB, Istio, etc.)
- Any relevant ingress config for the Prefect API route
Marvin
09/29/2025, 8:21 PM