# ask-community
@Marvin Issue Summary: The outage was caused by an unexpected use of an HTTP/2 protocol feature (`RECV_PING`) on a closed connection, which is invalid according to the HTTP/2 protocol specification. At the core, the *`httpx` client* (used internally by *Prefect's Cloud API client*) processed a PING frame (the `RECV_PING` event) on an HTTP/2 connection that had already transitioned to the `CLOSED` state. This caused the underlying `httpcore` transport layer to raise a `LocalProtocolError` due to the mismatch in expected connection state.

🧾 Root Cause Analysis (RCA):
• The system attempted to reuse an HTTP/2 connection that had already been closed, leading to an invalid protocol state.
• This is most likely due to a stale connection being kept in the async connection pool, or a bug in connection lifecycle management in `httpcore`.
• Prefect's cloud client attempted communication over this dead connection, resulting in failure.
Thanks for the clear RCA. Your analysis lines up with what we've seen when an HTTP/2 connection in the pool is closed server-side and a PING arrives after the client has transitioned that connection to CLOSED, triggering httpcore's `LocalProtocolError`. Here's what we know and what you can do:

What Prefect does by default (Prefect 3.x)
- HTTP/2 is disabled by default in Prefect 3.x. The setting is `api.enable_http2` (env: `PREFECT_API_ENABLE_HTTP2`) and defaults to `false`. See settings reference: Settings Reference
- The client accepts `httpx_settings`, so you can override httpx/httpcore behavior (e.g., the http2 flag, keepalive limits). API docs: Client API reference

Known reports
- We found a matching stack trace reported in Prefect 2.x (HTTP/2): prefecthq/prefect#10149. It's upstream httpx/httpcore behavior, not Prefect-specific.
- We did not find a specific upstream httpx/httpcore issue tied directly to RECV_PING on CLOSED, but the symptom is consistent with stale HTTP/2 connections being reused.

Recommended mitigations and next steps
1) If HTTP/2 is enabled anywhere, turn it off and retry
- In most cases this fully avoids the issue, because HTTP/1.1 does not multiplex and avoids this particular PING/state race. Via config:
```
prefect config set api.enable_http2=false
prefect config view | grep -i http2
```
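The same setting can also be flipped via its environment variable (the `PREFECT_API_ENABLE_HTTP2` name mentioned above), which is handy in containerized deployments:

```shell
# Equivalent to `prefect config set api.enable_http2=false`,
# but scoped to the current shell/process environment:
export PREFECT_API_ENABLE_HTTP2=false
```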
Or in code (async):
```python
from prefect.client.orchestration import get_client

async with get_client(httpx_settings={"http2": False}) as client:
    ...
```
Or in code (sync):
```python
from prefect.client.orchestration import get_client

with get_client(httpx_settings={"http2": False}, sync_client=True) as client:
    ...
```
2) If you must keep HTTP/2, reduce connection reuse or expiry
- This minimizes the chance of reusing a stale HTTP/2 connection. You can either reduce the keepalive expiry or disable keepalive reuse. Example (async) disabling keepalive reuse:
```python
import httpx
from prefect.client.orchestration import get_client

limits = httpx.Limits(max_keepalive_connections=0)  # do not reuse keepalive connections
async with get_client(httpx_settings={"http2": True, "limits": limits}) as client:
    ...
```
Example (async) shortening expiry:
```python
import httpx
from prefect.client.orchestration import get_client

limits = httpx.Limits(keepalive_expiry=30)  # seconds
async with get_client(httpx_settings={"http2": True, "limits": limits}) as client:
    ...
```
3) Ensure you're on the latest Prefect 3.x
- Upgrading Prefect brings updated httpx/httpcore pins. While we don't have a single "known good" httpx/httpcore version to call out for this specific case, staying current helps avoid already-fixed pooling/HTTP/2 bugs.

4) Increase client retries for transient failures
- Prefect's client retries transient errors. You can tune these if needed:
  - `PREFECT_CLIENT_MAX_RETRIES`
  - `PREFECT_CLIENT_RETRY_EXTRA_CODES`
- Docs: Settings Reference

5) Check any intermediary timeouts
- Load balancers/proxies that aggressively recycle HTTP/2 connections (idle timeouts, keepalive policies) can increase the frequency of this error. If possible, raise the idle timeout or align it with the client's keepalive expiry.

Helpful references
- Prefect settings reference: Settings Reference
- Cloud troubleshooting: Troubleshoot Prefect Cloud
- Example of httpcore protocol error symptoms discussed in a Prefect context: Helm (advanced)

Can you share a bit more so we can tailor the fix?
- Prefect version (2.x or 3.x)?
- Is `api.enable_http2` currently true? (`prefect config view | grep -i http2`)
- httpx/httpcore versions in your environment:
```
python -c "import httpx, httpcore; print('httpx=', httpx.__version__, 'httpcore=', httpcore.__version__)"
```
- Any LB/proxy in front of the API (and its idle timeout)?

If you can temporarily force HTTP/1.1 and confirm the errors disappear, that will strongly indicate the stale HTTP/2 connection reuse path was the cause.