# ask-community
@Marvin Issue Summary: The outage was caused by an unexpected use of an HTTP/2 protocol feature (`RECV_PING`) on a closed connection, which is invalid according to the HTTP/2 protocol specification. At the core, the *`httpx` client* (used internally by *Prefect's Cloud API client*) processed a PING frame (the `RECV_PING` event) on an HTTP/2 connection that had already transitioned to the `CLOSED` state. This caused the underlying `httpcore` transport layer to raise a `LocalProtocolError` due to the mismatch in expected connection state.

🧾 Root Cause Analysis (RCA):
• The system attempted to reuse an HTTP/2 connection that had already been closed, leading to an invalid protocol state.
• This is most likely due to a stale connection being kept in the async connection pool, or a bug in connection lifecycle management in `httpcore`.
• Prefect's cloud client attempted communication over this dead connection, resulting in failure.
Thanks for the clear RCA. Your analysis lines up with what we've seen when an HTTP/2 connection in the pool is closed server-side and a PING arrives after the client has transitioned that connection to CLOSED, triggering httpcore's `LocalProtocolError`. Here's what we know and what you can do:

What Prefect does by default (Prefect 3.x)
- HTTP/2 is disabled by default in Prefect 3.x. The setting is `api.enable_http2` (env: `PREFECT_API_ENABLE_HTTP2`) and defaults to `false`. See settings reference: Settings Reference
- The client accepts `httpx_settings`, so you can override httpx/httpcore behavior (e.g., the http2 flag, keepalive limits). API docs: Client API reference

Known reports
- We found a matching stack trace reported in Prefect 2.x (HTTP/2): prefecthq/prefect#10149. It's upstream httpx/httpcore behavior, not Prefect-specific.
- We did not find a specific upstream httpx/httpcore issue tied directly to RECV_PING on CLOSED, but the symptom is consistent with stale HTTP/2 connections being reused.

Recommended mitigations and next steps
1) If HTTP/2 is enabled anywhere, turn it off and retry
- In most cases this fully avoids the issue, because HTTP/1.1 does not multiplex and avoids this particular PING/state race. Via config:
```
prefect config set api.enable_http2=false
prefect config view | grep -i http2
```
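The same setting can also be flipped via its environment variable (the `PREFECT_API_ENABLE_HTTP2` name mentioned above), which is handy in containerized deployments:

```shell
# Equivalent to `prefect config set api.enable_http2=false`,
# but scoped to the current shell/process environment:
export PREFECT_API_ENABLE_HTTP2=false
```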
Or in code (async):
```python
from prefect.client.orchestration import get_client

async with get_client(httpx_settings={"http2": False}) as client:
    ...
```
Or in code (sync):
```python
from prefect.client.orchestration import get_client

with get_client(httpx_settings={"http2": False}, sync_client=True) as client:
    ...
```
2) If you must keep HTTP/2, reduce connection reuse or expiry
- This minimizes the chance of reusing a stale HTTP/2 connection. You can either reduce the keepalive expiry or disable keepalive reuse. Example (async) disabling keepalive reuse:
```python
import httpx
from prefect.client.orchestration import get_client

limits = httpx.Limits(max_keepalive_connections=0)  # do not reuse keepalive connections
async with get_client(httpx_settings={"http2": True, "limits": limits}) as client:
    ...
```
Example (async) shortening expiry:
```python
import httpx
from prefect.client.orchestration import get_client

limits = httpx.Limits(keepalive_expiry=30)  # seconds
async with get_client(httpx_settings={"http2": True, "limits": limits}) as client:
    ...
```
3) Ensure you're on the latest Prefect 3.x
- Upgrading Prefect brings updated httpx/httpcore pins. While we don't have a single "known good" httpx/httpcore version to call out for this specific case, staying current helps avoid already-fixed pooling/HTTP/2 bugs.

4) Increase client retries for transient failures
- Prefect's client retries transient errors. You can tune these if needed:
  - `PREFECT_CLIENT_MAX_RETRIES`
  - `PREFECT_CLIENT_RETRY_EXTRA_CODES`
- Docs: Settings Reference

5) Check any intermediary timeouts
- Load balancers/proxies that aggressively recycle HTTP/2 connections (idle timeouts, keepalive policies) can increase the frequency of this error. If possible, raise the idle timeout or align it with the client's keepalive expiry.

Helpful references
- Prefect settings reference: Settings Reference
- Cloud troubleshooting: Troubleshoot Prefect Cloud
- Example of httpcore protocol error symptoms discussed in a Prefect context: Helm (advanced)

Can you share a bit more so we can tailor the fix?
- Prefect version (2.x or 3.x)?
- Is `api.enable_http2` currently true? (`prefect config view | grep -i http2`)
- httpx/httpcore versions in your environment:
```
python -c "import httpx, httpcore; print('httpx=', httpx.__version__, 'httpcore=', httpcore.__version__)"
```
- Any LB/proxy in front of the API (and its idle timeout)?

If you can temporarily force HTTP/1.1 and confirm the errors disappear, that will strongly indicate the stale HTTP/2 connection reuse path was the cause.