Owen Boyd
09/13/2025, 8:20 PM
Owen Boyd
09/13/2025, 8:20 PM
Marvin
09/13/2025, 8:20 PM
Owen Boyd
09/13/2025, 8:22 PM
File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/httpx/_client.py", line 1014, in _send_single_request
response = transport.handle_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/httpx/_transports/default.py", line 250, in handle_request
resp = self._pool.handle_request(req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/httpcore/_sync/connection_pool.py", line 256, in handle_request
raise exc from None
File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/httpcore/_sync/connection_pool.py", line 236, in handle_request
response = connection.handle_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/httpcore/_sync/connection.py", line 103, in handle_request
return self._connection.handle_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/httpcore/_sync/http11.py", line 136, in handle_request
raise exc
File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/httpcore/_sync/http11.py", line 106, in handle_request
) = self._receive_response_headers(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/httpcore/_sync/http11.py", line 177, in _receive_response_headers
event = self._receive_event(timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/httpcore/_sync/http11.py", line 217, in _receive_event
data = self._network_stream.read(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/httpcore/_backends/sync.py", line 128, in read
return self._sock.recv(max_bytes)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/owenboyd/.local/share/uv/python/cpython-3.11.5-macos-aarch64-none/lib/python3.11/ssl.py", line 1296, in recv
return self.read(buflen)
^^^^^^^^^^^^^^^^^
File "/Users/owenboyd/.local/share/uv/python/cpython-3.11.5-macos-aarch64-none/lib/python3.11/ssl.py", line 1169, in read
return self._sslobj.read(len)
^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt
^C
Exception ignored in: <module 'threading' from '/Users/owenboyd/.local/share/uv/python/cpython-3.11.5-macos-aarch64-none/lib/python3.11/threading.py'>
Traceback (most recent call last):
File "/Users/owenboyd/.local/share/uv/python/cpython-3.11.5-macos-aarch64-none/lib/python3.11/threading.py", line 1553, in _shutdown
atexit_call()
File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/prefect/_internal/concurrency/services.py", line 73, in _at_exit
self.drain(at_exit=True)
File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/prefect/_internal/concurrency/services.py", line 208, in drain
return future.result()
^^^^^^^^^^^^^^^
File "/Users/owenboyd/.local/share/uv/python/cpython-3.11.5-macos-aarch64-none/lib/python3.11/concurrent/futures/_base.py", line 451, in result
self._condition.wait(timeout)
File "/Users/owenboyd/.local/share/uv/python/cpython-3.11.5-macos-aarch64-none/lib/python3.11/threading.py", line 320, in wait
waiter.acquire()
KeyboardInterrupt:
^C
Exception ignored in atexit callback: <bound method TracerProvider.shutdown of <opentelemetry.sdk.trace.TracerProvider object at 0x118d05f10>>
Traceback (most recent call last):
File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/opentelemetry/sdk/trace/__init__.py", line 1281, in shutdown
self._active_span_processor.shutdown()
File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/opentelemetry/sdk/trace/__init__.py", line 180, in shutdown
sp.shutdown()
File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/prefect/telemetry/processors.py", line 56, in shutdown
self._export_thread.join()
File "/Users/owenboyd/.local/share/uv/python/cpython-3.11.5-macos-aarch64-none/lib/python3.11/threading.py", line 1112, in join
self._wait_for_tstate_lock()
File "/Users/owenboyd/.local/share/uv/python/cpython-3.11.5-macos-aarch64-none/lib/python3.11/threading.py", line 1132, in _wait_for_tstate_lock
if lock.acquire(block, timeout):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt:
Marvin
09/13/2025, 8:23 PM
time.sleep, or subprocesses?
- Do you have a snippet of the exact “tstate lock” log line and where it’s printed (worker logs vs. container stdout)?
Concrete things to try:
1) Turn on debug logging to see cancellation/timeout behavior:
export PREFECT_LOGGING_LEVEL=DEBUG
Then rerun and look for logs like “TaskRunTimeoutError”, “cancellation requested”, etc.
2) Make the task cancellation-friendly:
- Add timeouts to any external calls (e.g., requests.get(..., timeout=30), database client timeouts, boto3 config timeouts).
- If you control the loop, occasionally perform non-blocking waits (async: use await anyio.sleep(0)) so cancellation can be noticed. Avoid time.sleep in async tasks; prefer anyio.sleep.
- If you spin up your own threads in task code, ensure you shut them down/join them on cancellation, or make them daemon threads.
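For the thread point above, here is a minimal stdlib sketch of a helper thread that can be stopped and joined with a bound. `StoppableWorker` and its parameters are hypothetical names, not a Prefect API. The loop only checks its stop event between calls to `work_fn`, so each call must itself be bounded (for example with the request timeouts from the first bullet).

```python
import threading
import time
from typing import Callable


class StoppableWorker:
    """Hypothetical helper: calls `work_fn` repeatedly until stop() is called."""

    def __init__(self, work_fn: Callable[[], None], interval: float = 0.1) -> None:
        self._work_fn = work_fn
        self._interval = interval
        self._stop_event = threading.Event()
        # daemon=True: if stop() times out, this thread still won't block interpreter exit.
        self._thread = threading.Thread(target=self._run, name="stoppable-worker", daemon=True)

    def _run(self) -> None:
        while not self._stop_event.is_set():
            self._work_fn()
            # Event.wait() is an interruptible sleep: it returns as soon as stop() sets the event.
            self._stop_event.wait(self._interval)

    def start(self) -> None:
        self._thread.start()

    def stop(self, timeout: float = 5.0) -> bool:
        """Ask the loop to exit, wait at most `timeout` seconds, and report whether it did."""
        self._stop_event.set()
        self._thread.join(timeout)
        return not self._thread.is_alive()


if __name__ == "__main__":
    calls = []
    worker = StoppableWorker(lambda: calls.append(time.monotonic()), interval=0.01)
    worker.start()
    time.sleep(0.05)
    print("stopped cleanly:", worker.stop(timeout=1.0), "iterations:", len(calls))
```

Marking the thread as a daemon is the fallback: if `stop()` returns False, the interpreter can still exit, at the cost of abandoning whatever that thread was doing.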
3) Consider isolating blocking or CPU-bound work:
- Switch the flow to a ProcessPoolTaskRunner so a stuck task won’t hold the main interpreter’s threads:
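from prefect import flow, task
from prefect.task_runners import ProcessPoolTaskRunner

@task(timeout_seconds=60, retries=1, retry_delay_seconds=10)
def do_work():
    ...

@flow(task_runner=ProcessPoolTaskRunner(max_workers=2))
def my_flow():
    # Resolve the future so the flow waits for the task to finish
    do_work.submit().result()

if __name__ == "__main__":  # needed when worker processes are spawned (the default on macOS)
    my_flow()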
Processes can be terminated by the OS if needed, which avoids the “tstate lock” on interpreter shutdown that you see with threads.
- Alternatively, run the heavy bit in a subprocess you control and set a strict timeout so you can kill it if needed:
import subprocess
import sys
from prefect import task

@task(timeout_seconds=70)
def do_work():
    try:
        # subprocess.run kills the child itself once `timeout` expires
        subprocess.run([sys.executable, "-c", "import time; time.sleep(999)"], timeout=60, check=False)
    except subprocess.TimeoutExpired:
        # clean up, then re-raise so Prefect records the task run as failed
        raise
4) Capture a thread dump when things hang to see what’s blocking:
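- Enable Python’s faulthandler and dump all thread stacks on demand:
import faulthandler, signal, sys
faulthandler.enable()
# On SIGUSR2 (Unix only), write every thread's stack to stderr
faulthandler.register(signal.SIGUSR2, file=sys.stderr, all_threads=True, chain=True)
Then send SIGUSR2 to the process (kill -USR2 <pid>) to get a stack dump on stderr, which ends up wherever that process's stderr goes (terminal or container logs). That will tell us exactly which library call is stuck.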
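faulthandler writes straight to stderr and works even when Python code cannot run. If you would rather route a dump through your normal logging (say, from a watchdog thread that fires when a task overruns), a pure-Python alternative is sketched here. `format_all_thread_stacks` is a hypothetical helper, and `sys._current_frames()` is a CPython implementation detail, as the leading underscore suggests.

```python
import sys
import threading
import traceback


def format_all_thread_stacks() -> str:
    """Hypothetical helper: return every live thread's current stack as text."""
    # sys._current_frames() is CPython-specific: {thread ident: topmost frame}.
    frames = sys._current_frames()
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in frames.items():
        chunks.append(f"--- Thread {names.get(ident, '<unknown>')} (ident={ident}) ---\n")
        # format_stack() walks from the outermost call down to `frame`, like a traceback.
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)


if __name__ == "__main__":
    release = threading.Event()
    blocked = threading.Thread(target=release.wait, name="blocked-worker")
    blocked.start()
    print(format_all_thread_stacks())  # includes blocked-worker parked inside Event.wait
    release.set()
    blocked.join()
```

The returned string can be passed to something like logger.warning(...), so the dump lands next to the rest of the run's logs.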
5) Upgrade to the latest Prefect 3.x if you’re not already — there have been improvements in cancellation and logging:
- Release notes: Prefect 3 release notes
- Background on why cancellations are cooperative (AnyIO): How AnyIO powers Prefect’s async architecture
If you can share:
Marvin
09/13/2025, 8:23 PM
Owen Boyd
09/13/2025, 8:27 PM
Marvin
09/13/2025, 8:27 PM
Marvin
09/13/2025, 8:32 PM
_wait_for_tstate_lock), and Ctrl-C turns it into the noisy “tstate lock” spew.
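What’s happening
- On shutdown (or Ctrl‑C), Prefect drains its background services, and OpenTelemetry calls TracerProvider.shutdown().
- Prefect’s telemetry processor then calls self._export_thread.join(), with no timeout.
- That export thread is most likely still trying to send or finish a telemetry HTTP request. The first traceback shows what such a stall looks like: the main thread (where Ctrl‑C raises KeyboardInterrupt) was blocked in httpx → ssl._sslobj.read(), waiting for response headers that never arrived.
- The join blocks; threading’s join() waits inside _wait_for_tstate_lock, which is where the “tstate lock” in the traceback comes from.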
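The chain above can be reproduced with the standard library alone. This sketch is an analogy, not Prefect’s or OpenTelemetry’s code: a socketpair stands in for the telemetry HTTP connection, and `blocked_reader`, `bounded_join` and `demo` are hypothetical names. It shows that a thread blocked in a socket read keeps join() waiting, and that either a bounded join or a read timeout lets shutdown move on.

```python
import socket
import threading
from typing import Tuple


def blocked_reader(sock: socket.socket) -> None:
    """Stand-in for an export thread waiting on a server reply that never arrives."""
    try:
        sock.recv(1)  # blocks until data arrives, the peer closes, or the socket timeout fires
    except socket.timeout:  # alias of TimeoutError on Python 3.10+
        pass  # with a read timeout, the thread gives up and finishes on its own


def bounded_join(thread: threading.Thread, timeout: float) -> bool:
    """Join with an upper bound (unlike a bare join()); return True if the thread exited."""
    thread.join(timeout)
    return not thread.is_alive()


def demo() -> Tuple[bool, bool]:
    # Case 1: no read timeout. A bare join() here would hang like the traceback;
    # a bounded join returns False while the reader is still stuck in recv().
    a, b = socket.socketpair()
    stuck = threading.Thread(target=blocked_reader, args=(a,), daemon=True)
    stuck.start()
    stuck_exited = bounded_join(stuck, timeout=0.2)
    b.send(b"x")  # unblock the reader so the demo can clean up
    stuck.join()

    # Case 2: the socket has a read timeout, so the reader exits by itself.
    c, d = socket.socketpair()
    c.settimeout(0.1)
    bounded = threading.Thread(target=blocked_reader, args=(c,), daemon=True)
    bounded.start()
    bounded_exited = bounded_join(bounded, timeout=2.0)

    for s in (a, b, c, d):
        s.close()
    return stuck_exited, bounded_exited


if __name__ == "__main__":
    print(demo())  # (False, True)
```

In the mitigations that follow, shorter exporter timeouts play the role of case 2, while disabling OTel removes the stuck thread altogether.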
Quick ways to confirm and mitigate
1) Disable OpenTelemetry (quickest way to prove it’s the culprit)
export OTEL_SDK_DISABLED=true
If the hang disappears, it’s the telemetry export thread. OTel env docs: OTel SDK env vars
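2) Bound the shutdown wait for OTel (if you want to keep telemetry)
- Limit the batch span processor’s export time (an integer in milliseconds, not a duration string like 1s):
export OTEL_BSP_EXPORT_TIMEOUT=1000
- If you’re using OTLP exporters, also bound exporter timeouts:
export OTEL_EXPORTER_OTLP_TIMEOUT=1000
The OTel spec defines OTEL_EXPORTER_OTLP_TIMEOUT in milliseconds, but some opentelemetry-python releases have read it as seconds, so check the unit your SDK version expects. These settings keep OTel’s exporters from waiting long when the network is slow or unreachable; they may not shorten the join() in Prefect’s processor shown in the traceback, which is called without a timeout.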
3) Disable shipping logs to the API (rule out the log shipper also blocking on httpx)
export PREFECT_LOGGING_TO_API_ENABLED=false
4) Check environment/network
- If this run has no egress or sits behind a corporate proxy, make sure `HTTP(S)_PROXY`/`NO_PROXY` are set appropriately, or keep telemetry/log shipping disabled in that environment.
5) Upgrade Prefect to latest 3.x
- We’ve shipped improvements around cancellation/shutdown. Release notes: Prefect 3 release notes
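To run the A/B check from steps 1-3 without editing your shell profile, a small Python wrapper can rerun the flow script with and without the toggles and report whether it exits on its own. This is a sketch: `diagnostic_env`, `exits_within`, `python_cmd` and `my_flow.py` are hypothetical names, and whether these variables actually reach the code path in your traceback depends on your Prefect and OpenTelemetry versions.

```python
import os
import subprocess
import sys
from typing import Dict, List, Mapping, Optional, Sequence


def diagnostic_env(
    base: Optional[Mapping[str, str]] = None,
    disable_otel: bool = True,
    disable_api_logs: bool = True,
    bsp_export_timeout_ms: int = 1000,
) -> Dict[str, str]:
    """Copy the environment and apply the toggles from steps 1-3."""
    env = dict(os.environ if base is None else base)
    if disable_otel:
        env["OTEL_SDK_DISABLED"] = "true"
    if disable_api_logs:
        env["PREFECT_LOGGING_TO_API_ENABLED"] = "false"
    # Integer milliseconds, not a duration string such as "1s".
    env["OTEL_BSP_EXPORT_TIMEOUT"] = str(int(bsp_export_timeout_ms))
    return env


def python_cmd(script: str) -> List[str]:
    """Command that runs `script` with the current interpreter (same venv)."""
    return [sys.executable, script]


def exits_within(cmd: Sequence[str], env: Mapping[str, str], timeout_s: float) -> bool:
    """Run cmd and report whether it exits on its own within timeout_s.

    On timeout, subprocess.run kills the child before raising TimeoutExpired.
    """
    try:
        subprocess.run(list(cmd), env=dict(env), timeout=timeout_s, check=False)
        return True
    except subprocess.TimeoutExpired:
        return False


# Usage (hypothetical script name):
#   exits_within(python_cmd("my_flow.py"), dict(os.environ), timeout_s=120)  # baseline
#   exits_within(python_cmd("my_flow.py"), diagnostic_env(), timeout_s=120)  # with toggles
```

Flip one toggle at a time (for example disable_api_logs=False) to tell the telemetry exporter and the log shipper apart.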
If you can, please share:
- Prefect version and Python version
- Whether setting OTEL_SDK_DISABLED=true fixes the hang
- Whether you’re running with internet access or behind a proxy
If disabling OTel/log shipping fixes it, we can dig into making the exporter non-blocking or honoring short timeouts in your environment. And if you’d like, I can open an issue on our side with your stack trace to harden the shutdown path so a blocked telemetry export can’t stall interpreter exit.