Prefect appears to be hanging on task shutdown (se...
# ask-community
o
Prefect appears to be hanging on task shutdown (see attached) waiting for the tstate (thread state?) lock to do some telemetry work. Task has been going for 30 min, has a timeout of 60s, and has not been retried per the prefect UI. Is there something I might have done to cause this? Any tips to troubleshoot further?
cc @Marvin Prefect appears to be hanging on task shutdown (see attached) waiting for the tstate (task state?) lock. Task has been going for 30 min, has a timeout of 60s, and has not been retried per the prefect UI. Is there something I might have done to cause this? Any tips to troubleshoot further?
o
Also, here's the stack trace when I send SIGINT:
  File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/httpx/_client.py", line 1014, in _send_single_request
    response = transport.handle_request(request)                                                                                                                                                                                                   
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                         
  File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/httpx/_transports/default.py", line 250, in handle_request                                                                                  
    resp = self._pool.handle_request(req)                                                                                                                                                                                                          
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                
  File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/httpcore/_sync/connection_pool.py", line 256, in handle_request
    raise exc from None                                                                                                                                                                                                                            
  File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/httpcore/_sync/connection_pool.py", line 236, in handle_request
    response = connection.handle_request(                                                                                
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/httpcore/_sync/connection.py", line 103, in handle_request                                                                                  
    return self._connection.handle_request(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/httpcore/_sync/http11.py", line 136, in handle_request                                                                                      
    raise exc  
  File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/httpcore/_sync/http11.py", line 106, in handle_request
    ) = self._receive_response_headers(**kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/httpcore/_sync/http11.py", line 177, in _receive_response_headers
    event = self._receive_event(timeout=timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/httpcore/_sync/http11.py", line 217, in _receive_event
    data = self._network_stream.read(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                                                              
  File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/httpcore/_backends/sync.py", line 128, in read                                                                                              
    return self._sock.recv(max_bytes)                                                                                                                                                                                                              
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/owenboyd/.local/share/uv/python/cpython-3.11.5-macos-aarch64-none/lib/python3.11/ssl.py", line 1296, in recv          
    return self.read(buflen)             
           ^^^^^^^^^^^^^^^^^                                                                                                                                                                                                                       
  File "/Users/owenboyd/.local/share/uv/python/cpython-3.11.5-macos-aarch64-none/lib/python3.11/ssl.py", line 1169, in read                                                                                                                        
    return self._sslobj.read(len)                                                                                                                                                                                                                  
           ^^^^^^^^^^^^^^^^^^^^^^    
KeyboardInterrupt                                                                                                                                                                                                                                  
^CException ignored in: <module 'threading' from '/Users/owenboyd/.local/share/uv/python/cpython-3.11.5-macos-aarch64-none/lib/python3.11/threading.py'>                                                                                           
Traceback (most recent call last):
  File "/Users/owenboyd/.local/share/uv/python/cpython-3.11.5-macos-aarch64-none/lib/python3.11/threading.py", line 1553, in _shutdown                              
    atexit_call()                                   
  File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/prefect/_internal/concurrency/services.py", line 73, in _at_exit                                                                            
    self.drain(at_exit=True)                                                                                                                                                                                                                       
  File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/prefect/_internal/concurrency/services.py", line 208, in drain                                                                              
    return future.result()     
           ^^^^^^^^^^^^^^^                                                                                                                                                                                                                         
  File "/Users/owenboyd/.local/share/uv/python/cpython-3.11.5-macos-aarch64-none/lib/python3.11/concurrent/futures/_base.py", line 451, in result                                                                                                  
    self._condition.wait(timeout)
  File "/Users/owenboyd/.local/share/uv/python/cpython-3.11.5-macos-aarch64-none/lib/python3.11/threading.py", line 320, in wait                                 
    waiter.acquire()                 
KeyboardInterrupt:                   
^CException ignored in atexit callback: <bound method TracerProvider.shutdown of <opentelemetry.sdk.trace.TracerProvider object at 0x118d05f10>> 
Traceback (most recent call last):   
  File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/opentelemetry/sdk/trace/__init__.py", line 1281, in shutdown                                                                                
    self._active_span_processor.shutdown()                                                                                                                                                                                                         
  File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/opentelemetry/sdk/trace/__init__.py", line 180, in shutdown                                                                                 
    sp.shutdown()                                      
  File "/Users/owenboyd/projects/bild/estimator/.worktrees/all-tags/.venv/lib/python3.11/site-packages/prefect/telemetry/processors.py", line 56, in shutdown
    self._export_thread.join()          
  File "/Users/owenboyd/.local/share/uv/python/cpython-3.11.5-macos-aarch64-none/lib/python3.11/threading.py", line 1112, in join                                                                                                                  
    self._wait_for_tstate_lock()                                                                                                                                                                                                                   
  File "/Users/owenboyd/.local/share/uv/python/cpython-3.11.5-macos-aarch64-none/lib/python3.11/threading.py", line 1132, in _wait_for_tstate_lock                                                                                                 
    if lock.acquire(block, timeout):         
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                                                                
KeyboardInterrupt:
m
Thanks for the details. A few quick instincts here.

What you're seeing ("waiting for the tstate lock") is a CPython shutdown symptom that generally means there's still a Python thread running when the interpreter tries to exit. In Prefect 3.x, task timeouts and cancellations are cooperative. If a task is doing blocking work in a thread (e.g., a long-running C extension or an I/O call that doesn't check for cancellation), Prefect can mark the task as timed out but cannot forcibly stop the underlying thread. That lingering thread can keep the process from shutting down cleanly and lead to "tstate lock" messages. This also explains why you don't see a retry yet: the run hasn't fully transitioned because the worker is still waiting on that thread to exit.

A few questions to pinpoint it:
- Which versions are you on? Prefect version and Python version (3.12 makes these thread shutdown messages more visible).
- What task runner is your flow using (default ThreadPoolTaskRunner or ProcessPoolTaskRunner)?
- What does the task do during those 30 minutes: any long network calls, database queries, blocking library calls (pandas, numpy, boto3, psycopg2), `time.sleep`, or subprocesses?
- Do you have a snippet of the exact "tstate lock" log line and where it's printed (worker logs vs. container stdout)?

Concrete things to try:

1) Turn on debug logging to see cancellation/timeout behavior:
export PREFECT_LOGGING_LEVEL=DEBUG
Then rerun and look for logs like "TaskRunTimeoutError", "cancellation requested", etc.

2) Make the task cancellation-friendly:
- Add timeouts to any external calls (e.g., `requests.get(..., timeout=30)`, database client timeouts, boto3 config timeouts).
- If you control the loop, occasionally perform non-blocking waits (async: use `await anyio.sleep(0)`) so cancellation can be noticed. Avoid `time.sleep` in async tasks; prefer `anyio.sleep`.
- If you spin up your own threads in task code, ensure you shut them down/join them on cancellation, or make them daemon threads.

3) Consider isolating blocking or CPU-bound work:
- Switch the flow to a ProcessPoolTaskRunner so a stuck task won't hold the main interpreter's threads:
from prefect import flow, task
from prefect.task_runners import ProcessPoolTaskRunner

@task(timeout_seconds=60, retries=1, retry_delay_seconds=10)
def do_work():
    ...

@flow(task_runner=ProcessPoolTaskRunner(max_workers=2))
def my_flow():
    do_work.submit()
Processes can be terminated by the OS if needed, which avoids the "tstate lock" on interpreter shutdown that you see with threads.
- Alternatively, run the heavy bit in a subprocess you control and set a strict timeout so you can kill it if needed:
import subprocess

from prefect import task

@task(timeout_seconds=70)
def do_work():
    try:
        # subprocess.run kills the child itself when its timeout expires
        subprocess.run(["python", "-c", "import time; time.sleep(999)"], timeout=60, check=False)
    except subprocess.TimeoutExpired:
        # clean up, then raise so Prefect records a failure/timeout
        raise
4) Capture a thread dump when things hang to see what's blocking:
- Enable Python faulthandler and dump all thread stacks on demand:
import faulthandler
import signal

faulthandler.enable()
# Dump every thread's stack to stderr on SIGUSR2 (not available on Windows)
faulthandler.register(signal.SIGUSR2, all_threads=True, chain=True)
Then send SIGUSR2 to the process to get a stack dump on stderr. That will tell us exactly which library call is stuck.

5) Upgrade to the latest Prefect 3.x if you're not already; there have been improvements in cancellation and logging:
- Release notes: Prefect 3 release notes
- Background on why cancellations are cooperative (AnyIO): How AnyIO powers Prefect's async architecture

If you can share:
- Prefect + Python versions
- The task code (or a minimal repro)
- Your flow's task runner
- A snippet of the exact shutdown/tstate log

...I can help zero in on whether this is a non-cooperative call, a thread that's not being joined, or something else.
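To make the "cancellation-friendly" point concrete, here is a minimal sketch using only the standard library, not Prefect's API. The names `cooperative_worker` and `run_with_deadline` are hypothetical. The idea is that the worker checks a stop flag between small slices of work, and the caller uses a bounded `join` so it never waits forever.

```python
import threading
import time


def cooperative_worker(stop: threading.Event, poll_seconds: float = 0.05) -> None:
    """Hypothetical worker: does work in small slices and checks a stop flag."""
    while not stop.is_set():
        # Event.wait doubles as an interruptible sleep: it returns early
        # as soon as the flag is set, instead of sleeping the full interval.
        stop.wait(poll_seconds)


def run_with_deadline(deadline_seconds: float) -> bool:
    """Return True if the worker thread exited once the deadline fired."""
    stop = threading.Event()
    worker = threading.Thread(target=cooperative_worker, args=(stop,), daemon=True)
    worker.start()
    time.sleep(deadline_seconds)  # stand-in for "the timeout fired"
    stop.set()                    # ask the worker to stop
    worker.join(timeout=1.0)      # bounded join: never wait forever
    return not worker.is_alive()
```

Here `run_with_deadline(0.1)` returns `True` because the worker notices the flag within one poll interval. A worker stuck inside a single long blocking call never reaches the check, which is why the call itself also needs a timeout.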
o
@Marvin Look at the stack trace above: Prefect appears to be doing some telemetry work that needs that lock.
m
Thanks, that stack trace is super helpful. You're right: this isn't your task "holding a lock", it's Prefect's telemetry shutdown path waiting on a background export thread that appears to be blocked in an httpx SSL read. Then, during interpreter shutdown, Python waits for that thread (hence `_wait_for_tstate_lock`), and Ctrl-C turns it into the noisy "tstate lock" spew.

What's happening
- On shutdown (or Ctrl-C), Prefect drains background services and OpenTelemetry calls `TracerProvider.shutdown()`.
- Prefect's telemetry processor calls `self._export_thread.join()`.
- That export thread is trying to send/finish a telemetry HTTP request and appears to be blocked inside `httpx`, in the `ssl` module's `self._sslobj.read()`.
- The join blocks; the traceback shows it as `_wait_for_tstate_lock`.

Quick ways to confirm and mitigate

1) Disable OpenTelemetry (quickest way to prove it's the culprit)
export OTEL_SDK_DISABLED=true
If the hang disappears, it's the telemetry export thread. OTel env docs: OTel SDK env vars

2) Bound the shutdown wait for OTel (if you want to keep telemetry)
- Limit batch span processor shutdown time:
export OTEL_BSP_EXPORT_TIMEOUT=1000  # milliseconds, given as a plain integer with no unit suffix
- If you’re using OTLP exporters, also bound exporter timeouts:
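export OTEL_EXPORTER_OTLP_TIMEOUT=1  # plain number, no unit suffix; assumes the Python exporter, which has read this as seconds (the OTel spec says milliseconds, so check your SDK version)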
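These should keep shutdown from waiting long when the network is slow or unreachable, provided the blocked export actually goes through the SDK's batch span processor or an OTLP exporter.

3) Disable shipping logs to the API (rule out the log shipper also blocking on httpx)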
export PREFECT_LOGGING_TO_API_ENABLED=false
4) Check environment/network
- If this run has no egress or sits behind a corporate proxy, make sure `HTTP(S)_PROXY`/`NO_PROXY` are set appropriately, or keep telemetry/log shipping disabled in that environment.

5) Upgrade Prefect to latest 3.x
- We've shipped improvements around cancellation/shutdown. Release notes: Prefect 3 release notes

If you can, please share:
- Prefect version and Python version
- Whether setting `OTEL_SDK_DISABLED=true` fixes the hang
- Whether you're running with internet access or behind a proxy

If disabling OTel/log shipping fixes it, we can dig into making the exporter non-blocking or honoring short timeouts in your environment. And if you'd like, I can open an issue on our side with your stack trace to harden the shutdown path so a blocked telemetry export can't stall interpreter exit.
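For intuition about the failure mode in the trace, here is a small standard-library sketch. It is not Prefect's or OpenTelemetry's implementation, and `export_once` and `shutdown_exporter` are hypothetical names. A background thread waits on a socket read that never gets a reply, and the caller joins it. A read timeout lets the thread exit on its own, and a bounded `join` returns even when the thread is still stuck.

```python
import socket
import threading
from typing import List, Optional, Tuple


def export_once(sock: socket.socket, results: List[object]) -> None:
    """Hypothetical exporter: waits for a reply that never arrives."""
    try:
        results.append(sock.recv(1024))
    except socket.timeout:
        # With a read timeout set, a silent peer ends the wait here.
        results.append("timed out")


def shutdown_exporter(
    read_timeout: Optional[float], join_timeout: float
) -> Tuple[bool, List[object]]:
    """Return (thread exited before join_timeout, results seen at that point)."""
    ours, peer = socket.socketpair()
    ours.settimeout(read_timeout)  # None means recv blocks indefinitely
    results: List[object] = []
    thread = threading.Thread(target=export_once, args=(ours, results), daemon=True)
    thread.start()
    thread.join(timeout=join_timeout)  # bounded: returns even if the thread is stuck
    exited_in_time = not thread.is_alive()
    snapshot = list(results)
    peer.close()              # EOF unblocks a still-waiting recv so the demo cleans up
    thread.join(timeout=1.0)
    ours.close()
    return exited_in_time, snapshot
```

With `read_timeout=0.1` the thread gives up by itself and the join returns promptly. With `read_timeout=None` the thread is still blocked when the bounded join returns, which mirrors an unbounded `join()` waiting in `_wait_for_tstate_lock`.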