# prefect-community
i
Hi all - we’re seeing sporadic `BrokenPipeError: [Errno 32] Broken pipe` crashes in one of our flows on Prefect 2.7.7, running on `DaskTaskRunner`. This flow runs ~1000 tasks, and occasionally one of them enters a Crashed state with this error, which causes the flow to enter a Failed state. Retries on these crashed tasks don’t seem to work (I’m guessing Crashed tasks are excluded from the retry logic). Full traceback in the thread. Any ideas? Thank you!
```
Crash detected! Execution was interrupted by an unexpected exception: Traceback (most recent call last):
  File "/usr/local/lib/python3.10/asyncio/selector_events.py", line 924, in write
    n = self._sock.send(data)
BrokenPipeError: [Errno 32] Broken pipe

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/httpcore/_exceptions.py", line 10, in map_exceptions
    yield
  File "/usr/local/lib/python3.10/site-packages/httpcore/backends/asyncio.py", line 51, in write
    await self._stream.send(item=buffer)
  File "/usr/local/lib/python3.10/site-packages/anyio/streams/tls.py", line 202, in send
    await self._call_sslobject_method(self._ssl_object.write, item)
  File "/usr/local/lib/python3.10/site-packages/anyio/streams/tls.py", line 168, in _call_sslobject_method
    await self.transport_stream.send(self._write_bio.read())
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 1297, in send
    raise self._protocol.exception
anyio.BrokenResourceError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/httpx/_transports/default.py", line 60, in map_httpcore_exceptions
    yield
  File "/usr/local/lib/python3.10/site-packages/httpx/_transports/default.py", line 353, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "/usr/local/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 253, in handle_async_request
    raise exc
  File "/usr/local/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 237, in handle_async_request
    response = await connection.handle_async_request(request)
  File "/usr/local/lib/python3.10/site-packages/httpcore/_async/connection.py", line 90, in handle_async_request
    return await self._connection.handle_async_request(request)
  File "/usr/local/lib/python3.10/site-packages/httpcore/_async/http2.py", line 144, in handle_async_request
    raise exc
  File "/usr/local/lib/python3.10/site-packages/httpcore/_async/http2.py", line 106, in handle_async_request
    await self._send_request_headers(request=request, stream_id=stream_id)
  File "/usr/local/lib/python3.10/site-packages/httpcore/_async/http2.py", line 205, in _send_request_headers
    await self._write_outgoing_data(request)
  File "/usr/local/lib/python3.10/site-packages/httpcore/_async/http2.py", line 370, in _write_outgoing_data
    raise exc
  File "/usr/local/lib/python3.10/site-packages/httpcore/_async/http2.py", line 358, in _write_outgoing_data
    await self._network_stream.write(data_to_send, timeout)
  File "/usr/local/lib/python3.10/site-packages/httpcore/backends/asyncio.py", line 49, in write
    with map_exceptions(exc_map):
  File "/usr/local/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.10/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc)
httpcore.WriteError

The above exception was the direct cause of the following exception:

httpx.WriteError
```
z
Crashed states are indeed excluded from retries, since the failure happens outside of your code.
I’m not sure of the best way to resolve these issues; perhaps we should retry on these. We retry on similar `ReadError` exceptions.
i
Thank you for the clarification @Zanie, this makes total sense. We will keep track of this PR, and in the meantime we can add custom retries to tasks that come back in a Crashed state. Do you know what might be causing the error, or where we might want to start looking?
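The "custom retries for crashed tasks" workaround might look roughly like this framework-agnostic sketch: run the whole batch, collect the inputs whose final state is crashed, and resubmit only those for a bounded number of rounds. `RunResult`, `run_batch`, and `run_with_crash_retries` are hypothetical names; in a real Prefect flow, `run_batch` would submit the tasks and inspect each task run's final state, which this sketch does not show.

```python
from collections.abc import Hashable
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RunResult:
    """Hypothetical stand-in for a task run's final state and result."""
    crashed: bool
    value: object = None


def run_with_crash_retries(
    run_batch: Callable[[List[Hashable]], Dict[Hashable, RunResult]],
    inputs: List[Hashable],
    max_rounds: int = 3,
) -> Dict[Hashable, RunResult]:
    """Run every input, then re-run only the crashed ones, for at most max_rounds rounds."""
    results: Dict[Hashable, RunResult] = {}
    pending = list(inputs)
    for _ in range(max_rounds):
        if not pending:
            break
        batch = run_batch(pending)
        results.update(batch)  # later rounds overwrite earlier crashed results
        pending = [key for key in pending if batch[key].crashed]
    # Anything still crashed here is left in results so the caller can fail loudly.
    return results


# Usage: a simulated runner where input 7 crashes on its first attempt only.
attempts_seen: Dict[Hashable, int] = {}


def fake_run_batch(keys: List[Hashable]) -> Dict[Hashable, RunResult]:
    out: Dict[Hashable, RunResult] = {}
    for key in keys:
        attempts_seen[key] = attempts_seen.get(key, 0) + 1
        crashed = key == 7 and attempts_seen[key] == 1
        out[key] = RunResult(crashed=crashed, value=None if crashed else key * 2)
    return out


final = run_with_crash_retries(fake_run_batch, list(range(10)))
still_crashed = [key for key, res in final.items() if res.crashed]
print(still_crashed)  # prints []
```

Bounding the number of rounds means a persistent failure (as opposed to a sporadic broken pipe) still surfaces as crashed instead of being retried forever.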
z
I’m not sure; it’s a networking issue. I’ve reached out to the httpx team.
i
Makes sense, thanks again for your help on this Michael!