James Zhang
02/21/2024, 8:41 AM
Crash detected! Execution was interrupted by an unexpected exception: RuntimeError: Cannot orchestrate task run 'bd06afb5-0e74-48e8-8ba9-a6d0d6f91430'. Failed to reach API at <http://prefect-server/api/>.
and the tasks crashed. But as far as I can see, the Prefect server isn't down or offline (my Prefect setup runs in k8s). Maybe it's because of too much concurrency? There are hundreds of concurrent tasks, but I've already set a concurrency limit.
Update: after going through the stack traces, I see exceptions like:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/prefect/engine.py", line 1656, in create_task_run_then_submit
await create_task_run(
File "/opt/conda/lib/python3.10/site-packages/prefect/engine.py", line 1704, in create_task_run
task_run = await flow_run_context.client.create_task_run(
File "/opt/conda/lib/python3.10/site-packages/prefect/client/orchestration.py", line 2018, in create_task_run
response = await self._client.post(
File "/opt/conda/lib/python3.10/site-packages/httpx/_client.py", line 1877, in post
return await self.request(
File "/opt/conda/lib/python3.10/site-packages/httpx/_client.py", line 1559, in request
return await self.send(request, auth=auth, follow_redirects=follow_redirects)
File "/opt/conda/lib/python3.10/site-packages/prefect/client/base.py", line 282, in send
response = await self._send_with_retry(
File "/opt/conda/lib/python3.10/site-packages/prefect/client/base.py", line 216, in _send_with_retry
response = await request()
File "/opt/conda/lib/python3.10/site-packages/httpx/_client.py", line 1646, in send
response = await self._send_handling_auth(
File "/opt/conda/lib/python3.10/site-packages/httpx/_client.py", line 1674, in _send_handling_auth
response = await self._send_handling_redirects(
File "/opt/conda/lib/python3.10/site-packages/httpx/_client.py", line 1711, in _send_handling_redirects
response = await self._send_single_request(request)
File "/opt/conda/lib/python3.10/site-packages/httpx/_client.py", line 1748, in _send_single_request
response = await transport.handle_async_request(request)
File "/opt/conda/lib/python3.10/site-packages/httpx/_transports/default.py", line 370, in handle_async_request
with map_httpcore_exceptions():
File "/opt/conda/lib/python3.10/contextlib.py", line 153, in __exit__
self.gen.throw(typ, value, traceback)
File "/opt/conda/lib/python3.10/site-packages/httpx/_transports/default.py", line 84, in map_httpcore_exceptions
raise mapped_exc(message) from exc
httpx.PoolTimeout
Is the PoolTimeout something I can change?
I'm not sure where to start debugging. Any ideas? Thanks!
Uriel Mandujano
02/21/2024, 5:04 PM
Try setting PREFECT_API_REQUEST_TIMEOUT=120. That should give your Prefect server more time to respond to your worker's requests.
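For context on the httpx.PoolTimeout at the bottom of the traceback: httpx raises it when a request times out waiting to acquire a connection from the client's connection pool. The following is a minimal stdlib-only simulation of that situation, not Prefect's or httpx's actual code. The pool size, timeouts, and the names call_api, run, and simulate are all hypothetical, chosen only to show how many concurrent callers plus slow responses can exhaust a small pool.

```python
import asyncio

# All three values are hypothetical and chosen only for illustration.
POOL_SIZE = 2        # connections available in the simulated pool
POOL_TIMEOUT = 0.05  # seconds a caller waits for a free connection
REQUEST_TIME = 0.2   # seconds the (slow) simulated server takes per request


async def call_api(pool: asyncio.Semaphore) -> str:
    """Simulate one API call that must first acquire a pooled connection."""
    try:
        # Waiting for a free slot is the step that can hit a pool timeout.
        await asyncio.wait_for(pool.acquire(), timeout=POOL_TIMEOUT)
    except asyncio.TimeoutError:
        return "pool_timeout"
    try:
        # Holding the connection while the server is busy responding.
        await asyncio.sleep(REQUEST_TIME)
        return "ok"
    finally:
        pool.release()


async def run(n_callers: int) -> list[str]:
    """Start n_callers simulated requests at once against one shared pool."""
    pool = asyncio.Semaphore(POOL_SIZE)
    return await asyncio.gather(*(call_api(pool) for _ in range(n_callers)))


def simulate(n_callers: int) -> dict[str, int]:
    """Count how many simulated calls succeeded and how many timed out."""
    results = asyncio.run(run(n_callers))
    return {
        "ok": results.count("ok"),
        "pool_timeout": results.count("pool_timeout"),
    }


if __name__ == "__main__":
    print(simulate(10))
```

In this sketch the callers that time out never reach the server at all: they give up while queued behind slower requests. That would be consistent with the server looking healthy while clients still report failures, though whether it matches this particular deployment is an assumption.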
To me this looks like a case where your Prefect server is overloaded with incoming requests, and the concurrency limit you mentioned wouldn't come into play there. You could try increasing the resources available to your Prefect server to see if that lets it handle more API requests.
James Zhang
02/26/2024, 11:37 AM