
Parwez Noori

01/15/2023, 3:52 PM
If a Prefect flow crashes with "Execution was interrupted by an unexpected exception.", what can be done? I have added retries to the flow, but they are not picked up.

Anna Geller

01/15/2023, 6:20 PM
There are many reasons why this could happen. You would need to take it step by step and find out what's causing it. What infrastructure did you use? Perhaps try with just the local process first and see if that works. A more detailed walkthrough of what you did, your Prefect version, and a minimal example that reproduces the error, written up in a GitHub issue and linked here, would be helpful.
What happens when you run this without Prefect, as a plain Python function? Is the exception already raised there? You can also enable DEBUG logs on your agent to get more info.

Parwez Noori

01/15/2023, 7:47 PM
Yes. Our infrastructure looks like this: the flows run as Kubernetes jobs, and the flow runs fine as a local process. We are working with Prefect version 2.2. I forgot to mention that the flow normally works fine; however, about once every two weeks it crashes. Digging a bit deeper, I found the following traceback:
response = await self._send_handling_auth(
  File "/usr/local/lib/python3.9/site-packages/httpx/_client.py", line 1642, in _send_handling_auth
    response = await self._send_handling_redirects(
  File "/usr/local/lib/python3.9/site-packages/httpx/_client.py", line 1679, in _send_handling_redirects
    response = await self._send_single_request(request)
  File "/usr/local/lib/python3.9/site-packages/httpx/_client.py", line 1716, in _send_single_request
    response = await transport.handle_async_request(request)
  File "/usr/local/lib/python3.9/site-packages/httpx/_transports/default.py", line 353, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "/usr/local/lib/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.9/site-packages/httpx/_transports/default.py", line 77, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ConnectError: [Errno -3] Temporary failure in name resolution
Worker information:
    Approximate queue length: 0
    Pending log batch length: 3
    Pending log batch size: 1080
The log worker is stopping and these logs will not be sent.

Anna Geller

01/16/2023, 12:42 PM
It could be due to a lost internet connection or a similar networking issue. You could try adding flow-level retries, for example. There are also Automations in Cloud that help you detect and react to such issues, e.g. if the run ends in a Crashed state, cancel that run and create a new one.
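For illustration only, here is a minimal sketch of the retry idea in plain Python, without any Prefect APIs. It retries a callable when a transient error such as the name-resolution failure in the traceback occurs, waiting longer between attempts. The helper name `call_with_retries` and its parameters are hypothetical and not part of Prefect or httpx. By default it retries on `OSError`, which covers `socket.gaierror`; to retry on the httpx error shown above you would pass `retry_on=(httpx.ConnectError,)`.

```python
import socket
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exception types with exponential backoff.

    Hypothetical helper for illustration; not a Prefect API.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


def resolve(host: str, port: int = 443) -> list:
    """Resolve a hostname; raises socket.gaierror (an OSError) on DNS failure."""
    return socket.getaddrinfo(host, port)
```

A call such as `call_with_retries(lambda: resolve("example.com"), attempts=5, base_delay=2.0)` would then tolerate a few short DNS outages before giving up; the hostname here is a placeholder. Note that a retry running inside the flow process cannot help if that process itself is killed, which is where the Cloud Automation on the Crashed state comes in.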

Parwez Noori

01/16/2023, 7:27 PM
Thank you for the answer Anna. I will do that.