# ask-community
s
Hi Team, we came across this error twice in a month, and it makes us worry about the reliability of Prefect. Since Prefect is the backbone of our infrastructure, everything stops working when this happens. Retries simply fail, and all our flows fail for as long as the issue persists. Our deployment: Prefect Cloud (1.0) + GCP Kubernetes + LocalDaskExecutor. What can be done to safeguard against this? Is there a reliable way to retry and not fail? Task Run for Reference. CC: @Christina Lopez @Yash Joshi
[20 Oct 2022 12:22pm]: Error during execution of task: SSLError(MaxRetryError("HTTPSConnectionPool(host='api.prefect.io', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1131)')))"))
m
Hey @Saurabh Indoria This error can come up for a variety of reasons. In this case, the task run appears to have failed due to a heartbeat error, and then the Prefect API likely reached the maximum number of attempts to restart the task run, resulting in this error:
Error during execution of task: SSLError(MaxRetryError("HTTPSConnectionPool(host='api.prefect.io', port=443)
This Discourse article describes part of the issue and offers some suggestions to alleviate it: https://discourse.prefect.io/t/flow-is-failing-with-an-error-message-no-heartbeat-detected-from-the-remote-task/79
s
Thanks @Mason Menges 🙌
💙 1
@Mason Menges I saw this error again on this flow run, and this time the heartbeat issue came up a long time after this error. I assume that because the Prefect API was not reachable, the heartbeats obviously couldn't be reported. This is definitely not an issue with the pod being dropped, because multiple tasks reported this error before the heartbeat issue came up. Please let me know if my understanding is incorrect here. CC: @Christina Lopez
👀 1
c
Flagging for @Kristen Denk as well since she's helping y'all with your evaluation.
👀 1
🙌 1
k
@Taylor Curran
@George Coyne
g
This looks like a TLS error
You can check that with something like
from urllib.request import urlopen
urlopen('https://www.howsmyssl.com/a/check').read()
Your Python should be running TLS 1.2
We also see this with proxy configurations
I'm raising these because on our side there are no API stability or SSL issues
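A slightly fuller version of the TLS check suggested above might look like the sketch below. It assumes the howsmyssl.com response is JSON with a `tls_version` field holding a string such as "TLS 1.2"; the helper names (`tls_version_from_report`, `is_tls_12_or_newer`, `fetch_tls_version`) are made up for illustration and are not part of Prefect or of any library.

```python
import json
import ssl
from urllib.request import urlopen

# Assumption: the check endpoint returns JSON containing a "tls_version" key.
CHECK_URL = "https://www.howsmyssl.com/a/check"


def tls_version_from_report(payload: bytes) -> str:
    """Extract the negotiated TLS version string from the JSON report."""
    report = json.loads(payload)
    return report["tls_version"]


def is_tls_12_or_newer(version: str) -> bool:
    """Return True for strings like 'TLS 1.2' or 'TLS 1.3', False otherwise."""
    if not version.startswith("TLS "):
        return False
    try:
        major, minor = (int(part) for part in version.split()[-1].split("."))
    except ValueError:
        # Covers non-numeric parts and strings without a "major.minor" shape.
        return False
    return (major, minor) >= (1, 2)


def fetch_tls_version(url: str = CHECK_URL, timeout: float = 10.0) -> str:
    """Query the check endpoint from this Python process (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return tls_version_from_report(response.read())


def local_openssl_version() -> str:
    """Report the OpenSSL build that this Python interpreter is linked against."""
    return ssl.OPENSSL_VERSION
```

Running `is_tls_12_or_newer(fetch_tls_version())` from inside the same container image that executes the flow, rather than from a laptop, checks the environment that actually hit the error. A passing result does not rule out a proxy or a transient network drop in between.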
k
@Saurabh Indoria @Yash Joshi
🙏 2
s
Thanks for that insight, but I wonder why this would occur once in a thousand runs. If our TLS config is bad, the flows should never run, right? @George Coyne
a
Correct, perhaps it's some transient issue that you could solve with flow-level retries? That's what retries are for 🙌
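To make the retry idea concrete, here is a generic sketch of retrying a call with exponential backoff when a transient network or SSL error occurs. This is plain Python, not Prefect's retry mechanism: `call_with_retries` and all of its parameters are hypothetical names chosen for illustration. It retries on `OSError` by default, on the understanding that both `ssl.SSLError` and the exceptions raised by `requests` derive from it.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying with exponential backoff on the given exceptions."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: surface the last error to the caller.
                raise
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% random jitter so parallel tasks do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
    raise AssertionError("unreachable")
```

Wrapping only the network call, for example `call_with_retries(lambda: some_api_call())` where `some_api_call` is a placeholder, keeps the retry scoped to the step that can fail transiently. Errors outside `retry_on`, such as a `ValueError`, propagate immediately.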