Raymond Yu

05/23/2022, 5:14 AM
Hey Prefect, we’re encountering a somewhat stochastic error when running a
for a long-running
DatabricksSubmitRun in another flow, even when the Databricks job runs to completion without an issue. We've noticed this occasionally results in the error below, after which no heartbeat is detected. Has anyone encountered this? Any ideas on what may be causing it and how to address it?
Error during execution of task: ClientError([{'path': ['flow_run'], 'message': 'request to http://hasura:3000/v1alpha1/graphql failed, reason: read ECONNRESET', 'extensions': {'code': 'INTERNAL_SERVER_ERROR', 'exception': {'message': 'request to http://hasura:3000/v1alpha1/graphql failed, reason: read ECONNRESET', 'type': 'system', 'errno': 'ECONNRESET', 'code': 'ECONNRESET'}}}])
On a possibly related note, we're also encountering a similar issue when polling EMR termination status via boto3, i.e.
where the task also loses its heartbeat even though the EMR cluster runs successfully and terminates as expected.
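For context on the EMR side, polling for cluster termination with boto3 might look roughly like the sketch below. This is not the poster's code (their snippet is missing here). `wait_for_emr_termination` and its parameters are hypothetical, and the only assumption about boto3 is the shape of the `describe_cluster` response. The describe function is passed in so the loop can run without AWS credentials.

```python
import time
from typing import Any, Callable, Dict

# EMR cluster states after which the cluster will not change state again.
TERMINAL_STATES = {"TERMINATED", "TERMINATED_WITH_ERRORS"}


def wait_for_emr_termination(
    describe_cluster: Callable[..., Dict[str, Any]],
    cluster_id: str,
    delay_seconds: float = 30.0,
    max_attempts: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll an EMR cluster until it reaches a terminal state (hypothetical helper).

    `describe_cluster` is assumed to behave like boto3's
    `boto3.client("emr").describe_cluster`, returning a dict shaped like
    {"Cluster": {"Status": {"State": ...}}}.
    """
    for _ in range(max_attempts):
        response = describe_cluster(ClusterId=cluster_id)
        state = response["Cluster"]["Status"]["State"]
        if state in TERMINAL_STATES:
            # Return the final state so the caller can tell success from failure.
            return state
        # Wait a fixed interval before asking EMR again.
        sleep(delay_seconds)
    raise TimeoutError(
        f"EMR cluster {cluster_id} not terminated after {max_attempts} polls"
    )
```

Usage would be something like `wait_for_emr_termination(boto3.client("emr").describe_cluster, cluster_id)`. boto3 also ships a built-in `cluster_terminated` waiter for EMR (`client.get_waiter("cluster_terminated")`) that does similar polling.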

Anna Geller

05/23/2022, 9:13 AM