# prefect-community
Hey Prefect, we’re encountering an intermittent error when running `wait_for_flow_run` on a long-running `DatabricksSubmitRun` in another flow, even when the Databricks job runs to completion without any issue. Occasionally this results in the error below, after which no heartbeat is detected for the task. Has anyone encountered this? Any ideas on what may be causing it and how to address it?
Error during execution of task: ClientError([{'path': ['flow_run'], 'message': 'request to <http://hasura:3000/v1alpha1/graphql> failed, reason: read ECONNRESET', 'extensions': {'code': 'INTERNAL_SERVER_ERROR', 'exception': {'message': 'request to <http://hasura:3000/v1alpha1/graphql> failed, reason: read ECONNRESET', 'type': 'system', 'errno': 'ECONNRESET', 'code': 'ECONNRESET'}}}])
On a possibly related note, we’re also seeing a similar issue when polling for EMR termination status via boto3, i.e. `emr_client.get_waiter("cluster_terminated").wait(ClusterId=cluster_id)`, where the task also loses its heartbeat even though the EMR cluster runs successfully and terminates as expected.
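For reference, here is a minimal sketch of the same EMR polling written as an explicit loop rather than the blocking waiter call. It logs every poll, which may help narrow down when the heartbeat is lost; whether it avoids the heartbeat problem is not established. The helper name `wait_for_state` and its parameters are hypothetical and exist only for illustration. The commented usage assumes boto3's EMR `describe_cluster` call and the `cluster_id` from the message above.

```python
import time
from typing import Callable, Iterable

# EMR cluster states that mean the cluster is gone.
TERMINAL_STATES = ("TERMINATED", "TERMINATED_WITH_ERRORS")


def wait_for_state(
    get_state: Callable[[], str],
    terminal_states: Iterable[str] = TERMINAL_STATES,
    delay_s: float = 30.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until it returns a terminal state or attempts run out.

    Hypothetical helper: each poll is printed so a stalled task can be
    located in the logs.
    """
    terminal = set(terminal_states)
    for attempt in range(1, max_attempts + 1):
        state = get_state()
        print(f"poll {attempt}/{max_attempts}: state={state}")
        if state in terminal:
            return state
        if attempt < max_attempts:
            sleep(delay_s)
    raise TimeoutError(f"no terminal state after {max_attempts} polls")


# Usage with boto3 (assumes AWS credentials and an existing cluster_id):
#
#   import boto3
#   emr_client = boto3.client("emr")
#   final_state = wait_for_state(
#       lambda: emr_client.describe_cluster(ClusterId=cluster_id)
#       ["Cluster"]["Status"]["State"]
#   )
```

The `sleep` argument is injectable only so the loop can be exercised without real delays.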