Prefect Community #ask-community

Raymond Yu

05/23/2022, 5:14 AM

Hey Prefect, we’re encountering a somewhat stochastic error when running a wait_for_flow_run for a long-running DatabricksSubmitRun in another flow, even when the Databricks job runs to completion without an issue. We noticed this occasionally results in the error enclosed below, after which no heartbeat is detected. Has anyone encountered this? Any ideas on what may be causing it and how to address it?

Error during execution of task: ClientError([{'path': ['flow_run'], 'message': 'request to <http://hasura:3000/v1alpha1/graphql> failed, reason: read ECONNRESET', 'extensions': {'code': 'INTERNAL_SERVER_ERROR', 'exception': {'message': 'request to <http://hasura:3000/v1alpha1/graphql> failed, reason: read ECONNRESET', 'type': 'system', 'errno': 'ECONNRESET', 'code': 'ECONNRESET'}}}])

Raymond Yu

05/23/2022, 5:17 AM

On a possibly related note, we’re also encountering a similar issue when polling for EMR termination status via boto3, i.e.:

emr_client.get_waiter("cluster_terminated").wait(ClusterId=cluster_id)

There, the task also loses its heartbeat even though the EMR cluster runs successfully and terminates as expected.
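
Both cases above involve polling a remote status for a long time. As a rough sketch only, and not a confirmed fix for the heartbeat loss, a polling loop can tolerate a few transient connection resets instead of failing on the first one. The names `poll_until` and `check_status` are hypothetical and are not part of Prefect or boto3. The exception types treated as transient are an assumption, and the caller would pass whatever their client actually raises.

```python
import time
from typing import Callable, Tuple, Type


def poll_until(
    check_status: Callable[[], bool],
    interval: float = 30.0,
    max_attempts: int = 120,
    max_consecutive_errors: int = 5,
    transient: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check_status() until it returns True or attempts run out.

    check_status is a hypothetical callable supplied by the caller; it
    should return True once the remote job or cluster has finished.
    Exceptions listed in `transient` are tolerated unless more than
    `max_consecutive_errors` of them happen in a row.
    """
    errors_in_a_row = 0
    for _ in range(max_attempts):
        try:
            if check_status():
                return True
            # A successful call resets the transient-error counter.
            errors_in_a_row = 0
        except transient:
            errors_in_a_row += 1
            if errors_in_a_row > max_consecutive_errors:
                # Too many failures in a row: surface the last error.
                raise
        sleep(interval)
    # Ran out of attempts without seeing a finished status.
    return False
```

The `sleep` parameter is injectable only so the loop can be exercised without real waiting. ConnectionResetError is a subclass of ConnectionError, so the default covers a plain socket reset, but errors wrapped by a client library would need to be listed explicitly in `transient`.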

Anna Geller

05/23/2022, 9:13 AM

You could disable the heartbeat for the subflow. This topic dives deeper: https://discourse.prefect.io/t/flow-is-failing-with-an-error-message-no-heartbeat-detected-from-the-remote-task/79
