Saurabh Indoria

    9 months ago
    One of the flow runs encountered this error. Usually the runs succeed and this was an anomaly. Can someone share some details about this error? How can we prevent this from happening and why didn't it retry?
    Pod prefect-job-94453cb1-2sw9q failed.
    	Container 'flow' state: terminated
    		Exit Code:: 139
    		Reason: Error
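A note on the exit code in the log above: on Linux, a container exit code above 128 conventionally means the process was terminated by a signal, with the signal number equal to the code minus 128. The sketch below decodes such codes using only the standard library; `describe_exit_code` is a hypothetical helper name, not part of Prefect or Kubernetes.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Map a container exit code to a readable description (hypothetical helper)."""
    if code == 0:
        return "success"
    if code > 128:
        # Convention: exit code = 128 + signal number
        signum = code - 128
        try:
            return f"killed by {signal.Signals(signum).name} (signal {signum})"
        except ValueError:
            return f"killed by unknown signal {signum}"
    return f"application error (exit status {code})"


print(describe_exit_code(139))  # on Linux: killed by SIGSEGV (signal 11)
```

Under this convention, 139 corresponds to SIGSEGV (a segmentation fault), while a pod killed for exceeding its memory limit usually reports 137 (SIGKILL) with reason `OOMKilled`. So a crash in native code may be worth checking alongside memory limits.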
    Anna Geller

    9 months ago
    @Saurabh Indoria looks like your pod crashed for some reason and the flow’s heartbeat got lost. It could be that your pod ran out of memory. To prevent this from happening and mitigate the issue, you can:
    1. Track and allocate enough CPU and memory resources to your pods.
    2. Set up extra logging on your Kubernetes cluster so you can track why this pod crashed.
    3. Set the flow’s heartbeat mode to thread if you have long-running flows:
    from prefect.run_configs import KubernetesRun
    flow.run_config = KubernetesRun(env={"PREFECT__CLOUD__HEARTBEAT_MODE": "thread"})
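The first suggestion (allocating enough CPU and memory) can be combined with the heartbeat setting in the same run config. This is a minimal sketch assuming Prefect 1.x, where `KubernetesRun` accepts `cpu_request`, `memory_request`, and `memory_limit`; the resource values and the names `RUN_CONFIG_SETTINGS` and `build_run_config` are placeholders for illustration and should be sized from your own pod metrics.

```python
# Hypothetical resource values; size them from your own pod metrics.
RUN_CONFIG_SETTINGS = {
    "cpu_request": "500m",
    "memory_request": "1Gi",
    "memory_limit": "2Gi",
    "env": {"PREFECT__CLOUD__HEARTBEAT_MODE": "thread"},
}


def build_run_config():
    """Build a KubernetesRun from the settings above (requires Prefect 1.x)."""
    # Imported lazily so the settings can be inspected without Prefect installed.
    from prefect.run_configs import KubernetesRun

    return KubernetesRun(**RUN_CONFIG_SETTINGS)


# Usage, given an existing `flow` object:
# flow.run_config = build_run_config()
```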
    Saurabh Indoria

    9 months ago
    Thank you for the quick response @Anna Geller 🙌