# ask-marvin
a
@Marvin After upgrading to Prefect 3 I notice that some runs immediately end up crashed and automatically get restarted. This only happened after upgrading. What is going on?
m
Based on the research, I can explain what's happening with those automatic restarts in Prefect 3.x. This is actually an intentional feature change related to how Prefect handles "zombie" flow runs. Here's what's going on:

1. Prefect 3.x introduced a new heartbeat mechanism that actively monitors flow runs to detect when they become unresponsive (so-called "zombie" flows).
2. When a flow run stops sending heartbeat events (which can happen due to infrastructure issues, crashes, or network problems), Prefect can automatically:
   - Mark the run as "Crashed"
   - Potentially trigger an automatic restart

This is different from Prefect 2.x, which had fewer automated crash-detection and recovery mechanisms.

To address this, you have a few options:

1. Adjust the heartbeat frequency if needed:
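   ```python
   import os

   # Prefect reads its settings from PREFECT_* environment variables, so set
   # this in the environment of the process that runs the flow (e.g. the
   # worker) before prefect is imported. Value is in seconds.
   # CLI alternative: prefect config set PREFECT_RUNNER_HEARTBEAT_FREQUENCY=30
   os.environ["PREFECT_RUNNER_HEARTBEAT_FREQUENCY"] = "30"
   ```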
2. Investigate the root cause of the crashes:
   - Check your flow logs for any errors just before the crashes
   - Monitor system resources (especially important since you're using a db.t4g.small instance)
   - Look for any infrastructure-related issues that might be causing the flow to become unresponsive
3. Disable automatic restarts if they're not desired (though this isn't generally recommended, as it's a helpful reliability feature)

If you're seeing this behavior frequently, I'd recommend:

1. First checking your flow logs to understand why the flows are becoming unresponsive
2. Ensuring your db.t4g.small instance has sufficient resources for your workload
3. Verifying that your network connectivity is stable between your flow runners and the Prefect server

Would you like me to help you investigate any specific aspects of these crashes or explain more about configuring the heartbeat mechanism?
n
this... is very likely not correct
are your workers still 2.x?
a
Yes, workers are still 2.x. Probably needs some tuning! 🙂
n
yup you'll need to upgrade those workers to 3.x
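The actual cause here is a version mismatch: the server is on 3.x while the workers are still on 2.x. A quick way to confirm this is to check which Prefect release each worker environment has installed. The sketch below makes two assumptions. First, you can run Python inside the worker's environment (the same virtualenv or image the worker uses). Second, comparing only the major version is enough for this check. `major_version` and `worker_matches_server` are hypothetical helpers, not Prefect APIs.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the major component of a version string such as '2.20.9'."""
    return int(version_string.split(".")[0])


def worker_matches_server(worker_version: str, server_major: int = 3) -> bool:
    """Hypothetical helper: True if the worker's major version matches the server's."""
    return major_version(worker_version) == server_major


if __name__ == "__main__":
    # Run inside the worker's environment to see what it actually has installed.
    try:
        installed = version("prefect")
    except PackageNotFoundError:
        print("prefect is not installed in this environment")
    else:
        if worker_matches_server(installed):
            print(f"prefect {installed}: matches a 3.x server")
        else:
            print(f"prefect {installed}: upgrade this worker, e.g. pip install -U 'prefect>=3'")
```

Checking the installed package with `importlib.metadata` reports the version visible to that Python environment. That version is what the worker actually runs, which can differ from what was installed elsewhere.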