Based on the additional information you've provided, I can offer a more specific diagnosis:
1. The 404 error suggests that when the automation tries to suspend or cancel the flow run after the 2-minute period, the API cannot find the endpoint or resource the action targets. Possible causes:
- The flow run's infrastructure might be terminating or becoming unavailable before the automation can execute
- There might be a race condition between the state check and the action execution
- The flow run might be transitioning to a different state during the automation attempt
2. To troubleshoot this, I recommend:
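a. First, give the automation a small buffer so the action does not fire at the exact moment the run changes state:
- Rather than acting the instant the run has been "in state for 2 minutes"
- Add a delay of 5-10 seconds before the action executes; if your automation has no separate delay setting, extend the "in state" window by the same amount
- This gives the system time to settle so the flow run and its resources are available when the action runs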
b. Check your server logs for any additional error messages around the time of the automation trigger, particularly looking for:
- Infrastructure-related messages
- State transition messages
- API errors, including the request path and flow run ID of the call that returned the 404
c. Verify that your flow runs are using appropriate infrastructure that supports cancellation/suspension
3. Could you try the following:
- Add the 5-10 second buffer described in 2a before the action runs
- Check whether the issue persists with a longer "in state" duration (e.g., 3 minutes instead of 2)
- Verify whether this happens with all flow runs or only specific ones
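To test the race-condition theory from point 1, you can line up the flow run's state transitions against the moment the automation fired. The sketch below is plain Python, not a Prefect API: `state_at` and `fired_during_transition` are hypothetical helpers, and the timestamps are made-up sample data that you would replace with values copied from your server logs or the flow run's page in the UI.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def state_at(transitions, moment):
    """Return the state name in effect at `moment`, or None if the run had no state yet."""
    ordered = sorted(transitions)  # sort by timestamp
    times = [t for t, _ in ordered]
    idx = bisect_right(times, moment)
    return ordered[idx - 1][1] if idx else None


def fired_during_transition(transitions, fired_at, window=timedelta(seconds=10)):
    """Return True if any state change happened within `window` of the automation firing."""
    return any(abs(t - fired_at) <= window for t, _ in transitions)


# Hypothetical sample data: (timestamp, state name) pairs for one flow run.
transitions = [
    (datetime(2024, 1, 1, 12, 0, 0), "Pending"),
    (datetime(2024, 1, 1, 12, 0, 5), "Running"),
    (datetime(2024, 1, 1, 12, 2, 4), "Crashed"),
]
# Hypothetical time at which the automation's action ran.
fired_at = datetime(2024, 1, 1, 12, 2, 5)

print(state_at(transitions, fired_at))                 # Crashed
print(fired_during_transition(transitions, fired_at))  # True
```

If the second check comes back `True` for the failing runs, the run was changing state right as the action executed, which points toward a timing problem rather than an infrastructure one.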
Could you also confirm:
1. Does the 404 occur every time the automation fires, or only intermittently?
2. What infrastructure are you using for your flow runs (e.g., Process, Docker, or Kubernetes)?
3. Can you manually suspend/cancel these flow runs through the UI when this happens?
This information would help narrow down whether this is an infrastructure-specific issue or a more general automation timing problem.
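If it helps with the manual check, here is a minimal sketch for probing the flow run directly over the REST API at the moment the automation fails. It assumes a self-hosted Prefect server reachable without authentication at a URL like `http://127.0.0.1:4200/api`, and that the flow run response includes a `state_name` field; `flow_run_url` and `probe_flow_run` are hypothetical helper names, and the API URL and flow run ID are values you supply.

```python
import json
import sys
import urllib.error
import urllib.request


def flow_run_url(api_url: str, flow_run_id: str) -> str:
    """Build the URL for reading a single flow run."""
    return f"{api_url.rstrip('/')}/flow_runs/{flow_run_id}"


def probe_flow_run(api_url: str, flow_run_id: str, timeout: float = 10.0):
    """Return (HTTP status, state name); state name is None when the request fails."""
    request = urllib.request.Request(
        flow_run_url(api_url, flow_run_id),
        headers={"Accept": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = json.load(response)
            # Assumption: the response body carries the current state in `state_name`.
            return response.status, body.get("state_name")
    except urllib.error.HTTPError as err:
        # A 404 here would mean the server cannot find the flow run itself.
        return err.code, None


if __name__ == "__main__":
    if len(sys.argv) == 3:
        status, state = probe_flow_run(sys.argv[1], sys.argv[2])
        print(f"HTTP {status}, state: {state}")
    else:
        print("usage: python probe_flow_run.py <api_url> <flow_run_id>")
```

A 200 with a state name would suggest the flow run exists and the 404 comes from the action itself; a 404 from this probe would suggest the flow run record is missing or the ID the automation uses is wrong.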