We are seeing quite a lot of intermittent API erro...
# prefect-community
f
We are seeing quite a lot of intermittent API errors and consequent flow run failures in Cloud v1. Anyone else?
a
thanks for reporting Florian, I forwarded to the team
f
gratitude thank you 1
a
are you sure this is the right ID? we don't see any errors with that ID
we cross-checked with the team and nothing stands out in particular, unfortunately. if the situation continues, feel free to add more details (logs, more flow run IDs) - atm things look stable. Sorry for not being more helpful here
🙏 1
👍 2
f
Here are some more IDs: bc3f2411-bca8-47af-8125-a90d12ed946b, bc3f2411-bca8-47af-8125-a90d12ed946b and the first one again, 432371db-b3eb-400e-9647-e7352c25663c. We got about 5 more overnight. They all contain the same error:
Failed to set task state with error: ClientError([{'path': ['set_task_run_states'], 'message': 'Operation timed out', 'extensions': {'code': 'API_ERROR'}}])
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/prefect/engine/cloud/task_runner.py", line 91, in call_runner_target_handlers
    state = self.client.set_task_run_state(
  File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 1922, in set_task_run_state
    result = self.graphql(
  File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 570, in graphql
    raise ClientError(result["errors"])
prefect.exceptions.ClientError: [{'path': ['set_task_run_states'], 'message': 'Operation timed out', 'extensions': {'code': 'API_ERROR'}}]
@Anna Geller to me it seems some of them got rescheduled, but only the tasks that did not run prior to the API error. That does not work if no intermediate results are saved.
a
we investigated and we see errors related to version lock:
VERSION LOCK: State update failed for flow run ID bc3f2411-bca8-47af-8125-a90d12ed946b: provided version 1 but current version is 2.
have you tried enabling Version Locking for this flow? https://docs-v1.prefect.io/orchestration/concepts/flows.html#toggle-version-locking
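For readers unfamiliar with the error above: the "provided version 1 but current version is 2" message is what a compare-and-set (optimistic concurrency) check typically produces. The following is a generic toy sketch of that idea, not Prefect's actual implementation; `RunRecord`, `set_state` and `VersionLockError` are hypothetical names invented for illustration.

```python
class VersionLockError(Exception):
    """Raised when a writer's view of the record is stale (hypothetical name)."""


class RunRecord:
    """Toy in-memory stand-in for a run whose state updates are version-checked."""

    def __init__(self, state: str = "Scheduled") -> None:
        self.state = state
        self.version = 1

    def set_state(self, new_state: str, provided_version: int) -> int:
        # Reject the update if someone else already advanced the version.
        if provided_version != self.version:
            raise VersionLockError(
                f"provided version {provided_version} "
                f"but current version is {self.version}"
            )
        self.state = new_state
        self.version += 1
        return self.version


record = RunRecord()
record.set_state("Running", provided_version=1)  # first writer succeeds
try:
    record.set_state("Running", provided_version=1)  # second writer is stale
except VersionLockError as exc:
    print(f"VERSION LOCK: State update failed: {exc}")
```

In this sketch the second writer is rejected rather than silently overwriting the first, which is the kind of guarantee a version lock trades latency and occasional hard-to-read failures for.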
f
No, we did not try it. We also did not see any new errors of this kind yesterday. Is there a way to enable version locking globally?
a
You could write a script that lists your flows and enables it for each flow programmatically via the GraphQL API, but there is no single endpoint to do that globally
We have some similar GraphQL query examples on Discourse IIRC
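A minimal sketch of such a script, under stated assumptions: it lists non-archived flows and calls the `enable_flow_version_lock` mutation (the one described on the docs page linked earlier in this thread) once per flow. The `flow` query with an `archived` filter, the response shapes, and the helper names `lock_mutation` and `enable_version_locking` are assumptions for illustration, so check them against your own API schema before relying on this.

```python
import json
from typing import Any, Callable, List

# Assumed query shape: list every flow that is not archived.
LIST_FLOWS_QUERY = """
query {
  flow(where: {archived: {_eq: false}}) {
    id
    name
  }
}
"""


def lock_mutation(flow_id: str) -> str:
    """Build the per-flow mutation; json.dumps adds the surrounding quotes."""
    return (
        "mutation { enable_flow_version_lock(input: {flow_id: %s}) { success } }"
        % json.dumps(flow_id)
    )


def enable_version_locking(graphql: Callable[[str], Any]) -> List[str]:
    """Enable version locking on every listed flow.

    `graphql` is any callable that takes a query string and returns a
    mapping with a "data" key (hypothetical interface).
    Returns the IDs of the flows that reported success.
    """
    flows = graphql(LIST_FLOWS_QUERY)["data"]["flow"]
    locked = []
    for flow in flows:
        response = graphql(lock_mutation(flow["id"]))
        if response["data"]["enable_flow_version_lock"]["success"]:
            locked.append(flow["id"])
    return locked
```

With Prefect 1.x installed and authenticated, this could presumably be run as `enable_version_locking(prefect.Client().graphql)`. Passing the callable in keeps the loop easy to dry-run against a stub before touching real flows.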
f
The reason I am asking is that this is the second time this feature has been recommended to me, each time for a totally different problem, and no downsides are ever mentioned. So I wonder why it is not turned on globally.
a
version locking enforces that your flow runs only once and that's not always desirable - we have users who want to run flows any time for any reason, so having that as a global default is, as with anything in engineering, a tradeoff
I got more concrete reasons from our experts:
1. Version locking is not free; it imposes a cost in performance/latency.
2. Version locking can have unexpected coordination effects, such as flow runs crashing or not recovering properly due to version lock errors, and those errors can be hard to understand and troubleshoot if it is enabled for everything. So it's better to enable it per flow when needed.
🙌 1
f
If it limits the flow to run only once per version, it is completely useless in both cases where it was suggested to us. Thanks for clearing that up.