# prefect-community
m
Hi folks, we have run into this error message:
{'_schema': 'Invalid data type: None'}
twice in the last week across our many flow runs. It seems other folks have hit this because of a version mismatch between the agent and their execution environment. That is not the case for us, though. Additionally, the same flow runs successfully on later runs without any changes on our end. See more details in the thread.
We are running Prefect version 0.14.22 on both our Kubernetes agent and our Dask executor on EKS. Here is the full exception traceback from our logs:
Failed to retrieve task state with error: ValidationError({'_schema': 'Invalid data type: None'})
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/prefect/engine/cloud/task_runner.py", line 154, in initialize_run
    task_run_info = self.client.get_task_run_info(
  File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 1406, in get_task_run_info
    state = prefect.engine.state.State.deserialize(task_run_info.serialized_state)
  File "/usr/local/lib/python3.8/site-packages/prefect/engine/state.py", line 390, in deserialize
    state = StateSchema().load(json_blob)
  File "/usr/local/lib/python3.8/site-packages/marshmallow_oneofschema/one_of_schema.py", line 153, in load
    raise exc
marshmallow.exceptions.ValidationError: {'_schema': 'Invalid data type: None'}
This caused the task run to hang in a Pending state, which in turn caused the flow run to hang in a Running state for 11 hours.
The same flow ran successfully before the failure and will most likely run successfully after it (I will confirm in a bit), making this a rare occurrence: it happened only 2 times over the past week, and we had not seen it before last week.
k
Looking at this and don’t have any immediate ideas. Will dig around
m
(thank you)
z
This is due to null states in the backend. Sometimes they take time to propagate, and if you deserialize a task run payload with a null state, it will fail.
Are you on Server or Cloud?
m
On cloud
z
Thanks! I’ll investigate a fix.
https://github.com/PrefectHQ/prefect/pull/5718 is our general approach here, but I’m investigating a server-side fix as well.
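For illustration only, here is a minimal sketch of what a client-side guard against a null state could look like. It is not the content of the linked PR. The helper name `deserialize_state_or_none` and its `deserialize` argument are hypothetical; in real code the callable would presumably be something like `State.deserialize` from the traceback above.

```python
from typing import Any, Callable, Optional


def deserialize_state_or_none(
    serialized_state: Optional[dict],
    deserialize: Callable[[dict], Any],
) -> Optional[Any]:
    """Hypothetical helper: skip deserialization when the payload is null.

    `deserialize` stands in for whatever function turns the JSON blob
    into a state object; it is only called for non-null payloads.
    """
    if serialized_state is None:
        # The backend has not propagated a state yet; report "no state"
        # instead of handing None to the schema loader.
        return None
    return deserialize(serialized_state)
```

The caller then has to decide what a `None` result means, for example treating it as "not ready yet" and asking again later.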
m
I see - thanks for sharing the suggested fix
Hmm, in the other flow run where we encountered this issue, the failure happened inside `get_flow_run_info`:
File "/usr/local/lib/python3.8/site-packages/prefect/tasks/prefect/flow_run.py", line 209, in run
    flow_run_state = client.get_flow_run_info(flow_run_id).state
  File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 1168, in get_flow_run_info
    state=State.deserialize(result.serialized_state),
  File "/usr/local/lib/python3.8/site-packages/prefect/engine/state.py", line 390, in deserialize
    state = StateSchema().load(json_blob)
  File "/usr/local/lib/python3.8/site-packages/marshmallow_oneofschema/one_of_schema.py", line 153, in load
    raise exc
marshmallow.exceptions.ValidationError: {'_schema': 'Invalid data type: None'}
So an equivalent patch to `State.deserialize` in the `FlowRunInfoResult.__init__` call would be needed if we want to rely on a client-side fix.
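Until such a patch is available, one possible stopgap on our side would be to retry the call when deserialization fails, since the null state seems to resolve itself after a while. A rough sketch, assuming a generic retry helper; `retry_on_error` and its parameters are made up for illustration and are not part of Prefect:

```python
import time
from typing import Callable, Tuple, Type, TypeVar, Union

T = TypeVar("T")


def retry_on_error(
    fetch: Callable[[], T],
    retry_on: Union[Type[BaseException], Tuple[Type[BaseException], ...]],
    attempts: int = 3,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: call `fetch`, retrying with exponential backoff.

    Only exceptions matching `retry_on` trigger a retry; the last one is
    re-raised if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts - 1:
                # Out of attempts: surface the original error.
                raise
            # Wait 1x, 2x, 4x, ... the base delay before trying again.
            sleep(delay_seconds * (2 ** attempt))
    raise AssertionError("unreachable")
```

In our case `fetch` would be something like `lambda: client.get_flow_run_info(flow_run_id)` and `retry_on` would be `marshmallow.exceptions.ValidationError`, the exception shown in the traceback.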
z
Yeah that seems prudent
m
@Zanie sorry to disturb - I hope you are having a good Friday. Are there any updates on whether a fix was put in place on the server side? We haven't experienced any failures on our end since I last reported this 3 days ago.
z
No server-side fix. We are probably just going to address it client-side for now since we do not get many reports.
A fix was merged into `master` a couple of days ago for the flow/task run info methods.
m
I see - so we would have to move from 0.14.22 to 1.something?
z
Or cherry-pick that change, yeah
m
ok