# prefect-community
g
Hello everyone, I am using a Dask cluster and I've run into this issue:
distributed.protocol.pickle - INFO - Failed to serialize <Success: "Task run succeeded.">. Exception: cannot pickle 'lxml.etree.XMLSchema' object
which leads to
distributed.worker - ERROR - failed during get data with <ip> -> <ip>
which at some point closes the connection
distributed.comm.core.CommClosedError: in <TCP (closed)  local=tcp://<ip> remote=tcp://<ip>>: Stream is closed
This is handled by Prefect with some retries (depending on the configuration), and finally the agent raises an error and the flow is marked as failed:
distributed.scheduler.KilledWorker: ('concatenate_df_based_on_time-b91c06dc30f54c5084e9f5fe8b6b32a5', <WorkerState 'tcp://<ip>', status: closed, memory: 0, processing: 1>)
Do you have an idea on how to prevent this kind of error?
a
do you have your XMLSchema object defined outside of the task decorator, or do you happen to return it from a task and pass it as a data dependency to another task? If so, Dask can't serialize it with cloudpickle
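For illustration, this is the kind of (hypothetical) pattern that trips Dask up, since the schema returned by one task has to be pickled before it can be handed to the next task on another worker (just a sketch, assuming the Prefect 1.x flow API and made-up file and task names):

from lxml import etree
from prefect import task, Flow

@task
def load_schema(schema_path: str) -> etree.XMLSchema:
    # The XMLSchema object itself becomes the task result,
    # so Dask must pickle it to move it between workers.
    return etree.XMLSchema(etree.parse(schema_path))

@task
def validate(doc_path: str, schema: etree.XMLSchema) -> bool:
    return schema.validate(etree.parse(doc_path))

with Flow("validate-xml") as flow:
    schema = load_schema("schema.xsd")   # hypothetical paths
    ok = validate("doc.xml", schema)     # schema passed as a data dependency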
g
From the error message I understood that it was the Prefect Success object that was generating the exception. Is that possible?
a
can you by any chance share the flow code? it would be easier to assess that way - you can share via DM for privacy and we can still continue the discussion here
k
I doubt it's the Prefect Success object. The SUCCESS is just an exception and it's serializable. There is something being returned that cloudpickle can't serialize. I think you can try doing cloudpickle.dumps(x) to test this
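Something like this (just a sketch, assuming the schema is built from a local .xsd file with a hypothetical path) reproduces the failure outside of Dask:

import cloudpickle
from lxml import etree

schema = etree.XMLSchema(etree.parse("schema.xsd"))  # hypothetical path
try:
    cloudpickle.dumps(schema)
except TypeError as exc:
    # e.g. "cannot pickle 'lxml.etree.XMLSchema' object"
    print(f"not serializable: {exc}")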
g
Ok, you were right: a task was returning an XMLSchema object, which is indeed not serializable via pickle. I merged the tasks that used this object, so the XMLSchema is now used internally by the tasks and everything is working. ty for your time and guidance :)
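For anyone hitting the same thing, the merged version looks roughly like this (sketch with made-up names): the schema is built and consumed inside a single task, so it is never returned and never has to be pickled by Dask:

from lxml import etree
from prefect import task

@task
def validate_doc(doc_path: str, schema_path: str) -> bool:
    # Build and use the XMLSchema entirely inside one task,
    # so only the plain bool result crosses worker boundaries.
    schema = etree.XMLSchema(etree.parse(schema_path))
    return schema.validate(etree.parse(doc_path))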
a
nice work and thanks for the update!