Guillaume Latour
05/19/2022, 8:20 AM
distributed.protocol.pickle - INFO - Failed to serialize <Success: "Task run succeeded.">. Exception: cannot pickle 'lxml.etree.XMLSchema' object
which leads to
distributed.worker - ERROR - failed during get data with <ip> -> <ip>
which at some point closes the connection
distributed.comm.core.CommClosedError: in <TCP (closed) local=tcp://<ip> remote=tcp://<ip>>: Stream is closed
This is handled by Prefect with some retries (depending on configuration), and finally the agent raises an error and the flow is marked as failing:
distributed.scheduler.KilledWorker: ('concatenate_df_based_on_time-b91c06dc30f54c5084e9f5fe8b6b32a5', <WorkerState 'tcp://<ip>', status: closed, memory: 0, processing: 1>)
Do you have any idea how to prevent this kind of error?
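(For context, a minimal sketch of the kind of flow that can produce this error chain, assuming Prefect 1.x with a DaskExecutor; the task names and file paths are illustrative, not taken from the thread:)

```python
from prefect import task, Flow
from prefect.executors import DaskExecutor
from lxml import etree

@task
def load_schema():
    # The parsed schema becomes the task's result, so Dask has to
    # pickle it to ship it between workers -- lxml.etree.XMLSchema
    # objects cannot be pickled, which triggers the errors above.
    return etree.XMLSchema(etree.parse("schema.xsd"))

@task
def validate(schema):
    return schema.validate(etree.parse("data.xml"))

with Flow("xml-validation") as flow:
    validate(load_schema())

flow.run(executor=DaskExecutor())
```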
Anna Geller
05/19/2022, 9:52 AM
Is this XMLSchema object defined outside of the task decorator, or do you happen to return it from a task and pass it as a data dependency to another task? If so, Dask can't serialize it with cloudpickle.
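(One way to avoid this, in line with Anna's suggestion: pass only picklable data, such as the schema's file path, between tasks, and construct the XMLSchema inside the task that uses it. Again a sketch with illustrative names and paths:)

```python
from prefect import task, Flow
from lxml import etree

@task
def validate(xml_path, schema_path):
    # Construct the schema locally on the worker so the
    # unpicklable object never crosses a task boundary.
    schema = etree.XMLSchema(etree.parse(schema_path))
    return schema.validate(etree.parse(xml_path))

with Flow("xml-validation") as flow:
    validate("data.xml", "schema.xsd")
```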
Guillaume Latour
05/19/2022, 11:40 AM

Anna Geller
05/19/2022, 11:42 AM

Kevin Kho
05/19/2022, 1:33 PM
There are some objects cloudpickle can't serialize. I think you can try doing cloudpickle.dumps(x) to test this.
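(A quick local check along the lines Kevin suggests, with an illustrative schema file:)

```python
import cloudpickle
from lxml import etree

schema = etree.XMLSchema(etree.parse("schema.xsd"))

try:
    cloudpickle.dumps(schema)
    print("serializable")
except TypeError as exc:
    # e.g. TypeError: cannot pickle 'lxml.etree.XMLSchema' object
    print(f"not serializable: {exc}")
```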
Guillaume Latour
05/24/2022, 6:48 AM
Anna Geller
05/24/2022, 12:39 PM