Pascal Jahan Elahi (Pawsey)
12/06/2024, 8:07 AM
16:01:23.576 | ERROR | distributed.protocol.pickle - Failed to serialize <ToPickle: HighLevelGraph with 1 layers.
<dask.highlevelgraph.HighLevelGraph object at 0xfffee1e977d0>
0. 281470175614400
>.
Traceback (most recent call last):
File "/software/projects/pawsey0001/pelahi/py-venv-vqpu/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 60, in dumps
result = pickle.dumps(x, **dump_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_pickle.PicklingError: Can't pickle <function task1 at 0xffff0d8616c0>: it's not the same object as __main__.task1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/software/projects/pawsey0001/pelahi/py-venv-vqpu/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 65, in dumps
pickler.dump(x)
_pickle.PicklingError: Can't pickle <function task1 at 0xffff0d8616c0>: it's not the same object as __main__.task1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/software/projects/pawsey0001/pelahi/py-venv-vqpu/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 77, in dumps
result = cloudpickle.dumps(x, **dump_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/software/projects/pawsey0001/pelahi/py-venv-vqpu/lib/python3.11/site-packages/cloudpickle/cloudpickle.py", line 1529, in dumps
cp.dump(obj)
File "/software/projects/pawsey0001/pelahi/py-venv-vqpu/lib/python3.11/site-packages/cloudpickle/cloudpickle.py", line 1295, in dump
return super().dump(obj)
^^^^^^^^^^^^^^^^^
TypeError: cannot pickle 'async_generator' object
16:01:23.580 | ERROR | Flow run 'cautious-heron' - Crash detected! Execution was interrupted by an unexpected exception: TypeError: ('Could not serialize object of type HighLevelGraph', '<ToPickle: HighLevelGraph with 1 layers.\n<dask.highlevelgraph.HighLevelGraph object at 0xfffee1e977d0>\n 0. 281470175614400\n>')
What should I make of this error?
Pascal Jahan Elahi (Pawsey)
12/06/2024, 8:14 AM
16:01:23.576 | ERROR | distributed.protocol.pickle - Failed to serialize <ToPickle: HighLevelGraph with 1 layers.
<dask.highlevelgraph.HighLevelGraph object at 0xfffee1e977d0>
0. 281470175614400
Yet the task is a simple one:
@task()
def task1(event: asyncio.Event):
    logger = get_run_logger()
    logger.info("WOOT start")
    logger.info("WOOT finished")
and the flow is as well:
@flow
async def subflow1(event: asyncio.Event, before: bool = True):
    logger = get_run_logger()
    logger.info("Running subflow 1")
    future = task1.submit(event).result()
    event.set()
    logger.info("Subflow 1 completed")
and the flow is called with:
asyncio.create_task(subflow1.with_options(task_runner=task_runners['generic'])(event))
Marvin
12/06/2024, 8:18 AM
It looks like you are running into a pickle serialization error when using a SLURMCluster with the DaskTaskRunner. The error message indicates that the task's HighLevelGraph could not be serialized, which often happens when the arguments being passed are not easily serializable by Python's default pickle library.
From the Prefect documentation, we know that using Dask often involves considerations about how objects are serialized. Prefect provides custom serializers, but not all objects can be pickled easily, particularly complex structures or async objects like the asyncio.Event you are passing into your task.
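As an aside, the first part of your traceback ("it's not the same object as __main__.task1") is pickle's by-reference check on functions: once a name has been rebound, which is effectively what a wrapping decorator like @task does, the original function object no longer matches what lookup of its own name returns. A minimal standalone reproduction, illustrative only and not your code:

import pickle

def task1():
    pass

original = task1   # keep a handle to the first definition

def task1():       # rebind the name, as a wrapping decorator effectively does
    pass

try:
    pickle.dumps(original)
except pickle.PicklingError as exc:
    # Can't pickle <function task1 at ...>: it's not the same object as __main__.task1
    print(exc)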
Here are a few strategies you might consider:
1. Check object compatibility: Make sure that all the arguments passed to your tasks and flows are serializable with pickle. asyncio.Event might not serialize well; consider using simple data types or custom objects that implement pickle support.
2. Custom serializer: Prefect allows custom pickle serializers which may handle certain object types better; cloudpickle, for example, often supports more object types than the default pickle.
3. Flow definition: Restructure your flow so that complex objects like asyncio.Event never need to be serialized across Dask workers; a sketch follows this list.
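To make the third strategy concrete, here is a minimal sketch based on your own snippets (the prefect imports are assumed) that keeps the asyncio.Event in the flow process and sends only picklable values to the Dask workers:

import asyncio
from prefect import flow, task, get_run_logger

@task
def task1() -> str:
    # Runs on a Dask worker: accepts and returns only picklable values.
    logger = get_run_logger()
    logger.info("WOOT start")
    logger.info("WOOT finished")
    return "done"

@flow
async def subflow1(event: asyncio.Event):
    logger = get_run_logger()
    logger.info("Running subflow 1")
    # Submit without the event; only the picklable result crosses the wire.
    result = task1.submit().result()
    # Signal completion locally, in the flow's own event loop.
    event.set()
    logger.info("Subflow 1 completed")

If the workers themselves need to coordinate, Dask's own distributed.Event is designed to be shared across a cluster, unlike asyncio.Event.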
You may find additional useful information in Prefect's Dask integration documentation for 3.x and its general serialization documentation.
If this doesn't solve your issue: unfortunately, the GitHub search encountered an authorization error, so I wasn't able to check active issues. You might want to look through GitHub issues for similar serialization problems or pose the question to the community.
Handling complex async objects isn't always straightforward, but you're not alone in this cloud! Who knew that being a high-level graph could be so distressing?