# ask-community
r
I'm encountering a persistent issue with Dask serialization when updating my flows from Prefect 2.19.1 to Prefect 3.1.2. Every run incurs the following error, which I've never seen before in 18 months of using Prefect 2:
```
Task run failed with exception: TypeError('Could not serialize object of type HighLevelGraph', '<ToPickle: HighLevelGraph with 1 layers.\n<dask.highlevelgraph.HighLevelGraph object at 0x7f4e3b547400>\n 0. 139974151916736\n>') - Retries are exhausted
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/distributed/protocol/pickle.py", line 63, in dumps
    result = pickle.dumps(x, **dump_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/prefect/docker/__init__.py", line 20, in __getattr__
    raise ImportError(f"module {__name__!r} has no attribute {name!r}")
ImportError: module 'prefect.docker' has no attribute 'isnan'

<---Long series of stack traces--->

The above exception was the direct cause of the following exception:

<<---Long stack trace--->>

  File "/usr/local/lib/python3.10/dist-packages/distributed/protocol/serialize.py", line 392, in serialize
    raise TypeError(msg, str_x) from exc
TypeError: ('Could not serialize object of type HighLevelGraph', '<ToPickle: HighLevelGraph with 1 layers.\n<dask.highlevelgraph.HighLevelGraph object at 0x7f4e3b547400>\n 0. 139974151916736\n>')
```
All of my flows write to Zarr N-dimensional data stores using Xarray's `to_zarr` method, which leverages Dask under the hood to vastly speed up this operation. I don't use Dask task runners or any specialized Prefect Dask objects. Digging around, I found this conversation, which doesn't seem directly related: https://linen.prefect.io/t/23198687/hello-everyone-i-m-facing-an-issue-using-prefect-with-a-dask. Any ideas, Prefect team? I'll post a full stack trace in a follow-up.
Full ugly stack trace included as a file to keep this thread readable
n
Hi @Robert Banick, this is similar to a problem I saw recently.
r
I do use `task_run_name` on one of the flow's constituent tasks, although that's not the task that fails. It's non-essential, and I can take it out for troubleshooting.
n
I think in the above case, the user had `@flow` on an instance method, and the instance had an attribute of an unserializable type.
Re `task_run_name`: that's probably not the issue, then.
r
Yeah, I didn't think `task_run_name` was the issue either.
n
Yeah, my bad, I was thinking of this, but the discussion I linked is likely way more relevant.
r
It's a pretty dense discussion; let me review it.
I don't use a task runner like that.
n
Are you using a flow decorator on an instance method at all, or just pure functions? What I'm driving at is that, at some point, we're trying to serialize an unserializable thing.
However, this is really strange:
```
  File "/usr/local/lib/python3.10/dist-packages/prefect/docker/__init__.py", line 20, in __getattr__
    raise ImportError(f"module {__name__!r} has no attribute {name!r}")
ImportError: module 'prefect.docker' has no attribute 'isnan'
```
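Why an `ImportError` here is odd: as far as I understand the stdlib `pickle` module, when it probes modules for a name (for example, searching `sys.modules` for the module an object belongs to), it only tolerates `AttributeError`. A minimal sketch of that general mechanism follows, using hypothetical stand-in modules (`bad_lazy`, `good_lazy`) rather than Prefect itself: a PEP 562 module-level `__getattr__` that raises `ImportError` escapes `hasattr`, while one that raises `AttributeError` behaves the way callers expect. That pickle hit exactly this path in the run above is an assumption, not something confirmed in this thread.

```python
import types


def make_lazy_module(name: str, exc_type: type) -> types.ModuleType:
    """Build a throwaway module whose PEP 562 __getattr__ raises exc_type.

    The module names used below are hypothetical stand-ins, not Prefect's.
    """
    module = types.ModuleType(name)

    def __getattr__(attr: str):
        raise exc_type(f"module {name!r} has no attribute {attr!r}")

    # Missing-attribute lookups on a module fall back to __getattr__ in its __dict__.
    module.__getattr__ = __getattr__
    return module


def probe(module: types.ModuleType, attr: str):
    """Mimic a caller (like pickle's module search) that only expects AttributeError."""
    try:
        return hasattr(module, attr)  # hasattr swallows AttributeError only
    except ImportError as exc:
        return f"ImportError leaked: {exc}"


bad = make_lazy_module("bad_lazy", ImportError)
good = make_lazy_module("good_lazy", AttributeError)

print(probe(good, "isnan"))  # False
print(probe(bad, "isnan"))   # ImportError leaked: module 'bad_lazy' has no attribute 'isnan'
```

The design point is simply that a lazy-loading module `__getattr__` should raise `AttributeError` for unknown names, since that is the exception generic attribute probing is written to handle.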
r
Yep, I'm thinking about this too.
I use a flow decorator in one place and one place only: around a master function called `gridded_etl_template`.
The `prefect.docker` thing also has me confused. I'm running this flow inside a Docker container deployed onto an AWS Fargate box. Is there any way a version mismatch inside the Docker container could be causing obscure `prefect.docker` problems?
I'm using the following, in case it matters:
```
dask[array,diagnostics,distributed]==2024.8.0
xarray[complete]==2024.3.0
zarr==2.18.3
```
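One way to rule out a version mismatch inside the container is to ask the installed environment directly. A minimal stdlib sketch, assuming it is run with the same interpreter and image the flow uses; the pins are copied from the message above, and `find_version_mismatches` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the message above. Extras such as [array] do not change
# the distribution name, so only the base package name is checked.
PINNED = {
    "dask": "2024.8.0",
    "xarray": "2024.3.0",
    "zarr": "2.18.3",
}


def find_version_mismatches(pins: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {package: (expected, found)} for every pin that doesn't match."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            found = version(package)
        except PackageNotFoundError:
            found = "not installed"
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    # Pinned packages that differ from what is actually installed.
    for package, (expected, found) in find_version_mismatches(PINNED).items():
        print(f"{package}: pinned {expected}, found {found}")
    # Unpinned but relevant packages: just report what is installed.
    for package in ("prefect", "distributed", "cloudpickle"):
        try:
            print(f"{package}: {version(package)}")
        except PackageNotFoundError:
            print(f"{package}: not installed")
```

An empty first section means the pins match; the second section shows which Prefect and distributed versions actually ended up in the image.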
Regarding the use of `@flow`: `gridded_etl_template` executes the steps of our normal ETL pipeline on a specific class corresponding to a dataset. That class contains a host of instance methods.
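To narrow down whether some attribute on such an instance is the "unserializable thing", one option is to try pickling each instance attribute on its own. A minimal sketch using the stdlib `pickle` as a rough proxy; `find_unpicklable_attributes` and `HypotheticalDataset` are hypothetical names, not the real class from this thread. As far as I know, distributed can fall back to cloudpickle, which handles more (lambdas, for instance), so a hit here is a lead rather than proof. Only `vars(obj)` is checked because methods live on the class, not the instance.

```python
import pickle
import threading


def find_unpicklable_attributes(obj) -> dict[str, str]:
    """Try to pickle each instance attribute separately and report failures."""
    failures = {}
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # pickling can raise TypeError, AttributeError, ...
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


class HypotheticalDataset:
    """Stand-in for a dataset class; not the real class from this thread."""

    def __init__(self) -> None:
        self.name = "example"
        self.chunks = {"time": 24}
        self.lock = threading.Lock()  # thread locks cannot be pickled


# Reports only the "lock" attribute, with the TypeError pickle raised for it.
print(find_unpicklable_attributes(HypotheticalDataset()))
```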
n
That should be fine. One thing I'm thinking about, regarding "which I've never seen before in 18 months of using Prefect 2":
r
Clearly, any number of objects in the instance could be unserializable, but we haven't had issues of this sort in a long time.
n
Do you have a list of futures that you are not resolving or returning from a flow? This was a big change from Prefect 2 to 3: we no longer automatically resolve futures if you don't return them or pass them to another task.
r
No, we don't (knowingly) use Dask futures anywhere in our codebase. It's theoretically possible they're baked into Xarray somewhere, but it would be quite a slog to dig for that.
Sorry, I read "futures" and thought Dask. We don't use any `async` code within our ETLs, so the answer for async futures is also no.