Marco Barbero Mota
05/18/2023, 7:07 PM
Marvin
05/18/2023, 7:09 PM
PickleSerializer, which serializes the output of the task. If you use a custom serializer, the content of the persisted result may vary.
In summary, the primary content of the persisted result is the output of the task, but the exact format and additional information depend on the serializer used.
Now that we've cracked the code, let's get back to orchestrating workflows, shall we? Humans and their fascination with pickles...
Nate
05/18/2023, 7:10 PM
Marco Barbero Mota
05/18/2023, 7:11 PM
pickle.dump so that the exact variable (a dictionary in this case) was also saved to a local file, as I normally do. The file that Prefect saves is larger, and when I try to pickle.load it, it can't be loaded. However, I can load the manually saved file.
Nate
05/18/2023, 7:14 PM
result_storage_key (docs) so you have a reference in the future, then calling .get() on the result should deserialize it for you
can you show your code and what's going wrong?
Marvin
05/18/2023, 7:15 PM
PickleSerializer) (GitHub issue #7624). The issue reports that the serialization objects from each object remain in memory, leading to a shift in the buildup.
Another issue is related to an error when aborting or crashing, which states "Got unsupported data document encoding of 'cloudpickle'. Expected 'result'." (GitHub issue #6845). This issue might be related to the problem you're experiencing when trying to load the Prefect-generated pickle file.
Unfortunately, I couldn't find a direct solution to your problem. However, you might want to consider using a custom serializer for your task results to avoid potential issues with the default PickleSerializer. This way, you can have more control over the serialization and deserialization process, ensuring compatibility with your manual loading process.
Remember, life is full of mysteries, and sometimes, so are pickle files.
Marco Barbero Mota
05/18/2023, 7:18 PM
import pickle
import numpy as np
from prefect import flow, task
from prefect.filesystems import LocalFileSystem

@task(cache_key_fn=cache_file_based_fn,
      persist_result=True,
      result_storage=LocalFileSystem(basepath='~/results'),
      result_storage_key='results_A.pkl',
      result_serializer="pickle",
      cache_result_in_memory=False)
def my_task():
    result = {
        'a': np.array([1, 1]),
        'b': np.array([2, 2])
    }
    with open('~/results/results_B.pkl', 'wb') as file:
        # Serialize and save the dictionary to the pickle file
        pickle.dump(result, file)
    return result

@flow
def pipeline():
    my_task()

pipeline()
pickle.load() to retrieve the result dictionary when I do so for ~/results/result_B.pkl; however, when doing so for ~/results/result_A there is an error.
Nate
05/18/2023, 7:23 PM
result_A?
Marco Barbero Mota
05/18/2023, 7:24 PM
import pickle
with open('~/results/result_A.pkl', "rb") as file:
    object = pickle.load(file)
UnpicklingError: invalid load key '{'
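The load key in this error is the first byte that the unpickler read, so '{' suggests the file does not start with a pickle header at all. A minimal sketch for checking that, using only the standard library; `sniff_format` is a hypothetical helper name and the sample bytes are made up for illustration:

```python
import pickle


def sniff_format(data: bytes) -> str:
    """Guess how a results file is encoded from its first byte."""
    if data[:1] == b"\x80":
        # Pickle protocol 2 and later start with the PROTO opcode (0x80).
        return "pickle"
    if data.lstrip()[:1] in (b"{", b"["):
        # Looks like JSON text rather than a raw pickle stream.
        return "json-like"
    return "unknown"


raw_pickle = pickle.dumps({"a": 1})
json_text = b'{"data": "..."}'  # made-up stand-in for a JSON file

print(sniff_format(raw_pickle))
print(sniff_format(json_text))

# Feeding JSON text to the unpickler reproduces the same kind of error.
try:
    pickle.loads(json_text)
except pickle.UnpicklingError as exc:
    print("UnpicklingError:", exc)
```

Running the same check on the first few bytes of the file Prefect wrote would show which case applies before trying to load it.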
Nate
05/18/2023, 7:33 PM
result_serializer has to be a Serializer type, not a string
since the string "pickle" is not a serializer, it's probably defaulting to the JSONSerializer since your result is JSON
result_serializer=PickleSerializer() instead, where
from prefect.serializers import PickleSerializer
Marco Barbero Mota
05/18/2023, 7:39 PM
PickleSerializer. It is the Literal string defined in that class.
Nate
05/18/2023, 7:52 PM
In [17]: @task(persist_result=True, result_storage_key="test.pkl")
    ...: def test_task():
    ...:     return {"a": 1}
    ...:
In [18]: @flow
    ...: def testing():
    ...:     test_task()
    ...:
In [19]: testing()
Out[19]: [Completed(message=None, type=COMPLETED, result=PersistedResult(type='reference', artifact_type='result', artifact_description='Result of type `dict` persisted to: `/Users/nate/.prefect/storage/test.pkl`', serializer_type='pickle', storage_block_id=UUID('1ea3ffa6-d603-44f5-af99-223b108f266a'), storage_key='test.pkl'))]
In [20]: !cat /Users/nate/.prefect/storage/test.pkl
{"serializer": {"type": "pickle", "picklelib": "cloudpickle", "picklelib_version": "2.2.1"}, "data": "gAWVCgAAAAAAAAB9lIwBYZRLAXMu\n", "prefect_version": "2.10.9"}
In [27]: import json
    ...:
    ...: with open("/Users/nate/.prefect/storage/test.pkl", 'r') as f:
    ...:     print(json.loads(f.read())["data"])
    ...:
gAWVCgAAAAAAAAB9lIwBYZRLAXMu
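The `cat` output above shows that the stored file is a JSON document whose "data" field holds base64 text. Assuming that payload is an ordinary pickle stream, as the serializer metadata suggests, it can be decoded with the standard library alone. This is a sketch, not Prefect's own loading path; the blob is copied from the output above, and payloads that reference custom classes or functions may additionally need those modules (or cloudpickle) importable.

```python
import base64
import json
import pickle

# File contents copied from the `cat` output in this thread.
blob_text = (
    '{"serializer": {"type": "pickle", "picklelib": "cloudpickle", '
    '"picklelib_version": "2.2.1"}, '
    '"data": "gAWVCgAAAAAAAAB9lIwBYZRLAXMu\\n", '
    '"prefect_version": "2.10.9"}'
)

# Step 1: the outer layer is JSON metadata plus a "data" field.
envelope = json.loads(blob_text)

# Step 2: "data" is base64 text; b64decode ignores the trailing newline.
payload = base64.b64decode(envelope["data"])

# Step 3: the decoded bytes are the pickled task result.
result = pickle.loads(payload)

print(envelope["serializer"]["type"])
print(result)
```

Only load pickles from sources you trust, since unpickling can execute arbitrary code.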
Marco Barbero Mota
05/18/2023, 7:57 PM
Nate
05/18/2023, 8:10 PM
Marco Barbero Mota
05/18/2023, 8:14 PM
Nate
05/18/2023, 8:30 PM
In [51]: import base64
    ...: import pickle
    ...: from pathlib import Path
    ...: from prefect.results import PersistedResultBlob
In [52]: pickle.loads(
    ...:     base64.b64decode(
    ...:         PersistedResultBlob.parse_raw(
    ...:             Path("~/.prefect/storage/test.pkl").expanduser().read_bytes()
    ...:         ).data
    ...:     )
    ...: )
Out[52]: {'a': 1}
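For comparison, a sketch of a loader that does not import Prefect and accepts either kind of file discussed in this thread: a raw pickle written with pickle.dump, or a JSON envelope like the test.pkl shown earlier. `load_result_file` is a hypothetical helper, and the envelope layout (a base64 "data" field) is assumed from that one example, so it may not hold for other Prefect versions or serializers.

```python
import base64
import json
import pickle
import tempfile
from pathlib import Path


def load_result_file(path):
    """Load a raw pickle file or a JSON envelope with a base64 pickle payload."""
    raw = Path(path).expanduser().read_bytes()
    try:
        envelope = json.loads(raw.decode("utf-8"))
    except ValueError:
        # Not UTF-8 JSON text, so treat the bytes as a plain pickle stream.
        return pickle.loads(raw)
    return pickle.loads(base64.b64decode(envelope["data"]))


# Demo with two temporary files standing in for the two files in the thread.
with tempfile.TemporaryDirectory() as tmp:
    plain = Path(tmp) / "plain.pkl"
    plain.write_bytes(pickle.dumps({"a": 1}))

    wrapped = Path(tmp) / "wrapped.pkl"
    payload = base64.b64encode(pickle.dumps({"a": 1})).decode("ascii")
    wrapped.write_text(json.dumps({"data": payload}))

    loaded_plain = load_result_file(plain)
    loaded_wrapped = load_result_file(wrapped)

print(loaded_plain, loaded_wrapped)
```

Trying JSON first and falling back to pickle keeps the helper short; a stricter version could inspect the first byte instead of relying on a decode failure.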
Marco Barbero Mota
05/18/2023, 8:31 PM
Nate
05/18/2023, 8:36 PM
enhancement) describing the complexity you ran into?
Marco Barbero Mota
05/18/2023, 8:36 PM
Nate
05/18/2023, 8:37 PM
Propose a feature enhancement and fill out the form!
it would be helpful to show what you tried, what didn't work for you, and then what you had to do in order to make it work, and explain why it could have been easier. If you have any suggestions on implementation, you can put them in the Describe the proposed behavior section
Marco Barbero Mota
05/18/2023, 8:44 PM
Nate
05/18/2023, 8:48 PM
Marco Barbero Mota
05/18/2023, 9:38 PM
Nate
05/18/2023, 9:38 PM
Marco Barbero Mota
05/18/2023, 9:40 PM
Nate
05/18/2023, 9:43 PM