Has anyone run into any problems reading a result ...
# ask-community
a
Has anyone run into any problems reading a result written with a serializer other than the default pickle serializer? I set the result for a task to use GCSResult with the JSONSerializer. When I try to read the result from task runs obtained from `Client.get_flow_run_info(...)`, the read function is using the default pickle serializer. I used Prefect 0.15.1 both in the flow environment and in the environment running the read code. Am I missing something? Thank you in advance!
```python
import prefect
from prefect import task
from prefect.engine.results import GCSResult
from prefect.engine.serializers import JSONSerializer

# Store this task's return value in GCS as JSON rather than a pickle.
result = GCSResult(
    bucket="prefect-cloud-results",
    location="{flow_name}/{flow_run_id}/provenance.json",
    serializer=JSONSerializer(),
)

@task(result=result)
def set_provenance_data(flow_run_id: str, prefect_cloud_client: prefect.client.client.Client = None):
    ...
```
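The read side looks roughly like this (a sketch; the flow run ID is a placeholder, and it assumes the task run state exposes the stored result location via `state._result.location`):

```python
from prefect.client import Client

client = Client()
info = client.get_flow_run_info("<flow-run-id>")  # placeholder flow run ID

for task_run in info.task_runs:
    state = task_run.state
    # The state fetched from Cloud carries the result location, but not the
    # serializer, so reading here falls back to the default pickle serializer
    # and chokes on the JSON bytes the task actually wrote.
    if state._result is not None and state._result.location:
        value = state._result.read(state._result.location).value
```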
k
Hey @Ayla Khan, I have seen this happen specifically when the serialization and deserialization happened with different Python versions. Both the Python version and (to a lesser extent) the Prefect version need to match on both ends. Are you using a container that has a different Python version (3.6, 3.7, 3.8)?
a
Yes, they're different. Thank you - I'll try a container with a different Python version and see if that fixes it!
I tried running deserialization in a container started from the same Prefect Docker image (`prefecthq/prefect:0.15.1-python3.6`) that I used to run the flow that serialized the result. Still seeing the error.
k
Could you give me your `cloudpickle` version? I’ll make an environment and test this.
a
Thank you!
```
root@893f541d47cb:/# pip list | grep cloudpickle
cloudpickle                 1.6.0
```
k
Will try to replicate sometime today. Feel free to ping me if I don’t get back to you by tomorrow. I just have a bunch of stuff today.
a
ok, sounds good
k
I know what you are saying now. I can replicate that error when I serialize with JSON and read with the `PickleSerializer`. Did you attach the Result to the task or use it inside the task? Did you attach it to the Flow?
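For reference, the mismatch can be reproduced outside a flow along these lines (a sketch; the bucket and location are placeholders, and the exact exception depends on the bytes being read):

```python
from prefect.engine.results import GCSResult
from prefect.engine.serializers import JSONSerializer

# Write a value to GCS as JSON.
json_result = GCSResult(
    bucket="prefect-cloud-results",  # placeholder bucket
    location="repro/value.json",
    serializer=JSONSerializer(),
)
written = json_result.write({"key": "value"})

# Read the same location back with the default PickleSerializer:
# the JSON bytes are not a pickle stream, so deserialization fails.
pickle_result = GCSResult(bucket="prefect-cloud-results")
pickle_result.read(written.location)  # raises during unpickling
```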
a
I want to return the result from the task and use it (attach I guess?) inside a Flow
k
I mean did you do `@task(result = xxx)` or `Flow(result = xxx)` or

```python
@task
def abc():
    result.write(xxx)
```

Wondering how the result was defined.
a
Ah, ok! I'm doing `@task(result = xxx)`
k
So I talked to the team. The serializer is not included in the result by design; it is loaded when the Task runs, so it won’t work for the script that you posted originally. What are you trying to do?
a
I have some data about flow runs that, for ergonomic reasons, is better to serialize as JSON instead of pickling, and save that JSON file to a GCS bucket. It's also convenient to have this data available to query through `Client.get_flow_run_info` and the task run data that gets returned, which is what I was testing in the script snippet I posted. I have a workaround that calls raw Google Cloud Storage code directly, but it would be nice to get that data by reading from the result. Is there any way to specify which serializer a result should use when reading a value from a previously run task state's result? Would I set a Result on a task or flow that's instantiated with the serializer the result should use, and then call the result read function?
k
Yeah, so I was thinking maybe you should just instantiate the Result manually like `res = GCSResult(bucket="xxx", serializer=JSONSerializer())`, and then you can do `res.read(location)`
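Put together, that might look like the following sketch (the flow run ID and bucket are placeholders, and it assumes the stored location is available on the task run state as `state._result.location`):

```python
from prefect.client import Client
from prefect.engine.results import GCSResult
from prefect.engine.serializers import JSONSerializer

# Rebuild the Result by hand with the serializer the task used to write,
# since the serializer itself is not stored with the task run state.
res = GCSResult(bucket="prefect-cloud-results", serializer=JSONSerializer())

client = Client()
info = client.get_flow_run_info("<flow-run-id>")  # placeholder flow run ID

for task_run in info.task_runs:
    stored = task_run.state._result
    if stored is not None and stored.location:
        # Reading through the manually built Result applies JSON deserialization.
        provenance = res.read(stored.location).value
```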
a
Ok, I'll give that a try later today. Thank you!
Yes, that'll work and it's cleaner than using raw GCS code in a task. Thank you again for your help!
k
Thanks for your understanding!