
Andreas Tsangarides

01/25/2022, 10:55 AM
Hi all, I'm trying to schedule ML predictions using Prefect. This involves loading a pre-trained, pickled ML model, refitting it, and saving the new model daily. The very first model can be saved using either a pickle or joblib backend. I know how to manually write the code for reading/writing the model locally or to S3, but has anyone done this using `LocalResult` and `S3Result`?
MLPIckleSerialzer.py
File "/Users/tsangis/Projects/uk-prefect-flows/.venv/lib/python3.8/site-packages/prefect/engine/task_runner.py", line 876, in get_task_run_state
    value = prefect.utilities.executors.run_task_with_timeout(
  File "/Users/tsangis/Projects/uk-prefect-flows/.venv/lib/python3.8/site-packages/prefect/utilities/executors.py", line 454, in run_task_with_timeout
    return task.run(*args, **kwargs)  # type: ignore
  File "/Users/tsangis/Projects/uk-prefect-flows/src/flows/price_predictions/day_ahead_hourly/tasks.py", line 80, in update_model
    model = latest_ml_model.read(latest_ml_model_loc).value
  File "/Users/tsangis/Projects/uk-prefect-flows/.venv/lib/python3.8/site-packages/prefect/engine/results/local_result.py", line 86, in read
    new.value = self.serializer.deserialize(value)
  File "/Users/tsangis/Projects/uk-prefect-flows/src/common/results/serializers.py", line 83, in deserialize
    return pickle.load(value)
TypeError: file must have 'read' and 'readline' attributes

Anna Geller

01/25/2022, 10:59 AM
Good question, have you heard about targets in Prefect? This seems like a great fit for your use case. You could template the target based on the date, task name, etc., and simply returning your model would pickle it into the specified location. Here are more details:
• https://docs.prefect.io/core/idioms/targets.html
• https://docs.prefect.io/core/concepts/persistence.html#output-caching-based-on-a-file-target

Andreas Tsangarides

01/25/2022, 11:08 AM
Hey Anna, hope you are well! Yeah, I'm already using those, and that's why I want to make use of the result objects. It's the serializer I'm having trouble with. I think it would work if the local result didn't call the read method:
# local_result.py
with open(os.path.join(self.dir, location), "rb") as f:
    value = f.read()

Anna Geller

01/25/2022, 11:15 AM
Can you use the default PickleSerializer? It uses cloudpickle which is better than pickle because it can serialize a function or class by value, whereas pickle can only serialize it by reference.
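A minimal sketch of the "by reference" limitation, using only the standard library (cloudpickle is a third-party package, so it is not imported here). The helper name `try_pickle_lambda` is made up for illustration. It shows that standard pickle stores a function as a module name plus an attribute name, and that it cannot handle a function with no importable name, such as a lambda:

```python
import math
import pickle

# Standard pickle stores a function as a reference: module name plus attribute name.
by_reference = pickle.dumps(math.sqrt)
stores_only_names = b"math" in by_reference and b"sqrt" in by_reference


def try_pickle_lambda() -> bool:
    """Hypothetical helper: report whether standard pickle can serialize a lambda."""
    try:
        # A lambda has no importable name, so there is nothing to reference.
        pickle.dumps(lambda x: x + 1)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


lambda_pickled = try_pickle_lambda()
```

Serializing by value, as cloudpickle does, sidesteps this by embedding the function's code in the payload instead of a name to look up.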

Andreas Tsangarides

01/25/2022, 12:03 PM
In case someone else faces this: it was the read method. You just need to convert the bytes back to a buffer.
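import io
import pickle
from typing import Any
from prefect.engine.serializers import Serializer

class MLPickleSerializer(Serializer):
    """Custom serializer for saving/retrieving ML models."""

    def deserialize(self, value: bytes) -> Any:
        # The result's read() hands over raw bytes, but pickle.load needs a file-like object
        return pickle.load(io.BytesIO(value))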
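For anyone wanting to try the fix without Prefect installed, here is a minimal standalone sketch. `BytesPickleSerializer` and the `model` dict are hypothetical stand-ins (a real serializer would subclass Prefect's `Serializer`, and the value would be a fitted model). It reproduces the `TypeError` from the traceback and shows two equivalent ways around it: `pickle.loads` on the bytes, or `pickle.load` on an `io.BytesIO` wrapper.

```python
import io
import pickle
from typing import Any


class BytesPickleSerializer:
    """Hypothetical stand-in for a Prefect 1.x Serializer subclass."""

    def serialize(self, value: Any) -> bytes:
        # Turn a Python object into bytes
        return pickle.dumps(value)

    def deserialize(self, value: bytes) -> Any:
        # pickle.loads accepts bytes directly, so no buffer is needed
        return pickle.loads(value)


model = {"coef": [0.5, 1.5], "intercept": 2.0}  # placeholder for a fitted model
serializer = BytesPickleSerializer()
payload = serializer.serialize(model)

# pickle.load expects a file-like object, so passing raw bytes fails
try:
    pickle.load(payload)
    load_on_bytes_failed = False
except TypeError:
    load_on_bytes_failed = True

restored = serializer.deserialize(payload)
restored_via_buffer = pickle.load(io.BytesIO(payload))
```

Both `restored` and `restored_via_buffer` should equal the original object; which form to use is mostly a matter of taste.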

Anna Geller

01/25/2022, 12:13 PM
I’m not sure whether this is needed and when. LMK if you have any specific question about that. Usually the default PickleSerializer works well to pickle arbitrary python objects incl. ML models.

Marco Barbero Mota

05/17/2023, 6:31 PM
Following on from this, would this approach work for dictionaries?
I have found that the default pickle serializer saves `.pkl` files that can't be read using the `pickle` module or `pandas.read_pickle`.

Andreas Tsangarides

05/25/2023, 9:23 AM
hey Marco! Migrating to Prefect 2 and using `Block`s for this kind of thing now
🙌 1