Andreas Tsangarides

    7 months ago
    Hi all, I'm trying to schedule ML predictions using Prefect. This involves loading a pre-trained pickled ML model, refitting it, and saving the new model daily. The very first model can be saved using either a pickle or joblib backend. I know how to manually write the code for reading/writing the model locally or to S3, but has anyone done this using LocalResult and S3Result?
      File "/Users/tsangis/Projects/uk-prefect-flows/.venv/lib/python3.8/site-packages/prefect/engine/task_runner.py", line 876, in get_task_run_state
        value = prefect.utilities.executors.run_task_with_timeout(
      File "/Users/tsangis/Projects/uk-prefect-flows/.venv/lib/python3.8/site-packages/prefect/utilities/executors.py", line 454, in run_task_with_timeout
        return task.run(*args, **kwargs)  # type: ignore
      File "/Users/tsangis/Projects/uk-prefect-flows/src/flows/price_predictions/day_ahead_hourly/tasks.py", line 80, in update_model
        model = latest_ml_model.read(latest_ml_model_loc).value
      File "/Users/tsangis/Projects/uk-prefect-flows/.venv/lib/python3.8/site-packages/prefect/engine/results/local_result.py", line 86, in read
        new.value = self.serializer.deserialize(value)
      File "/Users/tsangis/Projects/uk-prefect-flows/src/common/results/serializers.py", line 83, in deserialize
        return pickle.load(value)
    TypeError: file must have 'read' and 'readline' attributes
    Anna Geller

    7 months ago
    Good question! Have you heard about targets in Prefect? They seem like a great fit for your use case. You could template the target based on the date, task name, etc., and simply returning your model from the task would pickle it to the specified location. More details: https://docs.prefect.io/core/idioms/targets.html and https://docs.prefect.io/core/concepts/persistence.html#output-caching-based-on-a-file-target
    Andreas Tsangarides

    7 months ago
    Hey Anna, hope you are well! Yeah, I'm already using those, and that's why I want to make use of the result objects. It's the serializer I am having trouble with. I think it would work if the f.read() call were missing from the local result, since my deserializer passes the value straight to pickle.load:
    # local_result.py
    with open(os.path.join(self.dir, location), "rb") as f:
        value = f.read()
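The mismatch in the traceback can be reproduced without Prefect. A minimal sketch, assuming only the standard library: `f.read()` yields raw bytes, while `pickle.load` expects a file-like object. The `model` dict is a hypothetical stand-in for a fitted model.

```python
import pickle

# Hypothetical stand-in for a fitted model.
model = {"coef": [0.1, 0.2], "intercept": 1.5}

# Raw bytes, similar to what f.read() returns in local_result.py.
value = pickle.dumps(model)

try:
    pickle.load(value)  # pickle.load wants a file-like object, not bytes
    error_message = None
except TypeError as exc:
    # Same error class as in the traceback above.
    error_message = str(exc)

print(error_message)
```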
    Anna Geller

    7 months ago
    Can you use the default PickleSerializer? It uses cloudpickle, which is more flexible than pickle because it can serialize a function or class by value, whereas pickle can only serialize it by reference.
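The by-reference limitation can be seen with the standard library alone. A small sketch (cloudpickle itself is not imported here, and `square` is a hypothetical example function): pickle records only a module and name for a function, so a lambda, which has no importable name, is refused.

```python
import pickle

square = lambda x: x * x  # a function with no importable name

try:
    pickle.dumps(square)
    pickled_by_reference = True
except (pickle.PicklingError, AttributeError):
    # pickle stores functions as "module.name" references, and "<lambda>"
    # cannot be looked up again on import, so serialization fails.
    pickled_by_reference = False

print(pickled_by_reference)
```

A by-value serializer stores the function's code itself, which is why it can handle objects like this one.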
    Andreas Tsangarides

    7 months ago
    In case someone else faces this: it was the read method. The bytes just need to be converted back to a buffer before unpickling.
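    import io
    import pickle
    from typing import Any

    from prefect.engine.serializers import Serializer  # Prefect 0.x/1.x import path

    class MLPickleSerializer(Serializer):
        """Custom serializer for saving/retrieving ML models."""

        def deserialize(self, value: bytes) -> Any:
            # The result's read method passes raw bytes; wrap them in a buffer for pickle.load
            return pickle.load(io.BytesIO(value))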
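The fix can be checked end to end without Prefect installed. This is only a sketch under assumptions: `BytesPickleSerializer` is a hypothetical stand-in that does not subclass Prefect's `Serializer`, its `serialize` method is an assumed counterpart that is not shown in the thread, and the `model` dict stands in for a fitted model.

```python
import io
import pickle
from typing import Any


class BytesPickleSerializer:
    """Hypothetical stand-in; in a flow this would subclass Prefect's Serializer."""

    def serialize(self, value: Any) -> bytes:
        # Assumed counterpart to deserialize: object -> bytes.
        return pickle.dumps(value)

    def deserialize(self, value: bytes) -> Any:
        # Wrap the bytes in a buffer so pickle.load receives a file-like object.
        return pickle.load(io.BytesIO(value))


serializer = BytesPickleSerializer()
model = {"coef": [0.1, 0.2], "intercept": 1.5}  # stand-in for a fitted model

payload = serializer.serialize(model)
restored = serializer.deserialize(payload)

# pickle.loads works on bytes directly and gives the same object back.
same_as_loads = pickle.loads(payload) == restored
```

Calling `pickle.loads(value)` inside `deserialize` would be an equivalent, slightly shorter way to read from bytes.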
    Anna Geller

    7 months ago
    I’m not sure whether or when this is needed. LMK if you have any specific questions about that. Usually the default PickleSerializer works well for pickling arbitrary Python objects, incl. ML models.