Mark Koob  02/19/2020, 3:54 PM
I eventually realized that this was because the `model.fit()` operation mutates the model object, and later Prefect tries to serialize the mutated object, even though only part of it was used downstream:

```python
@task
def get_fit_model(model, x, y):
    model.fit(x, y)
    return model
```

I was able to get around this by making a `deepcopy()` of the untrained model and performing the training on the copy. I imagine this is due to the "greedy serialization" change Chris White mentioned a month or so back. I suppose the lesson here is that all operands must be serializable at all times. I suspect I would have gotten better results if I were using result handlers. I'm also curious whether this would have been easier to debug if I were running my flow in Prefect Cloud.
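The `deepcopy()` workaround described above might look roughly like this (a sketch only; `SomeModel` and its trivial `fit` are stand-ins invented for illustration, not code from the thread):

```python
from copy import deepcopy

class SomeModel:
    """Hypothetical stand-in for an sklearn-style estimator."""
    def __init__(self):
        self.coef_ = None

    def fit(self, x, y):
        self.coef_ = sum(y) / len(y)  # trivial "training" for illustration
        return self

def get_fit_model(model, x, y):
    # Train a deepcopy so the original operand is never mutated and
    # stays safely serializable when the flow is serialized later.
    return deepcopy(model).fit(x, y)

untrained = SomeModel()
trained = get_fit_model(untrained, [1, 2], [3.0, 5.0])
# untrained.coef_ is still None; only the copy was mutated
```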
Chris White  02/19/2020, 5:28 PM
Mark Koob  02/19/2020, 5:29 PM
Chris White  02/19/2020, 5:30 PM
your outputs in order to send them to other machines, and it seems that whatever you’re returning can’t be serialized
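A quick way to check whether a return value will survive that serialization step is to round-trip it through pickle (Dask actually uses cloudpickle, which handles strictly more objects, so this stdlib check is only a first sanity test; the helper name below is my own):

```python
import pickle
import threading

def roundtrips(obj) -> bool:
    """Return True if obj survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception:
        return False

print(roundtrips([1, 2, 3]))         # True: plain data is fine
print(roundtrips(threading.Lock()))  # False: locks cannot be pickled
```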
Mark Koob  02/20/2020, 4:36 PM
I think this is obviously wrong, because we're referring to an object inside the flow context from outside without parameterizing it, and for some reason that causes dask to want to serialize it after the flow has finished running. I think the correct question is: how do I get the object from outside the flow into the flow? A Parameter seems like the obvious choice, but the person stitching the subflows together into a higher-level flow won't know the name to give it when they call:

```python
class SubFlow(object):
    def __init__(self, mutated):
        with Flow("subflow") as f:
            success = mutate(mutated)
        self.flow = f
        self.output = success
```

Is there an obvious answer for this case?
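One possible shape for an answer (a sketch of the idea only: `Parameter` is mocked as a plain class here rather than imported from Prefect, so nothing below is Prefect's actual API) is to have `SubFlow` generate a unique parameter name itself and expose the handle as an attribute, so composing code references `sub.input` instead of a hard-coded string:

```python
import itertools

class Parameter:
    """Mock stand-in for prefect.Parameter, for illustration only."""
    def __init__(self, name):
        self.name = name

_ids = itertools.count()

class SubFlow:
    def __init__(self):
        # The subflow owns its Parameter and generates a unique name,
        # so callers never need to know the string.
        self.input = Parameter(f"subflow_input_{next(_ids)}")

a, b = SubFlow(), SubFlow()
print(a.input.name != b.input.name)  # True: every subflow gets its own name
```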
Chris White  02/20/2020, 5:29 PM
Mark Koob  02/20/2020, 6:40 PM