Mark Koob
02/19/2020, 3:54 PM
@task
def get_fit_model(model, x, y):
    model.fit(x, y)
    return model.fit_model
I eventually realized that this was because the model.fit() operation mutates the model object, and later Prefect tries to serialize the mutated model object, even though only a part of it was used downstream. I was able to get around this by making a deepcopy() of the untrained model and performing the training on the copy. I imagine this is due to the "greedy serialization" change Chris White mentioned a month or so back. I suppose the lesson here is that all operands must be serializable at all times.
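A minimal sketch of that workaround, assuming a scikit-learn-style estimator (illustrative only, not the exact code):

from copy import deepcopy

from prefect import task

@task
def get_fit_model(model, x, y):
    # Train a private copy so the original, untrained model object that
    # Prefect already holds as an upstream result is never mutated into
    # something it can't serialize.
    trained = deepcopy(model)
    trained.fit(x, y)
    return trained.fit_model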
I'm concerned that perhaps I would get better results if I were using result handlers. I'm also curious if this would have been easier to debug if I were running my flow in Prefect Cloud.
Chris White
Mark Koob
02/19/2020, 5:29 PM
Chris White
cloudpickle your outputs in order to send them to other machines, and it seems that whatever you're returning can't be serialized.
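A quick way to test a value before returning it from a task, outside of Prefect entirely (check_picklable is just an illustrative helper, not part of the library):

import cloudpickle

def check_picklable(obj):
    # If this round trip raises, the Dask workers won't be able to
    # receive the object either.
    return cloudpickle.loads(cloudpickle.dumps(obj))

# e.g. check_picklable(model)  # raises if the object can't be serialized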
Mark Koob
02/20/2020, 4:36 PM
class SubFlow(object):
    def __init__(self, mutated):
        with Flow("subflow") as f:
            success = mutate(mutated)
        self.flow = f
        self.output = success
I think this is obviously wrong, because inside the flow context we're referring to an object from outside it without parameterizing it, and for some reason that causes dask to want to serialize it after the flow has finished running.
I think the correct question is how do I get the object from outside the flow into the flow? A Parameter seems like the obvious choice, but the guy stitching the subflows together into a higher-level flow won't know the name to give it when he calls composite_flow.run(). Is there an obvious answer for this case?
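For concreteness, the Parameter version of the snippet above would look something like this (a sketch; mutate and the "mutated" key are illustrative):

from prefect import Flow, Parameter, task

@task
def mutate(obj):
    # stand-in for whatever work the subflow does on the object
    return True

class SubFlow(object):
    def __init__(self):
        with Flow("subflow") as f:
            # Declaring the object as a Parameter keeps the flow from closing
            # over (and later serializing) anything outside the flow context.
            mutated = Parameter("mutated")
            success = mutate(mutated)
        self.flow = f
        self.output = success

# The caller then supplies the object at run time:
# SubFlow().flow.run(parameters={"mutated": some_object})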
Chris White
Mark Koob
02/20/2020, 6:40 PM