as
10/14/2020, 11:43 AM
josh
10/14/2020, 12:59 PM
With a target, it will check for the existence of the data and if it exists it will not run the task and instead the downstream will use the data at that location: https://docs.prefect.io/core/idioms/targets.html
as
10/14/2020, 1:35 PM
"if it exists it will not run the task and instead the downstream will use the data at that location"
What do you mean? Will it pass the path to that location, or will it try to read the data at the location with the specified serializer and pass that? The latter will not work when the data at the target is not serialized (e.g. the ONNX file in the example below).
An example is the model2onnxTask task in the snippet below:
## The result is a string: the path to the Keras model. That path is saved in the JSON file at the target location.
keras_model_path = task(
    train_keras_model,
    result=LocalResult(serializer=JSONSerializer()),
    target=join(output_root, ".prefect", "{task_name}.json"),
    checkpoint=True,
)()
## Subclassed ShellTask that accepts parameters. The shell script generates an ONNX file (and returns the path to that ONNX file).
onnx_model = model2onnxTask(target=onnx_path)(
    keras_model=keras_model_path, onnx_model=onnx_path
)
## I would like to pass the path to the ONNX model to this task.
out = task(process_onnx)(onnx_model)
josh
10/14/2020, 1:38 PM
josh
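10/14/2020, 1:44 PM
# Import paths as in Prefect Core (pre-2.0 releases)
from prefect import Flow, task
from prefect.engine.results import LocalResult
from prefect.engine.serializers import JSONSerializer
@task(target="test.json", result=LocalResult(serializer=JSONSerializer()))
def get_data():
    return {"asdf": "here"}
@task
def print_data(d):
    print(d)
with Flow("target_serializer") as f:
    d = get_data()
    print_data(d)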
get_data will write JSON data to test.json on the first run, and print_data will use it. On the second run, get_data will see that data exists at test.json, enter a Cached state, and read the JSON data from test.json instead of running, so print_data gets the stored value.
as
10/14/2020, 2:31 PM
josh
10/14/2020, 2:33 PM
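For reference, the target behaviour discussed in this thread can be approximated in plain Python. This is only a rough sketch, not Prefect's implementation: `run_with_target` is a hypothetical helper, JSON stands in for the configured serializer, and details such as target templating and the Cached state are left out. It shows that an existing target is deserialized rather than passed along as a path, that a non-JSON file at the target (a stand-in for the ONNX model) fails to deserialize, and that storing the path itself as JSON is one way to hand a path downstream.

```python
import json
import os
import tempfile


def run_with_target(fn, target):
    """Hypothetical helper (not a Prefect API): mimic a target check with a JSON serializer."""
    if os.path.exists(target):
        # Target exists: skip fn and deserialize whatever is stored there.
        with open(target, "r", encoding="utf-8") as f:
            return json.load(f), "cached"
    # Target missing: run fn and serialize its return value to the target.
    value = fn()
    with open(target, "w", encoding="utf-8") as f:
        json.dump(value, f)
    return value, "ran"


calls = []  # records how many times get_data actually runs


def get_data():
    calls.append(1)
    return {"asdf": "here"}


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "test.json")

first = run_with_target(get_data, target)   # runs get_data and writes test.json
second = run_with_target(get_data, target)  # skips get_data and reads test.json

# A file that is not JSON (standing in for an ONNX model) cannot be deserialized.
binary_target = os.path.join(workdir, "model.onnx")
with open(binary_target, "wb") as f:
    f.write(b"\x08\x07\x12\x00")
try:
    run_with_target(get_data, binary_target)
    binary_outcome = "read"
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    binary_outcome = "deserialization failed"

# Storing the *path* as JSON lets a downstream step receive the path itself.
path_target = os.path.join(workdir, "onnx_path.json")
stored_path, _ = run_with_target(lambda: binary_target, path_target)
```

Under these assumptions, the second call returns the stored dictionary without calling `get_data` again, and the last call returns the path string, which mirrors how the Keras model path is stored in the snippet earlier in the thread.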