as
10/14/2020, 11:43 AM
josh
10/14/2020, 12:59 PM
target: it will check for the existence of the data, and if it exists it will not run the task; downstream tasks will instead use the data at that location: https://docs.prefect.io/core/idioms/targets.html
as
10/14/2020, 1:35 PM
"if it exists it will not run the task and instead the downstream will use the data at that location"
What do you mean? Will it pass the path to that location, or will it try to read the data at the location with the specified serializer and pass that? The latter will not work when the data at the target is not serialized (e.g. the onnx file in the example below).
An example is the model2onnxTask task in the snippet below:
## The result is a string: the path to the keras model. That path is saved in the json file at the target location.
keras_model_path = task(
    train_keras_model,
    result=LocalResult(serializer=JSONSerializer()),
    target=join(output_root, ".prefect", "{task_name}.json"),
    checkpoint=True,
)()
## Subclassed ShellTask that accepts parameters. The shell script generates an onnx file (and returns the path to that onnx file).
onnx_model = model2onnxTask(target=onnx_path)(
    keras_model=keras_model_path, onnx_model=onnx_path
)
## I would like to pass the path to the onnx model to this task.
out = task(process_onnx)(onnx_model)
josh
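10/14/2020, 1:38 PM
from prefect import Flow, task
from prefect.engine.results import LocalResult
from prefect.engine.serializers import JSONSerializer

@task(target="test.json", result=LocalResult(serializer=JSONSerializer()))
def get_data():
    return {"asdf": "here"}

@task
def print_data(d):
    print(d)

with Flow("target_serializer") as f:
    d = get_data()
    print_data(d)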
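For anyone reading along without Prefect installed, here is a rough plain-Python sketch of the target check that this flow relies on. It is not Prefect's implementation: run_with_target is a hypothetical helper, and the "Success"/"Cached" strings only stand in for task states.

```python
import json
import os
import tempfile


def run_with_target(fn, target):
    """Run fn() unless `target` exists (hypothetical helper, not Prefect's API)."""
    if os.path.exists(target):
        # Target found: skip the task body and deserialize the stored result.
        with open(target) as fh:
            return json.load(fh), "Cached"
    # Target missing: run the task body and serialize its result to the target.
    result = fn()
    with open(target, "w") as fh:
        json.dump(result, fh)
    return result, "Success"


calls = []  # records how many times the task body actually ran


def get_data():
    calls.append(1)
    return {"asdf": "here"}


target = os.path.join(tempfile.mkdtemp(), "test.json")
first = run_with_target(get_data, target)   # runs get_data, writes test.json
second = run_with_target(get_data, target)  # skips get_data, reads test.json
print(first, second, len(calls))
```

Note that in this sketch the value handed downstream is the deserialized content of the file, not its path.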
get_data will write json data to test.json on first run and print_data will use it. On second run get_data will see that data exists at test.json, enter a Cached state, and then get_data will read the json data from that test.json location
as
10/14/2020, 2:31 PM
josh
10/14/2020, 2:33 PM