as

10/14/2020, 11:43 AM
Hi, I was wondering what is supposed to happen when the target parameter is set but the checkpoint parameter is not. Here the target is set to the path of a file that the task produces (an onnx file in this case), and the result of the task is a string containing the path to that file. Since checkpointing is not defined, the result of the task is not saved, but the task is not run because the target exists. So is there a conflict here? If the target exists, how is the result passed on to the next task in the flow? This is a bit confusing to me. Can somebody explain what happens in this situation? Thanks.
EDIT: I just figured out that the file generated by the task (whose path is given as the target parameter) is overwritten with the result of the task. Ideally, I would expect to be able to define an artifact generated by the task as a target: if the artifact exists, do not rerun the task, and pass the path to the artifact on to the next step in the flow. Would that make sense? Or does this already exist and I am doing it wrong?
josh

10/14/2020, 12:59 PM
Hi @as, could you share a snippet of your task? Currently, when using a target, it will check for the existence of the data at that location; if the data exists, it will not run the task, and the downstream task will instead use the data at that location: https://docs.prefect.io/core/idioms/targets.html
as

10/14/2020, 1:35 PM
When you say "if it exists it will not run the task and instead the downstream will use the data at that location", what do you mean? Will it pass the path to that location, or will it try to read the data at that location with the specified serializer and pass that on? The latter will not work when the data at the target is not serialized (e.g. the onnx file in the example below). An example is the model2onnxTask task in this snippet:
## result is a string that is the path to the keras model. Here the path to the keras model is saved in the json file at the target location
keras_model_path = task(
    train_keras_model,
    result=LocalResult(serializer=JSONSerializer()),
    target=join(output_root, ".prefect", "{task_name}.json"),
    checkpoint=True,
)()

## Subclassed ShellTask that accepts parameters. This shell script generates an onnx file (and returns the path to that onnx file)
onnx_model = model2onnxTask(target=onnx_path)(
    keras_model=keras_model_path, onnx_model=onnx_path
)

## I would like to pass the path to the onnx model to this task.
out = task(process_onnx)(onnx_model)
josh

10/14/2020, 1:38 PM
When using a target, the task will run and write its data to that target. Before doing so, it first checks whether data already exists at the target location. If the data is there, the task enters a Cached state and the downstream task reads the data from that upstream result’s location (which is the target location).
Example flow:
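from prefect import Flow, task
from prefect.engine.results import LocalResult
from prefect.engine.serializers import JSONSerializer

# The return value is serialized as JSON and written to the target file.
@task(target="test.json", result=LocalResult(serializer=JSONSerializer()))
def get_data():
    return {"asdf": "here"}

@task
def print_data(d):
    print(d)

with Flow("target_serializer") as f:
    d = get_data()
    print_data(d)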
get_data will write json data to test.json on the first run and print_data will use it. On the second run, get_data will see that data exists at test.json and enter a Cached state, and then print_data will read the json data from that test.json location.
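To make the check-then-run behavior described above concrete, here is a rough plain-Python model of it. This is only a sketch of the idea, not Prefect's actual implementation: `run_with_target` and the state strings it returns are hypothetical names made up for illustration, and JSON stands in for whatever serializer the result uses.

```python
import json
import os
import tempfile


def run_with_target(fn, target):
    """Simplified model of target-based caching (hypothetical, not Prefect code)."""
    if os.path.exists(target):
        # Data already exists at the target: skip the task and load the
        # stored result so it can be handed to the downstream task.
        with open(target) as f:
            return json.load(f), "Cached"
    # Nothing at the target yet: run the task, then write its return
    # value (not any side-effect file) to the target location.
    result = fn()
    with open(target, "w") as f:
        json.dump(result, f)
    return result, "Success"


def get_data():
    return {"asdf": "here"}


target = os.path.join(tempfile.mkdtemp(), "test.json")
first = run_with_target(get_data, target)   # runs get_data and writes test.json
second = run_with_target(get_data, target)  # skips get_data and reads test.json
```

Note that in this model the file at the target always holds the serialized return value, which matches the observation in the EDIT above that a side-effect file placed at the target path gets overwritten.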
as

10/14/2020, 2:31 PM
Thanks for the answer! So in my case, I should only save the result of the task to the target (the onnx model path string, as a pickle or json file) and not the actual onnx object (which is produced as a side effect of the task)? So there is currently no way to achieve what I was trying to do (using the onnx file itself as the target and passing on the file location in the flow)?
josh

10/14/2020, 2:33 PM
Yes, I believe that is correct. If the onnx object is a side effect of the task, then your best bet would be to return the location of that onnx object so that the downstream task can read it from there.
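The pattern suggested here (keep the artifact as a side effect, store only its path as the cached result) might look roughly like the plain-Python sketch below. Everything in it is an illustrative assumption rather than Prefect API: `export_model`, `process_model`, the `.json` marker file, and the fake file contents are all hypothetical stand-ins, and the extra check that the artifact itself still exists is an addition of this sketch.

```python
import json
import os
import tempfile


def export_model(artifact_path, marker_path):
    """Hypothetical stand-in for the onnx export task."""
    # Skip the work only if both the stored result (the path) and the
    # artifact it points to are still present.
    if os.path.exists(marker_path) and os.path.exists(artifact_path):
        with open(marker_path) as f:
            return json.load(f)
    # Side effect: produce the artifact (placeholder bytes, not a real model).
    with open(artifact_path, "wb") as f:
        f.write(b"fake onnx bytes")
    # Result: only the path string is serialized, to a separate marker file.
    with open(marker_path, "w") as f:
        json.dump(artifact_path, f)
    return artifact_path


def process_model(path):
    """Hypothetical downstream task: receives the path and opens the artifact."""
    with open(path, "rb") as f:
        return len(f.read())


workdir = tempfile.mkdtemp()
artifact = os.path.join(workdir, "model.onnx")
marker = os.path.join(workdir, "model2onnx.json")

path_first = export_model(artifact, marker)   # builds the artifact
path_second = export_model(artifact, marker)  # reuses the stored path
size = process_model(path_second)
```

Because the marker file and the artifact live at different paths, writing the cached result never clobbers the artifact, and the downstream step always receives a plain path string.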