Thread
#prefect-community
    as

    1 year ago
    Hi, I was wondering what is supposed to happen when the target parameter is set but the checkpoint parameter is not. Here the target is set to the path of a file that is produced by the task (an onnx file in this case), and the result of the task is a string with the file path to that object. Since checkpointing is not defined, the result of the task is not saved, but the task is not run because the target exists. So is there a conflict here? If the target exists, how is the result passed on to the next task in the flow? This is a bit confusing to me. Can somebody explain what happens in this situation? Thanks
    EDIT: I just figured out that the object generated by the task (whose path is specified in the target parameter) is overwritten with the result of the task. Ideally I would expect to be able to somehow define an artifact generated by the task as a target: if the object exists, do not redo the task, and pass the path to the object on to the next step in the flow. Is this something that would make sense? Or does this already exist and I am doing it wrong?
    josh

    1 year ago
    Hi @as, could you share a snippet of your task? Currently, when using a target, it will check for the existence of the data, and if it exists it will not run the task and instead the downstream will use the data at that location: https://docs.prefect.io/core/idioms/targets.html
    as

    1 year ago
    When you say "if it exists it will not run the task and instead the downstream will use the data at that location", what do you mean? Will it pass the path to that location, or will it try to read the data at the location with the specified serializer and pass that? The latter will not work when the data at the target is not serialized (e.g. the onnx file in the example below). An example is the model2onnxTask task in the snippet below:
    ## result is a string that is the path to the keras model. Here the path to the keras model is saved in the json file at the target location
    keras_model_path = task(
        train_keras_model,
        result=LocalResult(serializer=JSONSerializer()),
        target=join(output_root, ".prefect", "{task_name}.json"),
        checkpoint=True,
    )()
    
    ## subclassed shelltask that accepts parameters. This shell scripts generates an onnx file (and returns the path to that onnx file)
    onnx_model = model2onnxTask(target=onnx_path)(
        keras_model=keras_model_path, onnx_model=onnx_path
    )
    
    ## I would like to pass the path to the onnx model to this task.
    out = task(process_onnx)(onnx_model)
    josh

    1 year ago
    When using a target the task will run and write some data at that target. But before it does that it will first check the existence of the data at that target location. If the data is there the task will enter a Cached state and then the downstream task will read the data from that upstream result’s location (which is the target location)
    Example flow:
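    from prefect import Flow, task
    from prefect.engine.results import LocalResult
    from prefect.engine.serializers import JSONSerializer

    @task(target="test.json", result=LocalResult(serializer=JSONSerializer()))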
    def get_data():
        return {"asdf": "here"}
    
    @task
    def print_data(d):
        print(d)
    
    with Flow("target_serializer") as f:
        d = get_data()
        print_data(d)
    get_data will write json data to test.json on first run and print_data will use it. On second run get_data will see that data exists at test.json, enter a Cached state, and then get_data will read the json data from that test.json location.
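    The behaviour described above can be modelled in plain Python. This is only a simplified sketch of the idea, not Prefect's actual implementation: run_with_target is a hypothetical helper, and the "Success" and "Cached" strings are just labels standing in for task states.

```python
import json
import os
import tempfile


def run_with_target(fn, target):
    """Simplified model of a target check (hypothetical helper, not Prefect code)."""
    if os.path.exists(target):
        # Target found: do not run fn, load the stored result instead.
        with open(target) as fh:
            return "Cached", json.load(fh)
    # Target missing: run fn and write its return value to the target.
    result = fn()
    with open(target, "w") as fh:
        json.dump(result, fh)
    return "Success", result


def get_data():
    return {"asdf": "here"}


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "test.json")

# First call runs get_data and writes test.json; second call only reads it back.
first_state, first_result = run_with_target(get_data, target)
second_state, second_result = run_with_target(get_data, target)
print(first_state, first_result)
print(second_state, second_result)
```

    Note that the file at the target always holds the serialized return value of the function, which is why pointing the target at a file the task produces as a side effect leads to that file being overwritten.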
    as

    1 year ago
    Thanks for the answer! So in my case, I should really only save the result of the task to the target (the onnx model path string, as a pickle/json file), and not the actual onnx object (which is produced as a side effect of the task)? So there is currently no way to achieve what I was trying to do (using the onnx file as a target and passing on the file location in the flow)?
    josh

    1 year ago
    Yes, I believe that is correct. If the onnx object is a side effect of the task, then your best bet would be to return the location of that onnx object so that the downstream task can read it from there.
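    The pattern agreed on here (the target stores only the path string, while the artifact itself lives at a separate location) can be sketched in plain Python. This is an illustration under assumptions, not Prefect code: run_with_target, convert_model, and the file names model.onnx and model2onnx.json are all hypothetical stand-ins.

```python
import json
import os
import tempfile


def run_with_target(fn, target):
    # Simplified stand-in for a target check (hypothetical, not Prefect's code).
    if os.path.exists(target):
        with open(target) as fh:
            return json.load(fh)
    result = fn()
    with open(target, "w") as fh:
        json.dump(result, fh)
    return result


workdir = tempfile.mkdtemp()
onnx_path = os.path.join(workdir, "model.onnx")
calls = []


def convert_model():
    # Placeholder for the real conversion: writes the artifact as a side effect.
    calls.append(1)
    with open(onnx_path, "wb") as fh:
        fh.write(b"fake onnx bytes")
    return onnx_path  # the task result is only the path


# The target is a small JSON file holding the path, not the artifact itself.
path_target = os.path.join(workdir, "model2onnx.json")

first = run_with_target(convert_model, path_target)
second = run_with_target(convert_model, path_target)  # skipped, path read back
```

    One caveat of this sketch: the skip decision depends only on the JSON file, so if the artifact is deleted while the JSON file remains, the conversion is not rerun and the downstream step receives a path to a missing file.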