    Scott Moreland

    1 year ago
    What's the best way to ignore an existing checkpoint and overwrite it?
    Chris White

    1 year ago
    Hi Scott - ignoring and overwriting is the default behavior of checkpoints; perhaps you’re asking about the `target` keyword?
    Scott Moreland

    1 year ago
    Hm, yea I guess I am getting confused about the relationship between the two. I thought that the target was the location name to be used by the Result class when checkpointing. In my case, I created a custom Results subclass that reads/writes to a database. So I expect the target to set the table name used to write to the database. However when I specify a target and set checkpoint=True, it always reads from the database when the target exists. I'm wondering how I can turn that behavior off to rebuild a checkpoint table in the event that I modify the source code of my task, but not the target table name.
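    To make the setup described above concrete, here is a minimal sketch of a database-backed result store in which the location is used as the table name. It is a standalone toy built on Python's `sqlite3` module: the class name `SQLiteTableResult` is hypothetical, and it only mirrors the general read/write/exists shape of a Prefect Result class without importing or subclassing Prefect.

```python
import pickle
import sqlite3
from typing import Any


class SQLiteTableResult:
    """Toy result store: each location names a table holding one pickled value.

    Hypothetical class that only mirrors the read/write/exists shape of a
    Prefect Result; it does not import or subclass Prefect.
    """

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    @staticmethod
    def _table(location: str) -> str:
        # Table names cannot be bound as SQL parameters, so only accept
        # identifier-like names and quote them before interpolating.
        if not location.isidentifier():
            raise ValueError(f"unsafe table name: {location!r}")
        return f'"{location}"'

    def exists(self, location: str) -> bool:
        row = self.conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
            (location,),
        ).fetchone()
        return row is not None

    def write(self, value: Any, location: str) -> None:
        table = self._table(location)
        # Overwrite semantics: drop and recreate the table on every write.
        self.conn.execute(f"DROP TABLE IF EXISTS {table}")
        self.conn.execute(f"CREATE TABLE {table} (payload BLOB)")
        self.conn.execute(f"INSERT INTO {table} VALUES (?)", (pickle.dumps(value),))
        self.conn.commit()

    def read(self, location: str) -> Any:
        (payload,) = self.conn.execute(
            f"SELECT payload FROM {self._table(location)}"
        ).fetchone()
        return pickle.loads(payload)


store = SQLiteTableResult(sqlite3.connect(":memory:"))
store.write({"rows": 3}, "task1_output")
print(store.exists("task1_output"), store.read("task1_output"))  # True {'rows': 3}
```

    Note that `exists` is what a target-style check would consult; whether the task is skipped when it returns True is decided by the caller, not by the store itself.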
    Chris White

    1 year ago
    Yup I understand your confusion - `target` is actually a special keyword that is related to file-based caching (so if the file exists, the task is not rerun); here is some documentation on target: https://docs.prefect.io/core/idioms/targets.html (we definitely need to expand our documentation on results and caching!) What you can do instead is:
    @task(checkpoint=True, result=MyResultType(location="{same-template-you-used-for-your-target}"))
    ^^ that will still checkpoint your data to a templated location but will not re-use that data on subsequent runs (and will instead overwrite the data)
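    The difference between the two options can be modelled in a few lines of plain Python. This is a toy sketch, not Prefect's implementation: a dict stands in for wherever the result writes, `run_task` is a hypothetical helper, and `use_target=True` plays the role of passing the same template as a `target`.

```python
from typing import Any, Callable, Dict

Store = Dict[str, Any]  # stand-in for wherever a Result persists data (files, S3, a DB table, ...)


def run_task(fn: Callable[[], Any], store: Store, location: str, use_target: bool = False) -> Any:
    """Toy model of checkpoint-only vs. target behaviour; not Prefect's implementation.

    use_target=False: checkpoint=True + result location -> always run, then overwrite.
    use_target=True:  location acts as a target -> reuse existing data and skip fn.
    """
    if use_target and location in store:
        return store[location]  # cache hit: the task body is not rerun
    value = fn()
    store[location] = value  # checkpoint: write (or overwrite) the output
    return value


calls = []


def build_table() -> int:
    calls.append(1)
    return len(calls)  # how many times the task body has run so far


store: Store = {}
print(run_task(build_table, store, "my_table"))                   # 1 (runs)
print(run_task(build_table, store, "my_table"))                   # 2 (runs again, overwrites)
print(run_task(build_table, store, "my_table", use_target=True))  # 2 (data exists, no run)
```

    The only branch that skips work is the target check; plain checkpointing always executes the task and overwrites whatever is at the location.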
    Scott Moreland

    1 year ago
    Hi Chris, still confused here on terminology. When I'm developing and debugging a single task, I want to call its run method and persist the task's output to a database every time. Moreover, when this task requires the output of a previous task as input, I want to load that input value from the persisted state of the previous task (also stored in the database). Furthermore, given two tasks:
    1. result1 = task1()
    2. result2 = task2(result1)
    it would be nice to be able to run [task1, task2] in sequence and regenerate (and persist to the database) both result1 and result2. Alternatively, if I just want to run task2, I'd like to be able to read result1 from its persisted state and run task2 to generate result2 without rerunning task1. Hope that makes sense. Should I still be using location here over target? To be clear, I am persisting each task result to a database table using a custom Results class similar to the S3 class.
    I guess the TLDR is that non-skipped tasks would regenerate their persisted outputs, while skipped tasks would return their persisted values when used as inputs to non-skipped tasks.
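    The behaviour described in the TLDR can be sketched as a tiny runner. This is a hypothetical toy, not a Prefect feature: `run_pipeline` and the `skip` set are illustrative names, a dict stands in for the database, and steps are assumed to be listed in dependency order.

```python
from typing import Any, Callable, Dict, List, Sequence, Set, Tuple

# (name, function, names of upstream results it takes as positional args)
Step = Tuple[str, Callable[..., Any], Sequence[str]]


def run_pipeline(steps: List[Step], store: Dict[str, Any], skip: Set[str]) -> Dict[str, Any]:
    """Toy runner for the skip/regenerate workflow (not a Prefect feature).

    Non-skipped steps run and overwrite their persisted output; skipped steps
    are rehydrated from the store so downstream steps can still consume them.
    Steps must be listed in dependency order.
    """
    results: Dict[str, Any] = {}
    for name, fn, upstream in steps:
        if name in skip:
            results[name] = store[name]  # KeyError if this step was never persisted
        else:
            results[name] = fn(*(results[u] for u in upstream))
            store[name] = results[name]  # always overwrite the persisted output
    return results


steps: List[Step] = [
    ("result1", lambda: 10, []),
    ("result2", lambda r1: r1 + 1, ["result1"]),
]
store: Dict[str, Any] = {}
print(run_pipeline(steps, store, skip=set()))        # {'result1': 10, 'result2': 11}
store["result1"] = 20  # pretend result1's persisted table changed
print(run_pipeline(steps, store, skip={"result1"}))  # {'result1': 20, 'result2': 21}
```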
    Chris White

    1 year ago
    I don’t think I’m completely following, so let me just describe these two kwargs to you:
    - `checkpoint=True` + a result `location` template (note you can template these locations based on task inputs / timestamps / etc., which is sometimes useful for “overriding”): every time this task runs, it will store its output data to the provided location. If you ever want to “rehydrate” an upstream task’s state, you’ll have to do this manually using the `load_result` method that all `State` objects have.
    - `target=location_template`: when a `target` is provided, the location template is checked first. If data is present at the location, it is used and the task is not rerun. If no data is present, the task runs and stores its output in the provided location. As before, you can template these locations to provide for some interesting functionality. If you ever want to force a rerun of a task, you’ll need to manually delete the data in the location yourself (this is something we do want to support automatically at some point, but it’s still under discussion).
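    Templating is what makes the target approach usable for the "I changed the task's code" case. The sketch below assumes `str.format`-style templating; `run_with_target` is a hypothetical helper, and `code_version` is a made-up template variable you would bump yourself. A new value yields a new location, so the target misses and the task reruns, while deleting the stored data forces a rerun at the same location.

```python
from typing import Any, Callable, Dict


def run_with_target(fn: Callable[[], Any], store: Dict[str, Any], template: str, **context: Any) -> Any:
    """Toy target semantics with a str.format-style location template (not Prefect code)."""
    location = template.format(**context)
    if location in store:
        return store[location]  # data present at the target: reuse it, skip fn
    store[location] = fn()      # no data: run the task and persist its output
    return store[location]


runs = []


def task_body() -> str:
    runs.append(1)
    return f"run #{len(runs)}"


store: Dict[str, Any] = {}
template = "{task_name}_v{code_version}"  # code_version is a hypothetical template variable

print(run_with_target(task_body, store, template, task_name="clean", code_version=1))  # run #1
print(run_with_target(task_body, store, template, task_name="clean", code_version=1))  # run #1 (reused)
print(run_with_target(task_body, store, template, task_name="clean", code_version=2))  # run #2 (new location)

del store["clean_v2"]  # manual invalidation: delete the data to force a rerun
print(run_with_target(task_body, store, template, task_name="clean", code_version=2))  # run #3
```

    The old `clean_v1` entry is left behind, so in a database-backed setup the stale tables would need their own cleanup.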
    Scott Moreland

    1 year ago
    Thanks, this clarifies everything for me. Sounds like I could get what I'm looking for using either persistence method with some customization. Appreciate it!
    Chris White

    1 year ago
    Anytime! Glad I could help 🙂
    @Marvin archive “What is the difference between checkpoint and target?”
    Marvin

    1 year ago
    Pedro Machado

    1 year ago
    +1 on being able to force the task(s) to ignore (replace) the cached values without having to manually delete the files.
    Chris White

    1 year ago
    Yea this definitely makes sense; it would be very easy to do for all tasks simultaneously via a special context key / value pair that you could set on a per-run basis. Being able to control this on a task run-by-task run basis would be trickier.
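    A per-run switch like the one described could look roughly like this. It is a hypothetical sketch only: `run_flow` and the `force_rerun` context key are illustrative names, not an existing Prefect setting, and a dict stands in for the persisted results.

```python
from typing import Any, Callable, Dict, Mapping


def run_flow(
    tasks: Mapping[str, Callable[[], Any]],
    store: Dict[str, Any],
    context: Mapping[str, Any],
) -> Dict[str, Any]:
    """Toy flow run where a single per-run context flag bypasses every target.

    "force_rerun" is a hypothetical key used only for illustration.
    """
    force = bool(context.get("force_rerun", False))
    results: Dict[str, Any] = {}
    for location, fn in tasks.items():
        if not force and location in store:
            results[location] = store[location]  # normal target behaviour: reuse
        else:
            results[location] = store[location] = fn()  # forced (or missing): run and overwrite
    return results


counter = {"n": 0}


def step() -> int:
    counter["n"] += 1
    return counter["n"]


store: Dict[str, Any] = {}
tasks = {"a": step, "b": step}
print(run_flow(tasks, store, {}))                     # {'a': 1, 'b': 2}
print(run_flow(tasks, store, {}))                     # {'a': 1, 'b': 2} (all reused)
print(run_flow(tasks, store, {"force_rerun": True}))  # {'a': 3, 'b': 4}
```

    Because the flag is read once per run, it applies to every task at once; selective control would need a per-task lookup (for example a set of task names to force), which is the harder case mentioned above.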
    Pedro Machado

    1 year ago
    Having this for all tasks would be a good starting point. I agree that it would be more difficult to do it selectively.