# prefect-community
s
What's the best way to ignore an existing checkpoint and overwrite it?
c
Hi Scott - ignoring and overwriting is the default behavior of checkpoints; perhaps you’re asking about the `target` keyword?
s
Hm, yea I guess I am getting confused about the relationship between the two. I thought that the target was the location name to be used by the Result class when checkpointing. In my case, I created a custom Result subclass that reads/writes to a database, so I expect the target to set the table name used to write to the database. However, when I specify a target and set checkpoint=True, it always reads from the database when the target exists. I'm wondering how I can turn that behavior off so I can rebuild a checkpoint table in the event that I modify the source code of my task, but not the target table name.
c
Yup I understand your confusion - `target` is actually a special keyword that is related to file-based caching (so if the file exists, the task is not rerun); here is some documentation on `target`: https://docs.prefect.io/core/idioms/targets.html (we definitely need to expand our documentation on results and caching!) What you can do instead is:
```python
@task(checkpoint=True, result=MyResultType(location="{same-template-you-used-for-your-target}"))
```
^^ that will still checkpoint your data to a templated location but will not re-use that data on subsequent runs (and will instead overwrite the data)
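To make the distinction concrete, here is a minimal framework-free sketch of the two behaviors; the function names and the dict-backed store are purely illustrative and are not Prefect API:

```python
# Illustrative sketch only: mimics the two persistence behaviors in plain
# Python, with a dict standing in for a file system or database.
store = {}

def run_with_checkpoint(location, compute):
    """checkpoint=True + a result location: always run, always overwrite."""
    value = compute()
    store[location] = value  # any existing data at `location` is replaced
    return value

def run_with_target(location, compute):
    """target=location: if data already exists at the location, skip the run."""
    if location in store:
        return store[location]  # task is not rerun; cached value is reused
    value = compute()
    store[location] = value
    return value

calls = []

def compute():
    calls.append(1)
    return len(calls)

run_with_target("out", compute)      # runs once and stores the value
run_with_target("out", compute)      # skipped: cached value is returned
run_with_checkpoint("out", compute)  # runs again and overwrites the store
```

The target-style function is why Scott's task "always reads from the database when the target exists"; the checkpoint-style function is the overwrite-every-run behavior Chris describes.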
s
Hi Chris, still confused here on terminology. When I'm developing and debugging a single task, I want to call its run method and persist the task's output to a database every time. Moreover, when this task requires the output of a previous task as input, I want to load this input value from the persisted state of the previous task (also stored in the database). Furthermore, given two tasks: 1. `result1 = task1()` 2. `result2 = task2(result1)` It would be nice to be able to run `[task1, task2]` in sequence and regenerate (and persist to the database) both `result1` and `result2`. Alternatively, if I just want to run `task2`, I'd like to be able to read `result1` from its persisted state and run `task2` to generate `result2` without rerunning `task1`. Hope that makes sense. Should I still be using `location` here over `target`? To be clear, I am persisting each task result to a database table using a custom Result subclass similar to the S3 class.
I guess TLDR is that non-skipped tasks would regenerate their persisted outputs, while skipped tasks would return their persisted values when used as inputs to non-skipped tasks.
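That TLDR can be sketched without any framework at all; the names below are invented for illustration and are not Prefect's API:

```python
# Illustrative sketch: "non-skipped tasks regenerate their persisted outputs,
# skipped tasks return their persisted values" from a shared store.
db = {}  # stands in for the database table of persisted results

def task1():
    return 10

def task2(upstream):
    return upstream + 5

def run_pipeline(tasks_to_run):
    """Run only the named tasks; skipped tasks serve their persisted value."""
    if "task1" in tasks_to_run:
        db["task1"] = task1()          # regenerate and persist
    result1 = db["task1"]              # skipped -> read persisted value

    if "task2" in tasks_to_run:
        db["task2"] = task2(result1)   # regenerate and persist
    return db["task2"]

run_pipeline({"task1", "task2"})  # regenerates and persists both results
run_pipeline({"task2"})           # reuses persisted result1, reruns only task2
```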
c
I don’t think I’m completely following, so let me just describe these two kwargs to you:
- `checkpoint=True` + a result `location` template (note you can template these locations based on task inputs / timestamps / etc., which is sometimes useful for “overriding”): every time this task runs, it will store its output data to the provided location. If you ever want to “rehydrate” an upstream task’s state you’ll have to do this manually using the `load_result` method on all `State` objects.
- `target=location_template`: when a `target` is provided, the location template is first checked; if data is present at the location, it is used and the task is not rerun. If no data is present, the task runs and stores its output in the provided location. As before, you can template these locations to provide for some interesting functionality. If you ever want to force a rerun of a task, you’ll need to manually delete the data in the location yourself (this is something we do want to support automatically at some point, but it’s still under discussion)
s
Thanks, this clarifies everything for me. Sounds like I could get what I'm looking for using either persistence method with some customization. Appreciate it!
c
Anytime! Glad I could help 🙂
@Marvin archive “What is the difference between checkpoint and target?”
p
+1 on being able to force the task(s) to ignore (replace) the cached values without having to manually delete the files.
c
Yea this definitely makes sense; it would be very easy to do for all tasks simultaneously via a special context key / value pair that you could set on a per-run basis. Being able to control this on a task-run-by-task-run basis would be trickier.
p
Having this for all tasks would be a good starting point. I agree that it would be more difficult to do it selectively.