Let’s say you have a task D that takes the outputs of a few tasks A, B, and C, runs some ETL on them, and writes the result to an S3 location D. Only task D writes to S3 location D. Let’s assume tasks A, B, and C are “expensive” to compute, so you use Prefect’s checkpointing to write their results to S3. One day, you realize you made a big ol’ mistake and the logic in task D has a bug. You’d like to re-run the flow for the last week to overwrite the results in S3 location D using the new and improved task D.
Here, I’d want:
• Tasks A, B, and C to use the cached results (they’re expensive computations).
• Task D to ignore its cached result and write the new (fixed) result to S3.
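To make the behavior I’m after concrete, here’s a rough sketch in plain Python. None of this is Prefect’s API: `store` is an in-memory stand-in for S3, and `run_task`, `run_flow`, `buggy_d`, and `fixed_d` are hypothetical names I made up just to illustrate the semantics.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for S3: key -> stored result.
store: Dict[str, int] = {}
# Records which tasks actually executed (as opposed to being read from the store).
calls: List[str] = []


def run_task(key: str, fn: Callable[[], int], use_cache: bool = True) -> int:
    """Reuse the stored result for `key` if allowed; otherwise compute and overwrite it."""
    if use_cache and key in store:
        return store[key]
    calls.append(key)
    result = fn()
    store[key] = result  # overwrite whatever was stored before
    return result


def run_flow(task_d: Callable[[int, int, int], int]) -> int:
    # A, B, and C are the "expensive" tasks: reuse their stored results when present.
    a = run_task("A", lambda: 1)
    b = run_task("B", lambda: 2)
    c = run_task("C", lambda: 3)
    # D always recomputes and overwrites its stored result.
    return run_task("D", lambda: task_d(a, b, c), use_cache=False)


def buggy_d(a: int, b: int, c: int) -> int:
    return a + b - c  # the "big ol' mistake"


def fixed_d(a: int, b: int, c: int) -> int:
    return a + b + c


run_flow(buggy_d)   # original run: everything executes, the buggy D result is stored
calls.clear()
run_flow(fixed_d)   # re-run: A, B, C come from the store; only D executes again
```

After the second run, only D has executed and the stored value for D has been replaced by the fixed result, while A, B, and C were never recomputed.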
Does that make sense?