Hi! I have a Mapped Task that I have checkpointing...
# ask-community
**Owen McMahon**
Hi! I have a Mapped Task that I have checkpointing set up for (with the `map_index` in the filename, so each mapped task writes out to its own result) within a flow running against Prefect Cloud. I just came across a weird scenario: the Flow ran the Mapped Task all the way through (100 mapped tasks in total), but afterwards I noticed that 7 of them had a status of 'Cached'. This caught my eye, as it should not have loaded any of them from the cache. When I looked closer at the logs of one of the 'Cached' mapped tasks, it looks like it finished successfully, then restarted ~7 mins later and loaded from the cache. All the data appears to be there as expected, but the behavior seemed a bit odd. Has anyone else seen this before? Thanks!
**Kevin Kho**
Hi @Owen McMahon, how are you caching these? Using targets? Or `cache_for` + results?
**Owen McMahon**
Hey @Kevin Kho, here's pseudo-code for how we're setting up that task. We're using targets.
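```python
# TaskA is a custom Task subclass and `results` a configured GCSResult (both defined elsewhere)
task_a = TaskA(
    log_stdout=True,
    name="TASK NAME",
    # Templated per run date, task name and map index, so each mapped child gets its own file
    target="results/{date:%Y}-{date:%m}-{date:%d}/{task_name}-{map_index}.txt",
    checkpoint=True,
    result=results,
)
```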
and the `results` object being used is a `GCSResult`.
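The `target` string is a template that is filled in per task run. As a rough illustration only, here is a plain-Python sketch of how it expands, assuming `str.format`-style substitution of the `date`, `task_name` and `map_index` values (the helper `render_target` and the run date are hypothetical). Python's `datetime` accepts strftime codes as format specs, which is what makes `{date:%Y}` work.

```python
from datetime import datetime

# Target template copied from the snippet above.
TARGET = "results/{date:%Y}-{date:%m}-{date:%d}/{task_name}-{map_index}.txt"


def render_target(template: str, date: datetime, task_name: str, map_index: int) -> str:
    """Hypothetical helper: substitute values into the template via str.format.

    datetime.__format__ delegates to strftime, so "{date:%m}" yields a zero-padded month.
    """
    return template.format(date=date, task_name=task_name, map_index=map_index)


run_date = datetime(2021, 8, 4)  # hypothetical flow-run date
paths = [render_target(TARGET, run_date, "TASK NAME", i) for i in range(100)]

print(paths[0])         # results/2021-08-04/TASK NAME-0.txt
print(paths[99])        # results/2021-08-04/TASK NAME-99.txt
print(len(set(paths)))  # 100: one distinct target per mapped child
```

Because `map_index` is part of the path, each of the 100 mapped children checkpoints to its own file, and a later run of the same child on the same date resolves to the same path.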
**Kevin Kho**
Everything looks good. So this is more about Dask choosing to re-run the task (maybe a worker died somewhere and Dask re-ran all the tasks that had been on that worker). Prefect is then working as intended: rather than re-running the task, it finds the existing target file and loads the result from it, which is why those runs show up as 'Cached'.
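To make the explanation concrete, here is a toy model in plain Python. It is not Prefect's actual code path; `run_with_target` and `work` are hypothetical names. Each child checks whether its target file already exists before doing any work. If a worker is lost and the scheduler resubmits children that had already finished, the check finds the checkpoint and returns it as 'Cached' instead of recomputing.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple


def run_with_target(target: Path, compute: Callable[[], str]) -> Tuple[str, str]:
    """Simplified model of target-based caching (hypothetical, not Prefect's implementation).

    If the target exists, skip the work and load the saved value ("Cached");
    otherwise compute, checkpoint the value to the target, and report "Success".
    """
    if target.exists():
        return "Cached", target.read_text()
    value = compute()
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(value)
    return "Success", value


calls: List[int] = []


def work(i: int) -> str:
    calls.append(i)  # record every real execution
    return f"result-{i}"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # First pass: every mapped child runs and writes its own target file.
    first = [run_with_target(root / f"task-{i}.txt", lambda i=i: work(i)) for i in range(10)]
    # Hypothetical worker loss: the scheduler resubmits children 3, 4 and 5.
    rerun = [run_with_target(root / f"task-{i}.txt", lambda i=i: work(i)) for i in (3, 4, 5)]

print([state for state, _ in first].count("Success"))  # 10
print(rerun)       # [('Cached', 'result-3'), ('Cached', 'result-4'), ('Cached', 'result-5')]
print(len(calls))  # 10: the resubmitted children did not recompute
```

The per-child path matters here: since each target includes the child's index, a resubmitted child only ever finds its own checkpoint. If every child shared one target path, children running after the first write would likely load that one file instead.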
**Owen McMahon**
Ahhh, that's a solid explanation. I'm using the Dask Executor, so that lines up. Thanks!