Hi! I have a Mapped Task with checkpointing set up (with the map_index in the filename so each mapped task properly writes out to an individual result) within a flow running against Prefect Cloud.
I just came across a weird scenario where the flow did run the Mapped Task fully through (100 mapped tasks in total), but I noticed afterwards that 7 of them had a status of 'Cached'. This caught my eye, as none of them should have been loaded from the cache. When I looked closer at the logs of one of the 'Cached' mapped tasks, it looks like it finished successfully, then restarted ~7 mins later and loaded from cache.
It appears that all the data is still there as I expected, but the behavior seemed a bit odd. Wondering if anyone else has seen this before?
Thanks!
k
Kevin Kho
09/13/2021, 7:51 PM
Hi @Owen McMahon, how are you caching these? Using targets? Or cache_for + results?
o
Owen McMahon
09/13/2021, 7:58 PM
hey @Kevin Kho - here's pseudo-code for how we're setting up that task. Using targets.
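(The actual snippet Owen shared was not preserved in this archive. As a hedged sketch of the pattern he describes, with illustrative names that are not his real code: in Prefect 1.x, a mapped task's `target` template can include `{map_index}`, so each mapped child checkpoints to its own file. The rendering below simulates how such a template yields a distinct target per child.)

```python
# Hypothetical sketch (not Owen's actual code) of the "targets" pattern:
# in real Prefect 1.x code the task would be declared roughly like
#
#   @task(target="my_task-{map_index}.txt", checkpoint=True,
#         result=LocalResult(dir="./results"))
#   def my_task(x): ...
#
# Here we only simulate how the template renders one target path per
# mapped child, which is why no two children overwrite each other.

TARGET_TEMPLATE = "my_task-{map_index}.txt"  # illustrative template

def render_target(template: str, map_index: int) -> str:
    """Render a target filename for one mapped child, the way Prefect
    fills template fields from the run context."""
    return template.format(map_index=map_index)

# 100 mapped children -> 100 distinct checkpoint files
targets = [render_target(TARGET_TEMPLATE, i) for i in range(100)]
assert len(set(targets)) == 100  # every child gets its own result file
```

With a target like this, a re-run of a mapped child finds its file already present and loads it instead of recomputing, which is the 'Cached' state Owen saw.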
k
Kevin Kho
Everything looks good. So this is more about Dask choosing to re-run the task (maybe a worker died somewhere and it re-ran all the tasks of that worker). Prefect is then working as intended by not re-running the task and pulling from the cache instead.
o
Owen McMahon
09/13/2021, 8:04 PM
Ahhh, that's a solid explanation - I'm using the DaskExecutor, so that lines up.
thanks!