#prefect-community

Kevin Systrom

04/28/2020, 9:06 PM
Hey folks, two questions: (1) What is the proper way to manually clear a task from the cache between runs? I have a task mapped over many values, and sometimes I want to knock out specific values. (2) I'm passing dataframes between tasks and I'd like to use an inputs cache_validator. However, this doesn't work because the validator apparently 'can't evaluate the truth of a dataframe'. Is there a better approach, so that the cache for that task is invalidated if I pass in a different dataframe?

Zachary Hughes

04/28/2020, 9:10 PM
Hi @Kevin Systrom! Getting some best practices input from the team-- will be back to you with an answer in just a sec.

Kevin Systrom

04/28/2020, 9:14 PM
thx

Zachary Hughes

04/28/2020, 9:23 PM
Okay, back with some guidance! It's worth calling out that this is probably a bit cleaner if you're using an orchestration solution (Prefect Server or Cloud). But we can clear a task's cache even without Server or Cloud. If you're using Core on its own, we'd suggest using a state handler for your flow that looks like the snippet below:
def flow_state_handler(flow, old_state, new_state):
    if new_state.is_finished():  # the flow run is completing
        # check prefect.context and do any cache clearing that you want
        pass
    # return the new state so the flow run proceeds with it
    return new_state
If you want to give Cloud or Server a shot, you can manually clear the cache by deleting the old cache state. We're taking a look at the cache_validator issue, and it looks like we might have a path forward to solve this. In the meantime, you should also be able to copy/paste the validator code and create your own that handles dataframes more gracefully.
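A minimal sketch of what such a hand-rolled validator might look like. It is not Prefect's own code: the helper names `values_equal` and `dataframe_safe_inputs` are hypothetical, and it assumes the cached state exposes `cached_inputs` as a dict of result-like objects with a `.value` attribute, and that validators are called as `(state, inputs, parameters)`. It also skips any cache-expiration check, which you would need to add back if you rely on `cache_for`.

```python
def values_equal(a, b):
    """Compare two values without relying on the truthiness of `a == b`."""
    # pandas DataFrames and Series expose .equals(), which returns a plain bool.
    equals = getattr(a, "equals", None)
    if callable(equals):
        return bool(equals(b))
    try:
        return bool(a == b)
    except (ValueError, TypeError):
        # An ambiguous or unsupported comparison is treated as "not equal",
        # which invalidates the cache rather than raising.
        return False


def dataframe_safe_inputs(state, inputs, parameters):
    """Hypothetical validator: the cache is valid only if every input matches.

    Assumption: `state.cached_inputs` maps input names to objects that carry
    the cached value on `.value`.
    """
    cached = getattr(state, "cached_inputs", None) or {}
    cached_values = {name: res.value for name, res in cached.items()}
    inputs = inputs or {}
    if cached_values.keys() != inputs.keys():
        return False
    return all(values_equal(cached_values[name], inputs[name]) for name in inputs)
```

If the assumptions hold for your Prefect version, this could be passed as `cache_validator=dataframe_safe_inputs` on the task in place of the built-in inputs validator.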

Kevin Systrom

04/28/2020, 9:37 PM
OK thanks, I'll look into both of these. I'm on core right now.
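For the Core-only route, the state-handler suggestion could be fleshed out roughly as follows. This is a sketch under stated assumptions, not a confirmed recipe: it assumes the cache store is a dict mapping a cache key (or task name) to a list of cached states, which is how `prefect.context.caches` is believed to be laid out in Core. The names `drop_cached_states`, `make_flow_state_handler`, and `should_drop` are hypothetical.

```python
def drop_cached_states(caches, cache_key, should_drop):
    """Remove cached states stored under `cache_key` that match `should_drop`.

    Assumption: `caches` maps a cache key to a list of cached states.
    Returns the number of states removed.
    """
    states = caches.get(cache_key, [])
    kept = [state for state in states if not should_drop(state)]
    removed = len(states) - len(kept)
    if kept:
        caches[cache_key] = kept
    else:
        # Nothing left for this key, so drop the entry entirely.
        caches.pop(cache_key, None)
    return removed


def make_flow_state_handler(caches, cache_key, should_drop):
    """Build a flow state handler that prunes the cache when the run finishes."""
    def flow_state_handler(flow, old_state, new_state):
        if new_state.is_finished():  # the flow run is completing
            drop_cached_states(caches, cache_key, should_drop)
        return new_state
    return flow_state_handler
```

To knock out specific mapped values, `should_drop` could inspect each cached state's inputs and return True only for the values you want recomputed. With Core, `caches` would presumably be `prefect.context.caches` and `cache_key` the task's cache key or name.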

Zachary Hughes

04/28/2020, 9:38 PM
Awesome. We're here if you have more questions about this or anything else in Prefect!

Chris White

04/28/2020, 9:57 PM
FYI we’ve opened an issue to improve our cache validators so they are more robust to various types of equality checks (dataframes included): https://github.com/PrefectHQ/prefect/issues/2441
@Marvin archive “How to clear task cache between flow runs”

Kevin Systrom

04/29/2020, 12:47 AM
Thank you!