# prefect-community
Hey all - I have a question about idempotency. I’m just getting started using prefect (both core & cloud) and am most familiar with Luigi for managing data workflows. Luigi has a concept of a `Target` for outputs, and if that Target exists (you get to define “exists”), the task won’t run and will return the already-completed result. Is there something similar for Prefect, or is this something I would have to implement manually? Caching seems like it would work, but does that cache persist across runs, or if different `Flow`s are using the same
? As a concrete example - I’m writing a single row per day into a psql database. The day is defined by a
. If I run my Flow repeatedly, it keeps inserting rows for the same date. Instead, I’d like to check for existence of the row with that date and return with the appropriate State (probably
Hey @Adam! Yea, Prefect caching will satisfy the same pattern that you describe; in Prefect Core, currently the cache is memory-based (and so won’t necessarily persist across runs unless you extract it and save it yourself), but in Prefect Cloud it naturally will. For sharing a cache across tasks / flows, check out the
keyword argument to tasks (more on this here: https://docs.prefect.io/core/concepts/persistence.html#persistence-and-caching) For your example, you would use the
cache validator on whatever task performs the write. Let me know if you have any follow-up questions or need any clarifications!
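To make the pattern above concrete, here is a minimal plain-Python sketch of what a parameter-based cache validator decides. It is not Prefect's implementation: `CachedRun`, `parameters_match`, `run_write_task`, and `shared_key` are hypothetical names, and the `(state, inputs, parameters)` argument order is only assumed to mirror the shape of Prefect Core's validators.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class CachedRun:
    """Hypothetical stand-in for the metadata kept with a cached task state."""
    cached_parameters: Dict[str, Any] = field(default_factory=dict)
    result: Any = None


def parameters_match(state: CachedRun, inputs: Dict[str, Any],
                     parameters: Dict[str, Any]) -> bool:
    """Validator: reuse the cached result only if the parameters are unchanged."""
    return state.cached_parameters == parameters


def run_write_task(cache: Dict[str, CachedRun], shared_key: str,
                   parameters: Dict[str, Any],
                   write: Callable[[Dict[str, Any]], Any]) -> Tuple[Any, str]:
    """Run `write` unless the validator accepts the entry stored under `shared_key`."""
    cached = cache.get(shared_key)
    if cached is not None and parameters_match(cached, {}, parameters):
        # The validator accepted the cache, so the write is skipped.
        return cached.result, "Cached"
    result = write(parameters)
    cache[shared_key] = CachedRun(cached_parameters=dict(parameters), result=result)
    return result, "Success"
```

Because the `cache` dictionary here lives in memory, it disappears when the process exits, which is the Core behavior described above; persisting it across runs would need a durable store.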
Great, that makes sense. One other follow-up: I assume that also means that if my output was to disappear (whether it be a row in a database or s3/gcs bucket or something else) - Prefect Cloud wouldn’t know that because I’m guessing it just keeps track of the metadata that this combination of parameters resulted in a successful run. Is that right? If so, is there a way to force it to do a backfill?
Yea, so in that case I recommend either:
- customizing one of Prefect’s cache validators to perform that check at runtime (the cache validator that you choose fully controls whether the cache is respected or not). Right now all of the off-the-shelf validators are based entirely on input / parameter checks
- creating a Task which performs the check and returns a distinct value depending on whether the “thing” exists or not, and then using a cache validator which checks the value of this input from run to run
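The first option could look roughly like the sketch below: a validator that ignores the stored metadata and asks the database whether the row for the given day is still there. Everything here is an assumption for illustration: `sqlite3` stands in for psql so the example is self-contained, and the `daily_rows` table, the `day` parameter, `row_exists`, and `make_existence_validator` are hypothetical names. The `(state, inputs, parameters)` signature is assumed to match what Prefect Core expects of a cache validator.

```python
import sqlite3
from typing import Any, Callable, Dict


def row_exists(conn: sqlite3.Connection, day: str) -> bool:
    """Return True if a row for `day` is present in the (hypothetical) table."""
    cur = conn.execute("SELECT 1 FROM daily_rows WHERE day = ? LIMIT 1", (day,))
    return cur.fetchone() is not None


def make_existence_validator(
    conn: sqlite3.Connection,
) -> Callable[[Any, Dict[str, Any], Dict[str, Any]], bool]:
    """Build a validator that respects the cache only while the output still exists."""
    def validator(state: Any, inputs: Dict[str, Any],
                  parameters: Dict[str, Any]) -> bool:
        # True  -> the row is there, so the cached result can be trusted.
        # False -> the row is gone, so the task should run again (a backfill).
        return row_exists(conn, parameters["day"])
    return validator


# In-memory database used purely as a stand-in for the real psql instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_rows (day TEXT PRIMARY KEY, value REAL)")
validator = make_existence_validator(conn)
```

The second option moves the same `row_exists` check into its own upstream task and lets an input-based validator compare that task's return value from run to run.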
This is great. Thanks! Now go enjoy your Sunday evening :)