Prefect Community #ask-community

Adam

12/29/2019, 11:21 PM

Hey all - I have a question about idempotency. I’m just getting started using Prefect (both Core & Cloud) and am most familiar with Luigi for managing data workflows. Luigi has a concept of a `Target` for outputs, and if that Target exists (you get to define “exists”), the task won’t run and will return the already-completed result. Is there something similar for Prefect, or is this something I would have to implement manually? Caching seems like it would work, but does that cache persist across runs, or across different `Flow`s that use the same `Task`? As a concrete example - I’m writing a single row per day into a Postgres database. The day is defined by a `Parameter("date")`. If I run my Flow repeatedly, it keeps inserting rows for the same date. Instead, I’d like to check for the existence of the row with that date and return with the appropriate State (probably `Success`?).
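A minimal sketch of the manual version of this check, in the spirit of a Luigi Target: look for the row before writing and skip the insert when it already exists. It uses an in-memory SQLite database as a stand-in for Postgres; the table name `daily_rows`, its columns, and the function `write_daily_row` are hypothetical.

```python
import sqlite3


def write_daily_row(conn, date, value):
    """Insert one row for `date` unless it already exists.

    Returns True if a row was written, False if the write was skipped.
    """
    exists = conn.execute(
        "SELECT 1 FROM daily_rows WHERE date = ?", (date,)
    ).fetchone()
    if exists:
        return False  # row already there: nothing to do
    conn.execute(
        "INSERT INTO daily_rows (date, value) VALUES (?, ?)", (date, value)
    )
    conn.commit()
    return True


# In-memory database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_rows (date TEXT PRIMARY KEY, value REAL)")

first = write_daily_row(conn, "2019-12-29", 1.0)   # writes the row
second = write_daily_row(conn, "2019-12-29", 1.0)  # skipped, row exists
row_count = conn.execute("SELECT COUNT(*) FROM daily_rows").fetchone()[0]
```

In Postgres itself, a unique constraint on the date column together with `INSERT ... ON CONFLICT DO NOTHING` gives the same guarantee without a separate lookup.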

Chris White

12/29/2019, 11:49 PM

Hey @Adam! Yea, Prefect caching will satisfy the pattern you describe. In Prefect Core, the cache is currently memory-based (and so won’t necessarily persist across runs unless you extract it and save it yourself), but in Prefect Cloud it naturally will. For sharing a cache across tasks / flows, check out the `cache_key` keyword argument to tasks (more on this here: https://docs.prefect.io/core/concepts/persistence.html#persistence-and-caching). For your example, you would use the `all_parameters` or `partial_parameters_only` cache validator on whatever task performs the write. Let me know if you have any follow-up questions or need any clarifications!
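To illustrate what the two validators named in this reply compare, here is a plain-Python sketch. It is not Prefect's implementation and leaves out cache expiration; the function names `all_parameters_match` and `partial_parameters_match` are hypothetical.

```python
def all_parameters_match(cached_parameters, current_parameters):
    """Cache is reusable only if every parameter is unchanged."""
    return cached_parameters == current_parameters


def partial_parameters_match(cached_parameters, current_parameters, validate_on):
    """Cache is reusable if the parameters named in `validate_on` are unchanged."""
    return all(
        cached_parameters.get(name) == current_parameters.get(name)
        for name in validate_on
    )


# Parameters recorded with the cached result (illustrative values).
cached = {"date": "2019-12-29", "verbose": False}

same_day = {"date": "2019-12-29", "verbose": True}
next_day = {"date": "2019-12-30", "verbose": False}

full_same_day = all_parameters_match(cached, same_day)                    # False
partial_same_day = partial_parameters_match(cached, same_day, ["date"])   # True
partial_next_day = partial_parameters_match(cached, next_day, ["date"])   # False
```

For the one-row-per-day case, comparing only `date` means unrelated parameters can change without triggering a second insert for the same day.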

Adam

12/29/2019, 11:55 PM

Great, that makes sense. One other follow-up: I assume that also means that if my output were to disappear (whether it be a row in a database or an S3/GCS bucket or something else) - Prefect Cloud wouldn’t know that, because I’m guessing it just keeps track of the metadata that this combination of parameters resulted in a successful run. Is that right? If so, is there a way to force it to do a backfill?

Chris White

12/29/2019, 11:58 PM

Yea, so in that case I recommend either:
- customizing one of Prefect’s cache validators to perform that check at runtime (the cache validator that you choose fully controls whether the cache is respected or not). Right now all of the off-the-shelf validators are based entirely on input / parameter checks
- creating a Task which performs the check and returns a distinct value depending on whether the “thing” exists or not, and then using a cache validator which checks the value of this input from run to run
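A sketch of the first option: a validator that respects the cache only when the parameters match and the output still exists. It assumes validators are called as `validator(state, inputs, parameters)` and that the cached state exposes `cached_parameters`, as the built-in validators of this era appear to do; check that against your Prefect version. `existing_dates` and `output_exists` are hypothetical stand-ins for a real database or bucket lookup, and the `SimpleNamespace` object only imitates a cached state.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the real store (database table, S3/GCS bucket, ...).
existing_dates = {"2019-12-29"}


def output_exists(date):
    """Hypothetical runtime check that the output for `date` is still there."""
    return date in existing_dates


def parameters_and_output_exist(state, inputs, parameters):
    """Return True (use the cache) only if parameters match AND the output exists."""
    if state.cached_parameters != parameters:
        return False
    return output_exists(parameters["date"])


# Imitation of a cached state, for illustration only.
cached_state = SimpleNamespace(cached_parameters={"date": "2019-12-29"})
params = {"date": "2019-12-29"}

before_delete = parameters_and_output_exist(cached_state, {}, params)  # True

existing_dates.discard("2019-12-29")  # the output disappears
after_delete = parameters_and_output_exist(cached_state, {}, params)   # False
```

Once the validator returns False, the cache is not respected and the task runs again, which is the backfill behavior asked about above.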

Adam

12/30/2019, 12:46 AM

This is great. Thanks! Now go enjoy your Sunday evening :)

🏖️ 1

💯 1
