Prefect Community #ask-community


Alexander Seifert

08/18/2021, 8:43 AM

hello! beginner question: in DVC’s pipeline feature, a stage will only be re-executed if something in the stage definition or its dependencies changes (see https://dvc.org/doc/user-guide/project-structure/pipelines-files#stages). this is not default behaviour in prefect, right? how do i do this with prefect?

Kevin Kho

08/18/2021, 1:29 PM

Hey @Alexander Seifert, retries are done at the task level in Prefect. Is this what you are looking for? https://docs.prefect.io/core/concepts/tasks.html#retries

Kevin Kho

08/18/2021, 1:30 PM

If upstream dependencies fail in Prefect, the task won’t run by default.

Alexander Seifert

08/18/2021, 2:35 PM

sorry, before I edited it, the initial post was missing an important word: if something in the stage definition or its dependencies changes.

Alexander Seifert

08/18/2021, 2:37 PM

so no, i’m not looking for retries, more along the lines of caching i guess. if i call the exact same pipeline twice, and the definition + dependencies stayed the same, then nothing should have to be recalculated.
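The DVC behaviour described above can be sketched in plain Python, as a rough illustration rather than DVC's or Prefect's actual code. A stage's fingerprint is the MD5 of its command string (the "definition") plus the contents of its dependency files, and the stage runs only when that fingerprint differs from the one recorded last time. The helpers `stage_fingerprint` and `run_stage`, and the in-memory `lock` dict standing in for something like DVC's lock file, are hypothetical.

```python
import hashlib
import os
import tempfile


def stage_fingerprint(cmd: str, deps: list[str]) -> str:
    """Hash the stage definition (its command) together with its dependencies' contents."""
    h = hashlib.md5()
    h.update(cmd.encode())
    for path in sorted(deps):  # sort so the order of dependencies does not matter
        h.update(path.encode())
        with open(path, "rb") as f:
            h.update(f.read())
    return h.hexdigest()


def run_stage(name, cmd, deps, action, lock):
    """Run `action` only if the stage's fingerprint differs from the one in `lock`.

    Returns True if the stage ran, False if it was skipped.
    """
    fp = stage_fingerprint(cmd, deps)
    if lock.get(name) == fp:
        return False  # definition and dependencies unchanged: skip
    action()
    lock[name] = fp  # record only after a successful run
    return True


if __name__ == "__main__":
    lock = {}
    with tempfile.TemporaryDirectory() as d:
        raw = os.path.join(d, "raw.csv")
        with open(raw, "w") as f:
            f.write("a,b\n1,2\n")
        step = lambda: None  # placeholder for real preprocessing work
        print(run_stage("prep", "python prep.py", [raw], step, lock))  # True: first run
        print(run_stage("prep", "python prep.py", [raw], step, lock))  # False: unchanged
        with open(raw, "a") as f:
            f.write("3,4\n")
        print(run_stage("prep", "python prep.py", [raw], step, lock))  # True: data changed
```

Recording the fingerprint only after `action()` returns means a failed run is retried next time instead of being skipped.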

Kevin Kho

08/18/2021, 2:44 PM

Ah ok, so for caching, Prefect has two mechanisms. The first is file-based and is called `target`. If a file exists at the target, the task won’t run. The second is caching and comprises `cache_for` and `cache_validator`: you specify the duration for which the cache is valid, and then the cache validator determines whether something has to be re-run (based on inputs, parameters, etc.).
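As a plain-Python illustration of these two mechanisms (a stand-in, not Prefect's implementation; in Prefect they are task options rather than functions you write), the sketch below has `run_with_target` skip work when a file already exists at the target path, and `TimedCache` reuse a result while it is younger than `cache_for` and a validator accepts it. The validator `all_inputs_equal` is hypothetical and only loosely mirrors an input-based validator; the injectable `clock` is there just to make expiry easy to test.

```python
import os
import time
from datetime import timedelta


def run_with_target(target_path, compute):
    """File-based skipping: if something already exists at the target, don't run."""
    if os.path.exists(target_path):
        with open(target_path) as f:
            return f.read(), False  # (result, ran?)
    result = compute()
    with open(target_path, "w") as f:
        f.write(result)
    return result, True


def all_inputs_equal(cached_inputs, inputs):
    """Hypothetical validator: the cache is valid only if every input is unchanged."""
    return cached_inputs == inputs


class TimedCache:
    """Duration-limited cache plus a validator, mimicking cache_for + cache_validator."""

    def __init__(self, cache_for: timedelta, validator, clock=time.monotonic):
        self.cache_for = cache_for
        self.validator = validator
        self.clock = clock
        self.entry = None  # (timestamp, inputs, result) of the last computation

    def run(self, compute, **inputs):
        if self.entry is not None:
            ts, cached_inputs, result = self.entry
            fresh = self.clock() - ts < self.cache_for.total_seconds()
            if fresh and self.validator(cached_inputs, inputs):
                return result, False  # cache hit: nothing recomputed
        result = compute(**inputs)
        self.entry = (self.clock(), dict(inputs), result)
        return result, True
```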

Alexander Seifert

08/18/2021, 3:12 PM

alright, thanks! so e.g. for a data preprocessing flow i would maybe have a file-based target that encodes the md5 hash of my raw data. when the data changes, the md5 changes, so there’s no file at the new target and the flow runs again
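A minimal sketch of that idea, assuming the raw data is a single local file: the output path embeds the MD5 of the raw data, so editing the data yields a new path with no file at it, and the step runs again. `md5_of_file`, `preprocess_with_hashed_target`, and the `processed-<hash>.txt` naming scheme are hypothetical; wiring the hash into an actual Prefect `target` is not shown here.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks so large raw data doesn't need to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def preprocess_with_hashed_target(raw_path, out_dir, preprocess):
    """Run `preprocess` only if no output exists for the current hash of the raw data.

    Returns (target_path, ran) where `ran` says whether preprocessing executed.
    """
    target = os.path.join(out_dir, f"processed-{md5_of_file(raw_path)}.txt")
    if os.path.exists(target):
        return target, False  # same raw data as a previous run: reuse the output
    with open(raw_path) as src:
        result = preprocess(src.read())
    with open(target, "w") as dst:
        dst.write(result)
    return target, True
```

Old outputs are never deleted in this sketch, so switching back to earlier raw data would reuse its earlier output.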

Kevin Kho

08/18/2021, 3:13 PM

not a bad idea, seems like that might be used in conjunction with our KV Store to keep track of the hash

Alexander Seifert

08/18/2021, 3:13 PM

or a cache_validator that calculates those md5 sums and checks against previously encountered values
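A validator along those lines might look like the following. It is a sketch, assuming the Prefect 1.x convention that a cache validator is a callable taking `(state, inputs, parameters)` and returning True when the cached result may be reused; it ignores `state`, has not been checked against Prefect itself, and keeps previously seen hashes in a plain dict `seen`. `input_hashes`, `make_md5_validator`, and `record_success` are hypothetical helpers.

```python
import hashlib
import os


def file_md5(path):
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def input_hashes(inputs):
    """MD5 of every input that is a path to an existing file; other inputs are ignored."""
    return {
        name: file_md5(value)
        for name, value in inputs.items()
        if isinstance(value, str) and os.path.isfile(value)
    }


def make_md5_validator(seen):
    """Build a validator shaped like (state, inputs, parameters) -> bool.

    `seen` maps input name -> MD5 recorded on the last successful run. The cache
    counts as valid only if there is at least one file input and every file
    input hashes to its recorded value.
    """

    def validator(state, inputs, parameters):
        current = input_hashes(inputs)
        return bool(current) and current == {k: seen.get(k) for k in current}

    return validator


def record_success(seen, inputs):
    """Call after the task succeeds so later validations compare against these hashes."""
    seen.update(input_hashes(inputs))
```

Recording hashes in `record_success` rather than inside the validator avoids marking data as processed when the task then fails.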

Kevin Kho

08/18/2021, 3:14 PM

It just needs to fit under 10KB for the KV Store, but you can persist and update that.
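Because the stored value has to stay under 10KB, it helps to check the serialized size before persisting. Below is a local, file-backed stand-in, not Prefect's KV Store client: `save_hashes` writes a name-to-MD5 map as JSON only if it fits the limit, `load_hashes` reads it back, and `update_hash` does a read-modify-write of one entry. Reading "10KB" as 10,000 bytes, the file path, and all three helper names are assumptions for illustration.

```python
import json
import os

MAX_VALUE_BYTES = 10_000  # assumption: "10KB" read as 10,000 bytes of serialized JSON


def load_hashes(path):
    """Return the stored {name: md5} map, or an empty dict on the first run."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def save_hashes(path, hashes):
    """Persist the map only if its JSON encoding fits under the size limit."""
    payload = json.dumps(hashes, sort_keys=True)
    size = len(payload.encode("utf-8"))
    if size > MAX_VALUE_BYTES:
        raise ValueError(f"value is {size} bytes, over the {MAX_VALUE_BYTES}-byte limit")
    with open(path, "w") as f:
        f.write(payload)


def update_hash(path, name, md5):
    """Read-modify-write a single entry, e.g. after a successful preprocessing run."""
    hashes = load_hashes(path)
    hashes[name] = md5
    save_hashes(path, hashes)
    return hashes
```

A 32-character MD5 hex digest plus a short name is well under 100 bytes per entry, so a map of this kind can track on the order of a hundred files before approaching the limit.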

Alexander Seifert

08/18/2021, 3:16 PM

alright, thanks. but it seems that what i want to do is something that needs to be pieced together manually rather than just working out of the box. just wanted to check that i’m not missing something!

Kevin Kho

08/18/2021, 3:18 PM

Yes, we don’t have listeners. Instead, event-based flows are normally triggered by hitting our API from the event.

Alexander Seifert

08/18/2021, 3:24 PM

alright, thanks!

Brad

09/01/2021, 2:51 AM

@Kevin Kho @Alexander Seifert I’d be interested in a feature like this and willing to contribute. @Alexander Seifert this could be implemented as a dask graph optimisation (I’ve done this in the past).

Kevin Kho

09/01/2021, 2:56 AM

Hey Brad, I’m glad you’re interested in contributing. Could you detail your thoughts in a GitHub issue so the core team can see it and discuss? I’d like to learn more about the dask graph optimization.

Brad

09/01/2021, 2:56 AM

Sure thing - I’ll try and whip up a motivating example

Brad

09/01/2021, 10:30 PM

Actually, on thinking about this a little more, ignore my suggestion of the dask graph optimisation; that’s too executor-specific. I think this could potentially be accomplished via a FlowRunner subclass. I’m going to have a play around and see if I can make something work.
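A rough, framework-agnostic sketch of what such a runner could do (not Brad's implementation, which is linked later in the thread, and not a real `FlowRunner` subclass): fingerprint each task from its name, a user-supplied `version` string standing in for its definition, the flow parameters, and its upstream fingerprints, then reuse the cached result whenever that fingerprint has been seen before. `CachingRunner` and the `{name: (fn, upstream_names, version)}` task format are hypothetical.

```python
import hashlib
import json
from graphlib import TopologicalSorter


class CachingRunner:
    """Run a DAG of tasks, skipping any task whose fingerprint was computed before."""

    def __init__(self):
        self.cache = {}  # fingerprint -> result; a real runner would persist this
        self.executed = []  # names of tasks that actually ran in the last run()

    def run(self, tasks, params):
        # tasks: {name: (fn, [upstream names], version)}; params must be JSON-serializable
        self.executed = []
        graph = {name: set(deps) for name, (_, deps, _) in tasks.items()}
        fingerprints, results = {}, {}
        for name in TopologicalSorter(graph).static_order():  # upstream tasks first
            fn, deps, version = tasks[name]
            key_material = json.dumps(
                [name, version, [fingerprints[d] for d in deps], params], sort_keys=True
            )
            fp = hashlib.sha256(key_material.encode()).hexdigest()
            fingerprints[name] = fp
            if fp not in self.cache:
                self.cache[fp] = fn(params, *(results[d] for d in deps))
                self.executed.append(name)
            results[name] = self.cache[fp]
        return results
```

Because every fingerprint includes the upstream fingerprints, changing one task's `version` re-runs it and everything downstream while leaving upstream results cached. Changes to raw data would only be picked up if, for example, a hash of that data were passed in `params`.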

Kevin Kho

09/01/2021, 10:31 PM

Ah that’s true

Brad

09/03/2021, 1:41 AM

Hey @Kevin Kho @Alexander Seifert I opened https://github.com/PrefectHQ/prefect/discussions/4935 to discuss

Brad

09/03/2021, 1:41 AM

And a potential implementation (very WIP) here https://github.com/limx0/caching_flow_runner

Kevin Kho

09/03/2021, 1:44 AM

Can you create a new message on Slack cuz I wanna tag 3 community members interested in this?

Brad

09/03/2021, 1:44 AM
