Alexander Seifert

    1 year ago
    hello! beginner question: in DVC’s pipeline feature, a stage will only be re-executed if something in the stage definition or its dependencies changes (see https://dvc.org/doc/user-guide/project-structure/pipelines-files#stages). this is not default behaviour in prefect, right? how do i do this with prefect?
    Kevin Kho

    1 year ago
    Hey @Alexander Seifert, retries are done on a task level in Prefect. Is this what you are looking for? https://docs.prefect.io/core/concepts/tasks.html#retries
    If upstream dependencies fail in Prefect, the task won’t run by default
    Alexander Seifert

    1 year ago
    sorry, before editing, the initial post was missing an important word: if something in the stage definition or its dependencies changes.
    so no, i’m not looking for retries, more along the lines of caching i guess. if i call the exact same pipeline twice, and the definition + dependencies stayed the same, then nothing should have to be recalculated.
    Kevin Kho

    1 year ago
    Ah ok, so for caching, Prefect has two mechanisms. The first is file-based and is called `target`. If a file exists at the target, the task won’t run. The second is caching and consists of `cache_for` and `cache_validator`: you specify the duration that the cache is valid, and then the cache validator determines whether something has to be re-run (based on inputs, parameters, etc.).
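To make the second mechanism concrete, here is a minimal plain-Python sketch of a time-bounded cache guarded by a validator. It only mimics the idea described above and is not Prefect's implementation; `cached`, `all_inputs_match`, and `preprocess` are hypothetical names chosen for illustration.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical helper names; this mimics the idea only, not Prefect's internals.
Validator = Callable[[Dict[str, Any], Dict[str, Any]], bool]


def all_inputs_match(cached_inputs: Dict[str, Any], current_inputs: Dict[str, Any]) -> bool:
    """Treat the cache as valid only if every keyword input is unchanged."""
    return cached_inputs == current_inputs


def cached(cache_for_seconds: float,
           validator: Validator = all_inputs_match,
           clock: Callable[[], float] = time.monotonic):
    """Reuse the last result while it is fresh AND the validator accepts it."""
    def decorator(fn):
        entry: Optional[Tuple[float, Dict[str, Any], Any]] = None

        def wrapper(**kwargs):
            nonlocal entry
            now = clock()
            if entry is not None:
                expires_at, inputs, result = entry
                if now < expires_at and validator(inputs, kwargs):
                    return result  # cache hit: skip the computation
            result = fn(**kwargs)
            entry = (now + cache_for_seconds, dict(kwargs), result)
            return result

        return wrapper
    return decorator


calls = []  # records every real execution


@cached(cache_for_seconds=3600)
def preprocess(path: str) -> str:
    calls.append(path)
    return path.upper()


preprocess(path="raw.csv")    # runs
preprocess(path="raw.csv")    # same input within the hour: served from cache
preprocess(path="other.csv")  # input changed: runs again
```

The validator is kept separate from the expiry check so that "how long is a result valid" and "is this result still the right one" can be changed independently.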
    Alexander Seifert

    1 year ago
    alright, thanks! so e.g. for a data preprocessing flow i would maybe have a file-based target that encodes the md5 hash of my raw data, so when the data changes the md5 changes and then there’s no target so the flow runs again
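The hash-in-the-target idea can be sketched with the standard library alone. This is an illustration under assumptions, not Prefect code: `target_for` and `run_if_stale` are hypothetical helpers, and the file names are made up for the demo.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large raw data need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def target_for(raw: Path, out_dir: Path) -> Path:
    """Encode the raw data's md5 in the target file name (hypothetical scheme)."""
    return out_dir / f"preprocessed-{md5_of_file(raw)}.txt"


def run_if_stale(raw: Path, out_dir: Path, preprocess: Callable[[bytes], str]) -> bool:
    """Run preprocessing only if no target exists for the current hash.

    Returns True if the work was done, False if it was skipped.
    """
    target = target_for(raw, out_dir)
    if target.exists():
        return False
    out_dir.mkdir(parents=True, exist_ok=True)
    target.write_text(preprocess(raw.read_bytes()))
    return True


def to_upper(data: bytes) -> str:
    return data.decode().upper()


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw.csv"
    out = Path(tmp) / "out"
    raw.write_text("a,b\n1,2\n")
    first = run_if_stale(raw, out, to_upper)   # no target yet: runs
    second = run_if_stale(raw, out, to_upper)  # same hash: skipped
    raw.write_text("a,b\n3,4\n")
    third = run_if_stale(raw, out, to_upper)   # data changed: runs again
```

One consequence of this scheme is that old targets accumulate, one per distinct hash, so some cleanup would be needed in practice.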
    Kevin Kho

    1 year ago
    not a bad idea, seems like that might be used in conjunction with our KV Store to keep track of the hash
    Alexander Seifert

    1 year ago
    or a cache_validator that calculates those md5 sums and checks against previously encountered values
    Kevin Kho

    1 year ago
    It just needs to fit under 10KB for the KV Store but you can just persist and update that
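A rough sketch of the "check md5 sums against previously encountered values" idea, with a local JSON file standing in for the KV Store. Everything here is an assumption for illustration: `check_and_record` and the JSON layout are hypothetical, and the size check simply mirrors the 10KB figure mentioned above.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterable

MAX_STORE_BYTES = 10 * 1024  # mirrors the 10KB limit mentioned in the thread


def md5_of_file(path: Path) -> str:
    return hashlib.md5(path.read_bytes()).hexdigest()


def load_hashes(store: Path) -> Dict[str, str]:
    """Read previously recorded hashes; an absent store means nothing was seen."""
    return json.loads(store.read_text()) if store.exists() else {}


def check_and_record(files: Iterable[Path], store: Path) -> bool:
    """Return True if every file matches its recorded md5 (cache still valid).

    Otherwise record the new hashes and return False. Recording before the
    downstream work succeeds is a simplification of this sketch.
    """
    seen = load_hashes(store)
    current = {str(p): md5_of_file(p) for p in files}
    if all(seen.get(name) == digest for name, digest in current.items()):
        return True
    seen.update(current)
    payload = json.dumps(seen)
    if len(payload.encode("utf-8")) > MAX_STORE_BYTES:
        raise ValueError("hash store would exceed the 10KB limit")
    store.write_text(payload)
    return False


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw.csv"
    store = Path(tmp) / "hashes.json"
    raw.write_text("a,b\n1,2\n")
    results = [check_and_record([raw], store)]      # first sight: not valid
    results.append(check_and_record([raw], store))  # unchanged: valid
    raw.write_text("a,b\n3,4\n")
    results.append(check_and_record([raw], store))  # changed: not valid
    results.append(check_and_record([raw], store))  # unchanged again: valid
```

Each md5 hex digest is 32 characters, so a store keyed by file path holds on the order of a hundred entries before reaching the limit, depending on path lengths.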
    Alexander Seifert

    1 year ago
    alright, thanks. but it seems that what i want to do is something that needs to be pieced together manually rather than just working out of the box. just wanted to check that i’m not missing something!
    Kevin Kho

    1 year ago
    Yes we don’t have listeners. Instead, event-based flows are normally triggered by hitting our API from the event.
    Alexander Seifert

    1 year ago
    alright, thanks!
    Brad

    1 year ago
    @Kevin Kho @Alexander Seifert I’d be interested in a feature like this and willing to contribute. @Alexander Seifert this could be implemented as a dask graph optimisation (I’ve done this in the past).
    Kevin Kho

    1 year ago
    Hey Brad, I’m glad you’re interested in contributing. Could you detail your thoughts in a GitHub issue so the core team can see it and discuss? I’d like to learn more about the dask graph optimization
    Brad

    1 year ago
    Sure thing - I’ll try and whip up a motivating example
    Actually - on thinking about this a little more, ignore my suggestion of the dask graph optimisation - that’s too executor-specific. I think this could potentially be accomplished via a FlowRunner subclass. I’m going to have a play around and see if I can make something work
    Kevin Kho

    1 year ago
    Ah that’s true
    Brad

    1 year ago
    Hey @Kevin Kho @Alexander Seifert I opened https://github.com/PrefectHQ/prefect/discussions/4935 to discuss
    And a potential implementation (very WIP) here https://github.com/limx0/caching_flow_runner
    Kevin Kho

    1 year ago
    Can you create a new message on Slack cuz I wanna tag 3 community members interested in this?
    Brad

    1 year ago
    ya