    Bernhard

    2 years ago
    Hi all, I am a new user. I want to handle the use case of watching a directory for changes in zipfile metadata. For each changed zipfile, several tasks would then be run. The resulting flow would run every 24 hours. A rough outline of the flow:
    1) Initialize the Prefect cache with zipfile metadata when the flow is started.
    2) At midnight, get up-to-date zipfile metadata and compare it with the cached metadata.
    3) Refresh the cache for changed zipfile metadata.
    4) For changed zipfile metadata only, download the zipfiles and compute various derivatives.
    5) Wait for the next midnight.
    The "download the zipfile and compute various derivatives" part already works nicely. I would like recommendations on designing a cache validator, and on how to initialize the complete cache after the flow is started.
    Thank you
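One possible shape for the cache validator in steps 1 to 3 is to treat each zipfile's metadata as a small fingerprint and persist the whole snapshot to a JSON file. This is a minimal plain-Python sketch, not Prefect code: the function names, the JSON cache file, and the use of file size plus modification time (from `os.stat`) as "zipfile metadata" are all assumptions for illustration. Each function could be wrapped as a task.

```python
import json
import os
from pathlib import Path


def snapshot(directory):
    """Return {filename: [size, mtime_ns]} for every *.zip file in directory.

    Size and modification time are an assumed stand-in for "zipfile metadata".
    Lists (not tuples) are used so values compare equal after a JSON round trip.
    """
    result = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.endswith(".zip"):
                st = entry.stat()
                result[entry.name] = [st.st_size, st.st_mtime_ns]
    return result


def init_cache(directory, cache_file):
    """Step 1: store the complete current snapshot when the flow starts."""
    snap = snapshot(directory)
    Path(cache_file).write_text(json.dumps(snap))
    return snap


def changed_files(directory, cache_file):
    """Steps 2 and 3: compare with the cache, refresh it, return changed names."""
    cache_path = Path(cache_file)
    cached = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    current = snapshot(directory)
    # A file counts as changed if it is new or its fingerprint differs.
    changed = sorted(name for name, meta in current.items() if cached.get(name) != meta)
    # Refresh the cache so the next run compares against today's state.
    cache_path.write_text(json.dumps(current))
    return changed
```

If size and modification time are not trustworthy enough (for example, files rewritten with identical size within the timestamp resolution), hashing the file contents with `hashlib.sha256` inside `snapshot` is a drop-in change, at the cost of reading every file on each run.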
    Zachary Hughes

    2 years ago
    Hi Bernhard, good question! What's the scale of the data you're working with here?
    The scale will likely inform the specifics of how this gets implemented, but the fact that you've already decomposed your flow into steps is a solid start: each of those steps looks like a natural candidate for a Prefect task.
    Bernhard

    2 years ago
    Hi Zachary, thank you for your reply. Currently there is a fixed set of 77 zipfiles, of which roughly 10 to 20 are updated per day. In my existing flow I use the map() method on several tasks to parallelize their execution, so creating the various derivatives and running the data validation tasks finishes within a few hours.
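Because only 10 to 20 of the 77 zipfiles change per day, mapping over just the list of changed names keeps the daily work proportional to the number of updates. As a rough plain-Python analogue of that mapping step (not Prefect's `map()` itself), the sketch below fans the changed names out over a thread pool; `process_zip` is a hypothetical placeholder for the existing download-and-derive work.

```python
from concurrent.futures import ThreadPoolExecutor


def process_zip(name):
    """Hypothetical placeholder for "download the zipfile and compute derivatives"."""
    return f"derivatives for {name}"


def process_changed(changed, max_workers=8):
    """Run process_zip on each changed file in parallel.

    Executor.map returns results in the same order as the input names.
    """
    if not changed:
        return []  # nothing changed today, so there is nothing to do
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_zip, changed))
```

Threads suit I/O-bound work such as downloads; for CPU-heavy derivative computation a `ProcessPoolExecutor` would be the more natural choice.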
    Zachary Hughes

    2 years ago
    Okay, great! Mapping was going to be one of my recommendations. Between that and using a schedule to make sure your flow runs at midnight, I think you're set up for success. Are there specific things you had questions about?
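In practice a scheduler handles step 5 ("wait for the next midnight"); a cron-style schedule for every midnight is written `0 0 * * *`. If the loop were driven by hand instead, a small helper like the one below (a sketch assuming naive local datetimes; the function names are hypothetical) computes how long to wait.

```python
from datetime import datetime, timedelta


def next_midnight(now):
    """Return the first midnight strictly after `now`, keeping now's tzinfo."""
    tomorrow = now.date() + timedelta(days=1)
    return datetime.combine(tomorrow, datetime.min.time(), tzinfo=now.tzinfo)


def seconds_until_midnight(now):
    """Seconds to sleep before the next midnight run."""
    return (next_midnight(now) - now).total_seconds()
```

For example, `seconds_until_midnight(datetime(2021, 3, 4, 23, 30))` gives `1800.0`. Relying on the scheduler avoids daylight-saving edge cases that this naive arithmetic does not handle.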