# prefect-community
**CA Lee**
Found the Prefect tutorial by @Laura Lorenz to be highly informative - looking to deepen my understanding and application of the core concepts in the Prefect docs. Any chance we'll get more tutorials like this in the near future? Thanks for this great product!
❤️ 1
**Laura Lorenz**
@CA Lee thank you! Do you mean the video tutorial? If so, yes, some more are coming out starting mid October. I'm collaborating with @Kyle Moon-Wright on some too - you may have seen him around the Slack answering questions :) I'll ping you when the next one is up!
**CA Lee**
@Laura Lorenz Yes, I was referring to the video Getting Started with Prefect (PyData Denver). That video really helped me start understanding how all the pieces fit together. Really liked your presentation style! It also helped that it's a very relatable workflow (obtaining some data, cleaning it, then storing it), and I appreciated the toggling between the code base signatures with explanation and the overall architecture overview. Looking forward to the next one! It would be great if it was similar in presentation style, but covering the core concepts in the Prefect docs. A generic use case really helps to nail those in 😀
**Laura Lorenz**
You are going to absolutely love what @Kyle Moon-Wright and I are putting together then 🙂 It's exactly an ETL use case evolving from simple to adding on more and more core concepts, typed out in real time with code base/docs overlays! I'm excited to show you 🤗
🤩 1
**CA Lee**
Also, @Kyle Moon-Wright is helping me out on this one, but this follow-up question was actually from watching the video I mentioned:
```python
import datetime

from prefect import task

@task(cache_for=datetime.timedelta(days=1))
def get_complaint_data():
    ...  # do something

raw = get_complaint_data()
parsed = parse_complaint_data(raw)
populated_table = store_complaints(parsed)
```
My question: let's say the data-fetching step (e.g. a web scraping script) runs on an hourly interval. Caching would prevent the fetch from running again, but how would I then stop the parsing and populating steps, based on the cached state of the fetch step? (It wouldn't make sense to clean or store the same cached data again.)
**Laura Lorenz**
If you still want other downstream tasks of your flow to run, IMHO you can use `cache_key` to mark that all of those tasks share the same cache, and can thus consider themselves cached as long as that cache key is not invalidated. See https://github.com/PrefectHQ/prefect/blob/master/src/prefect/core/task.py#L156 and the last bullet in https://docs.prefect.io/core/concepts/persistence.html#output-caching. (I know the API docs say deprecated there, but I'm pretty sure it's not actually deprecated until https://github.com/PrefectHQ/prefect/issues/2619 is done, at which point you would move that configuration onto the result.) You could also use a custom trigger (https://docs.prefect.io/api/latest/triggers.html#triggers), since every trigger receives its upstream dependencies' edges and states (https://github.com/PrefectHQ/prefect/blob/b9914890dfec52610a42cd694427badafab8c8ba/src/prefect/triggers.py#L174). Depending on how many other dependencies those tasks have, though, it could get quite tricky, and afaik we don't have a published example that inspects specific upstream tasks to decide a trigger. It should be possible; you'd just have to reverse engineer it a bit 🙂
🤗 1
upvote 1
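To make the shared-cache-key idea above concrete, here is a minimal plain-Python sketch of the mechanism being described - tasks that share a key all consider themselves cached until that key expires, so downstream steps can be skipped on the same schedule as the fetch. This is an illustration of the concept only, not Prefect's implementation; the `SharedCache` class and the pipeline function are hypothetical.

```python
import datetime

# Hypothetical illustration of a shared cache key: fetch, parse, and
# store all key off the same entry, so all three are skipped while
# the key is still valid.
class SharedCache:
    def __init__(self):
        self._store = {}  # key -> (value, expiry time)

    def get(self, key, now):
        entry = self._store.get(key)
        if entry and now < entry[1]:
            return entry[0]  # key still valid: reuse cached result
        return None

    def put(self, key, value, now, ttl):
        self._store[key] = (value, now + ttl)

def run_pipeline(cache, now):
    """Run fetch -> parse -> store, skipping all steps if the shared key is fresh."""
    key = "complaints-daily"  # single key shared by every step
    if cache.get(key, now) is not None:
        return "skipped (cache key still valid)"
    raw = "raw complaint data"   # stand-in for the web scrape
    parsed = raw.upper()         # stand-in for parsing/storing
    cache.put(key, parsed, now, datetime.timedelta(days=1))
    return "ran (cache key refreshed)"

cache = SharedCache()
t0 = datetime.datetime(2020, 10, 1)
print(run_pipeline(cache, t0))                                  # hourly run: executes
print(run_pipeline(cache, t0 + datetime.timedelta(hours=1)))    # within a day: skipped
print(run_pipeline(cache, t0 + datetime.timedelta(days=1, hours=1)))  # expired: executes
```

The point of the sketch is the skip decision: once the fetch's cache key is valid, downstream steps that share it never re-run on the stale data, which is the behavior the question asks for.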
**CA Lee**
Got it - will be spending some time working through those. Thanks for pointing me in the right direction!