Hi all! I’m experimenting with porting an old Luigi dag to Prefect and wondering what the most idiomatic approach would be. (I asked a similar question on the Airflow Slack because I’m trying to wrap my head around the differences in approach.) My current process:
• Parses the HTML of a webpage every hour and fetches a timestamp value indicating the last time a linked file was updated
• If the ETL DAG has not been run for that timestamp, it downloads the file, does some processing, uploads to S3, etc.
• If it has been run successfully for that timestamp, does nothing and checks again in another hour.
Do I want to create a DAG that ends with a task that creates a Result target on success, and begin the DAG with a task that checks for the existence of that target? Or is there a more Prefecty approach to checking whether the DAG has been successfully run with a given parameter (in this case, a fetched external timestamp) and skip it if so?
There's a skip signal that might be useful for you in this instance. Since you want the flow to run every hour, the flow can parse the HTML and then compare the fetched timestamp against the configured timestamp. If the timestamps match, the task can raise a skip signal, and downstream tasks can be set to skip as well. The flow run will be considered successful. If the timestamps don’t match, the flow can proceed as normal.
Here’s some documentation on the
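
To make the check-then-skip logic concrete, here's a rough sketch in plain Python with no Prefect imports, so it runs anywhere. Everything named here is an assumption for illustration: `SkipRun` is a hypothetical stand-in for the skip signal (in Prefect 1.x that would likely be `prefect.engine.signals.SKIP`), and the JSON state file is just one guess at where the "last processed" timestamp might live.

```python
import json
from pathlib import Path


class SkipRun(Exception):
    """Hypothetical stand-in for a workflow skip signal."""


def load_last_timestamp(state_file: Path):
    """Return the timestamp of the last successful run, or None if there is none."""
    if not state_file.exists():
        return None
    return json.loads(state_file.read_text()).get("last_timestamp")


def check_for_update(fetched_timestamp: str, state_file: Path) -> str:
    """Raise SkipRun when the fetched timestamp has already been processed."""
    if fetched_timestamp == load_last_timestamp(state_file):
        raise SkipRun(f"Already processed {fetched_timestamp}")
    return fetched_timestamp


def record_success(fetched_timestamp: str, state_file: Path) -> None:
    """Persist the timestamp only after the processing step has succeeded."""
    state_file.write_text(json.dumps({"last_timestamp": fetched_timestamp}))


def run_once(fetched_timestamp: str, state_file: Path, process) -> str:
    """One hourly run: skip if nothing changed, otherwise process and record."""
    try:
        timestamp = check_for_update(fetched_timestamp, state_file)
    except SkipRun:
        # Nothing new upstream; the run still counts as successful.
        return "skipped"
    process(timestamp)  # download, transform, upload to S3, etc.
    record_success(timestamp, state_file)
    return "processed"
```

In a real flow, the first task would raise the skip signal instead of `SkipRun`, and the scheduler would handle skipping the downstream tasks rather than a `try`/`except`. Recording the timestamp only after `process` returns means a failed run gets retried on the next hourly check.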