# prefect-community
d
Hi all! I’m experimenting with porting an old Luigi DAG to Prefect and wondering what the most idiomatic approach would be. (I asked a similar question on the Airflow Slack because I’m trying to wrap my head around the differences in approach.) My current process:
• Parses the HTML of a webpage every hour and fetches a timestamp value indicating the last time a linked file was updated
• If the ETL DAG has not been run for that timestamp, it downloads the file, does some processing, uploads to S3, etc.
• If it has been run successfully for that timestamp, it does nothing and checks again in another hour.
Do I want to create a DAG that ends with a task that creates a Result target on success, and begin the DAG with a task that checks for the existence of that target? Or is there a more Prefecty approach to checking whether the DAG has been successfully run with a given parameter (in this case, a fetched external timestamp) and skipping it if so?
d
Hi @Dan Ball! Prefect has a `SKIP` signal that might be useful for you in this instance. Since you want the flow to run every hour, the flow can parse the HTML and then reference the configuration timestamp. If the timestamps match, the Task can raise a `SKIP` signal, and downstream tasks can set `skip_on_upstream_skip=True` to skip as well. The Flow Run will be considered successful. If the timestamps don’t match, the flow can proceed as normal. Here’s some documentation on the `SKIP` signal:
https://docs.prefect.io/api/latest/engine/signals.html#skip
https://docs.prefect.io/core/concepts/states.html#skip-while-running
https://docs.prefect.io/api/latest/core/task.html#task-2
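A rough sketch of the pattern described above, assuming the Prefect 1.x API that the linked docs cover. The thread does not say where the last processed timestamp is stored, so here it is simply passed in as an argument. The names `already_processed`, `build_flow`, `fetch_timestamp`, `check_timestamp`, and `process_file` are hypothetical, and the fetch and processing bodies are placeholders rather than a real implementation.

```python
from typing import Optional


def already_processed(fetched_ts: str, last_processed_ts: Optional[str]) -> bool:
    """Return True when the fetched timestamp equals the last one handled."""
    return last_processed_ts is not None and fetched_ts == last_processed_ts


def build_flow(last_processed_ts: Optional[str]):
    """Build the flow. Prefect is imported here so that the helper above
    can be used and tested without Prefect installed."""
    from prefect import Flow, task
    from prefect.engine.signals import SKIP

    @task
    def fetch_timestamp() -> str:
        # Placeholder (assumption): parse the page's HTML and return its timestamp.
        return "2020-01-01T00:00:00"

    @task
    def check_timestamp(fetched_ts: str) -> str:
        # Raising SKIP puts this task in a Skipped state, which counts as success.
        if already_processed(fetched_ts, last_processed_ts):
            raise SKIP("This timestamp has already been processed.")
        return fetched_ts

    @task(skip_on_upstream_skip=True)
    def process_file(ts: str) -> None:
        # Placeholder (assumption): download, process, and upload to S3.
        print(f"Processing file for timestamp {ts}")

    with Flow("hourly-etl") as flow:
        ts = check_timestamp(fetched_ts=fetch_timestamp())
        process_file(ts)
    return flow
```

With Prefect 1.x installed, `build_flow(last_processed_ts=...).run()` would run the flow once locally; the hourly schedule and the persistence of the last processed timestamp are left out of this sketch.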
d
Great, thanks @Dylan – I’m going to give this a go
d
Of course! Let me know if I can be of assistance :)