    Dan Ball

    2 years ago
    Hi all! I’m experimenting with porting an old Luigi DAG to Prefect and wondering what the most idiomatic approach would be. (I asked a similar question on the Airflow Slack because I’m trying to wrap my head around the differences in approach.)
    My current process:
    • Parses the HTML of a webpage every hour and fetches a timestamp value indicating the last time a linked file was updated
    • If the ETL DAG has not been run for that timestamp, it downloads the file, does some processing, uploads to S3, etc.
    • If it has been run successfully for that timestamp, it does nothing and checks again in another hour.
    Do I want to create a DAG that ends with a task that creates a Result target on success, and begin the DAG with a task that checks for the existence of that target? Or is there a more Prefecty approach to checking whether the DAG has been successfully run with a given parameter (in this case, a fetched external timestamp) and skipping it if so?

    Dylan

    2 years ago
    Hi @Dan Ball! Prefect has a `SKIP` signal that might be useful for you in this instance. Since you want the flow to run every hour, the flow can parse the HTML and then reference the configuration timestamp. If the timestamps match, the Task can raise a `SKIP` signal, and downstream tasks can set `skip_on_upstream_skip=True` to skip as well. The Flow Run will be considered successful. If the timestamps don’t match, the flow can proceed as normal. Here’s some documentation on the `SKIP` signal:
    https://docs.prefect.io/api/latest/engine/signals.html#skip
    https://docs.prefect.io/core/concepts/states.html#skip-while-running
    https://docs.prefect.io/api/latest/core/task.html#task-2
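The check-then-skip pattern described above could be sketched roughly as follows. This is a standard-library illustration only, not Prefect code: `SkipRun` is a stand-in for Prefect's `SKIP` signal, and the "Last updated:" page format, the state file, and the function names are all assumptions made for the example.

```python
import re
from pathlib import Path


class SkipRun(Exception):
    """Stand-in for Prefect's SKIP signal; not part of any library."""


# Hypothetical format: assumes the page contains text such as
# "Last updated: 2020-06-01T12:00:00".
TIMESTAMP_RE = re.compile(r"Last updated:\s*([0-9T:\-]+)")


def parse_timestamp(html: str) -> str:
    """Extract the last-updated timestamp from the page HTML."""
    match = TIMESTAMP_RE.search(html)
    if match is None:
        raise ValueError("no timestamp found in page")
    return match.group(1)


def check_for_update(html: str, state_file: Path) -> str:
    """Return the page timestamp, or raise SkipRun if it was already processed."""
    current = parse_timestamp(html)
    previous = state_file.read_text().strip() if state_file.exists() else None
    if current == previous:
        raise SkipRun(f"timestamp {current} already processed")
    return current


def run_once(html: str, state_file: Path) -> str:
    """One hourly run: skip if nothing changed, otherwise process and record."""
    try:
        timestamp = check_for_update(html, state_file)
    except SkipRun:
        return "skipped"
    # The download, processing, and S3 upload steps would go here.
    # Record the timestamp only after the work has succeeded.
    state_file.write_text(timestamp)
    return "processed"
```

In a real flow, the first task would raise the `SKIP` signal instead of `SkipRun`, and the downstream tasks would be skipped by the engine rather than by the `try`/`except` shown here.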

    Dan Ball

    2 years ago
    Great, thanks @Dylan – I’m going to give this a go

    Dylan

    2 years ago
    Of course! Let me know if I can be of assistance 😃