
Sander

05/19/2022, 10:00 AM
Hi, I'm able to set up a flow and orchestrate it with a deployment spec. Now I'd like to know what the best solution is for running a backfill set of runs. Could you give some pointers?

Anna Geller

05/19/2022, 10:34 AM
For backfilling I would trigger such flow runs locally by specifying the date periods using flow parameters:
import pendulum
from prefect import task, flow, get_run_logger


@task
def extract_and_load_data(start_date: str, end_date: str):
    # your extract logic based on those dates
    logger = get_run_logger()
    logger.info(f"Backloading data for the interval {start_date} - {end_date}")


@flow
def ingest_and_backfill(
    start_date=pendulum.yesterday(tz="America/New_York").isoformat(),
    end_date=pendulum.today(tz="America/New_York").isoformat(),
):
    extract_and_load_data(start_date=start_date, end_date=end_date)


if __name__ == "__main__":
    ingest_and_backfill(your_args_here_for_backfill)
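
A minimal sketch of how the backfill arguments above could be generated, assuming the backfill is split into one run per day. The helper names daily_intervals and run_backfill are hypothetical and not part of Prefect, and the dates are plain ISO date strings rather than the timezone-aware pendulum values used in the flow defaults. The flow is passed in as a callable so the sketch runs without Prefect installed.

```python
from datetime import date, timedelta
from typing import Callable, Iterator, Tuple


def daily_intervals(first_day: date, last_day: date) -> Iterator[Tuple[str, str]]:
    """Yield (start_date, end_date) ISO strings, one pair per day.

    Hypothetical helper: last_day is exclusive, so the final interval
    ends on last_day.
    """
    day = first_day
    while day < last_day:
        next_day = day + timedelta(days=1)
        yield day.isoformat(), next_day.isoformat()
        day = next_day


def run_backfill(run_flow: Callable[..., object], first_day: date, last_day: date) -> int:
    """Call run_flow once per daily interval and return the number of runs."""
    count = 0
    for start_date, end_date in daily_intervals(first_day, last_day):
        run_flow(start_date=start_date, end_date=end_date)
        count += 1
    return count


if __name__ == "__main__":
    # Stand-in for the flow so the sketch runs on its own; in the real
    # script you would pass ingest_and_backfill instead (assumption).
    def print_interval(start_date: str, end_date: str) -> None:
        print(f"Would backfill {start_date} - {end_date}")

    run_backfill(print_interval, date(2022, 5, 1), date(2022, 5, 4))
```

Each call to the flow would then show up as its own flow run, which keeps a failed day easy to spot and rerun.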

Sander

05/19/2022, 10:35 AM
Is that considered best practice? As I understand it, other solutions offer this functionality via a GUI.

Anna Geller

05/19/2022, 10:37 AM
Creating a parametrized run through the UI will be supported; it's on the roadmap.
But backfilling locally through a script seems easier, especially given that even local runs are tracked in the UI in Prefect 2.0.

Sander

05/19/2022, 10:43 AM
The reason is that in our case we don't like manual activities on our data pipeline boxes, and most processes need to be abstracted away to avoid them. But I also prefer the manual solution here, as it'll be a one-off anyway.

Anna Geller

05/19/2022, 10:48 AM
Any backfill is a manually triggered run, so I'm not sure I understand the difference. What am I missing? In 1.0, the difference was that a run triggered from a local script wouldn't be reflected in the UI and wouldn't be auditable. That is no longer the case in 2.0, where the API is omnipresent even for local runs: all runs are auditable and reflected in the run history in the UI, regardless of which process triggered them.

Amanda Wee

05/19/2022, 11:25 AM
Triggering flow runs locally (or on the server, if feasible) with date ranges is how my team does it, but we're using pre-1.0, so yes, the lack of a log of the run in the UI isn't great.

Sander

05/19/2022, 12:12 PM
The difference is that you need to be able to run scripts manually on a box (possibly unaudited scripts, which is a concern), versus having the run triggered by a system with the settings already applied.

Anna Geller

05/19/2022, 12:23 PM
I'm curious, what would you consider an unaudited script? I believe that anyone able to talk to your API had better be a person or process authorized to do so. For example, only users and processes with a valid API_KEY can talk to the Cloud 2.0 API, so by default in Prefect 2.0 only audited people and processes can run your backfill scripts.

Sander

05/19/2022, 1:24 PM
For most of our prod scripts we require some form of review, and then it's fine. It's not really about the people (who may or may not have rights) but more about having some form of collaboration that results in better scripts. Generally, I believe most internal infra is open to most people.

Anna Geller

05/19/2022, 1:44 PM
Exactly, I think this is more a people problem than a technology problem. Well put.