# ask-marvin
@Marvin What would you recommend as the best way to implement backfilling into our deployments?
Great question. In Prefect 3.x, the most reliable way to “backfill” a deployment is to explicitly create the historical flow runs you need, while controlling concurrency so you don’t overwhelm your infra. Here are the recommended patterns (pick what fits your workflow and volume):

1) Programmatic backfill driver (recommended for most cases)
- Use the Python API to submit one run per partition (e.g., per day/hour) with parameters, tags, and idempotency keys.
- Gate concurrency with an asyncio semaphore or via Prefect concurrency features (see links below; a sketch of the concurrency-limit variant follows the example).
- This gives you full control and is easy to repeat/monitor.

Example:
```python
import asyncio
import pendulum
from prefect.deployments.flow_runs import run_deployment

DEPLOYMENT = "my-flow/prod"  # <FLOW_NAME>/<DEPLOYMENT_NAME>
START = pendulum.datetime(2024, 1, 1, tz="UTC")
END = pendulum.datetime(2024, 1, 31, tz="UTC")
MAX_PARALLEL = 5

async def submit_one(ds):
    # run_deployment waits for the flow run to finish by default, so the
    # semaphore below caps how many backfill runs execute at once.
    # Pass timeout=0 if you only want to create the runs without waiting.
    return await run_deployment(
        name=DEPLOYMENT,
        parameters={"ds": ds.to_date_string()},  # adjust to your flow’s param(s)
        flow_run_name=f"backfill-{ds.to_date_string()}",
        scheduled_time=pendulum.now("UTC"),      # run ASAP
        tags=["backfill"],
        idempotency_key=f"backfill:{DEPLOYMENT}:{ds.to_date_string()}",
    )

async def main():
    sem = asyncio.Semaphore(MAX_PARALLEL)
    async def with_sem(ds):
        async with sem:
            return await submit_one(ds)

    days = (END - START).days + 1  # inclusive date range
    dates = [START.add(days=i) for i in range(days)]
    await asyncio.gather(*(with_sem(d) for d in dates))

if __name__ == "__main__":
    asyncio.run(main())
```
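If you would rather have Prefect gate the load than a local semaphore, you can wrap the submission in a global concurrency limit instead. A minimal sketch, assuming a global concurrency limit named "backfill" has already been created (via the UI or CLI) and that `with_limit` (a hypothetical name) replaces `with_sem` inside `main()` above:

```python
from prefect.concurrency.asyncio import concurrency

async def with_limit(ds):
    # Occupies one slot of the pre-created "backfill" global concurrency limit
    # for the duration of the blocking run_deployment call, so the limit's
    # value caps how many backfill runs execute at once.
    async with concurrency("backfill", occupy=1):
        return await submit_one(ds)
```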
2) CLI for small backfills or ad-hoc runs
- Creates a single run. Loop in your shell for short ranges (a sketch for generating longer date ranges follows the examples).
- You can schedule start times, pass parameters, and watch.

Examples:
```bash
# Single run now with parameters
prefect deployment run my-flow/prod -p ds=2024-01-01

# Single run that starts at a specific time
prefect deployment run my-flow/prod -p ds=2024-01-01 --start-at "2024-01-01T00:05:00Z"

# Simple loop (bash) for a handful of days
for d in 2024-01-01 2024-01-02 2024-01-03; do
  prefect deployment run my-flow/prod -p ds=$d
done
```
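For a longer contiguous range, you can generate the dates rather than listing them by hand. A rough sketch of the same loop, assuming GNU date (on macOS, gdate from coreutils):

```bash
d="2024-01-01"
until [ "$d" \> "2024-01-31" ]; do
  prefect deployment run my-flow/prod -p ds=$d
  d=$(date -I -d "$d + 1 day")   # GNU date: next calendar day in ISO format
done
```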
We verified the CLI structure:
- prefect deployment run: create a run
- prefect deployment schedule create: manage schedules
- prefect concurrency-limit ... and prefect work-pool ...: concurrency controls

3) Temporary schedule for a controlled set of timestamps
- If you want Prefect to generate the occurrences for you, you can create a temporary schedule (e.g., an RRule for a specific date range), let the scheduler create runs, then delete the schedule afterward (a sketch of the cleanup commands follows the example).
- This is neat for precise, enumerated points-in-time, but programmatic submission (Pattern 1) tends to be more explicit and repeatable for backfills.

Example:
```bash
prefect deployment schedule create my-flow/prod \
  --rrule "FREQ=DAILY;DTSTART=20240101T000000Z;UNTIL=20240131T000000Z"
```
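To clean up afterward (or to pause a regular schedule, per the tips further down), the prefect deployment schedule command group also provides ls, pause, resume, and delete subcommands. A rough sketch, assuming recent Prefect 3.x CLI behavior (argument shapes may vary slightly by version; <schedule-id> comes from the ls output):

```bash
# Find the ID of the temporary backfill schedule
prefect deployment schedule ls my-flow/prod

# Remove it once the backfill runs have been created
prefect deployment schedule delete my-flow/prod <schedule-id>
```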
Note: Schedules focus on upcoming occurrences. For past-heavy backfills with lots of points, the programmatic approach is typically simpler and more predictable.

Controlling concurrency and load
- Work pool concurrency: throttle total concurrent flow runs executed by a pool
```bash
prefect work-pool set-concurrency-limit <your-pool> 5
```
- Task-level concurrency limits: apply a tag to heavy tasks (e.g., db writes) and set a limit so backfills don’t overwhelm systems
```bash
prefect concurrency-limit create warehouse_writes 3
```
Then tag the relevant tasks in your flow with `tags=["warehouse_writes"]` (a minimal sketch follows).
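For example, a minimal sketch of a task carrying the tag so it counts against the warehouse_writes limit (the task and flow names are hypothetical):

```python
from prefect import flow, task

@task(tags=["warehouse_writes"])  # task runs with this tag count against the limit
def load_partition(ds: str):
    ...  # heavy warehouse write for one partition

@flow
def my_flow(ds: str):
    load_partition(ds)
```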
- You can also submit runs with a “backfill” tag (as in the Python example) to filter/monitor in the UI.

Operational tips
- Use idempotency keys to avoid duplicate submissions for the same partition.
- Name runs with the partition (e.g., backfill-YYYY-MM-DD) for traceability.
- Consider pausing the “regular” schedule during big backfills if you want to isolate the load.
- For very large ranges, submit in batches to avoid overloading your API or infra (a rough sketch follows this list).
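A rough sketch of batched submission, reusing the `submit_one` helper from the driver above (the helper name, batch size, and pause are illustrative):

```python
import asyncio

async def backfill_in_batches(dates, submit, max_parallel=5, batch_size=50, pause_seconds=60):
    """Create runs for `dates` in chunks, calling `submit(ds)` for each partition."""
    sem = asyncio.Semaphore(max_parallel)

    async def gated(ds):
        async with sem:
            return await submit(ds)

    for i in range(0, len(dates), batch_size):
        batch = dates[i : i + batch_size]
        await asyncio.gather(*(gated(d) for d in batch))
        if i + batch_size < len(dates):
            await asyncio.sleep(pause_seconds)  # give the API and infra a breather
```

From `main()` in the driver above, you would call `await backfill_in_batches(dates, submit_one)` instead of the single `gather`.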
Docs and references
- Schedules overview: Prefect 3 Schedules
- Concurrency limits (CLI): CLI: concurrency-limit
- Global concurrency/how-to: Global concurrency limits
- Python API used above: `prefect.deployments.flow_runs.run_deployment` (we verified availability and signature)

If you can share a bit more about your backfill shape (e.g., date-partitioned daily/hourly? expected number of runs? deployment/work pool type?), I can tailor a concrete plan and code snippet for your environment.