Ed Hartley
09/08/2025, 10:20 AM
Marvin
09/08/2025, 10:20 AM
Marvin
09/08/2025, 10:28 AM
import asyncio

import pendulum
from prefect.deployments.flow_runs import run_deployment

DEPLOYMENT = "my-flow/prod"  # <FLOW_NAME>/<DEPLOYMENT_NAME>
START = pendulum.datetime(2024, 1, 1, tz="UTC")
END = pendulum.datetime(2024, 1, 31, tz="UTC")
MAX_PARALLEL = 5


async def submit_one(ds):
    # One flow run per partition date
    return await run_deployment(
        name=DEPLOYMENT,
        parameters={"ds": ds.to_date_string()},  # adjust to your flow’s param(s)
        flow_run_name=f"backfill-{ds.to_date_string()}",
        scheduled_time=pendulum.now("UTC"),  # run ASAP
        tags=["backfill"],
        idempotency_key=f"backfill:{DEPLOYMENT}:{ds.to_date_string()}",
    )


async def main():
    # Allow at most MAX_PARALLEL run_deployment calls in flight at once
    sem = asyncio.Semaphore(MAX_PARALLEL)

    async def with_sem(ds):
        async with sem:
            return await submit_one(ds)

    days = (END - START).days + 1  # inclusive range
    dates = [START.add(days=i) for i in range(days)]
    await asyncio.gather(*(with_sem(d) for d in dates))


if __name__ == "__main__":
    asyncio.run(main())
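If you want to preview what the script above would submit before touching Prefect, here is a minimal sketch that builds the same run names and idempotency keys with only the standard library. It assumes daily partitions and the naming scheme from the example; `BackfillRun` and `plan_backfill` are hypothetical helpers for illustration, not part of Prefect.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class BackfillRun:
    ds: str               # partition date passed as the flow parameter
    flow_run_name: str    # readable name, e.g. backfill-2024-01-01
    idempotency_key: str  # stable per deployment and partition


def plan_backfill(deployment: str, start: date, end: date) -> list[BackfillRun]:
    """Build one BackfillRun per day in the inclusive range [start, end]."""
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days + 1
    plan = []
    for offset in range(days):
        ds = (start + timedelta(days=offset)).isoformat()
        plan.append(
            BackfillRun(
                ds=ds,
                flow_run_name=f"backfill-{ds}",
                idempotency_key=f"backfill:{deployment}:{ds}",
            )
        )
    return plan


plan = plan_backfill("my-flow/prod", date(2024, 1, 1), date(2024, 1, 31))
print(len(plan), plan[0].flow_run_name, plan[-1].idempotency_key)
```

Because the key depends only on the deployment name and the partition date, building the plan twice for the same range yields identical keys, which is the property that makes resubmission safe.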
2) CLI for small backfills or ad-hoc runs
- Creates a single run. Loop in your shell for short ranges.
- You can schedule start times, pass parameters, and watch.
Examples:
# Single run now with parameters
prefect deployment run my-flow/prod -p ds=2024-01-01
# Single run that starts at a specific time
prefect deployment run my-flow/prod -p ds=2024-01-01 --start-at "2024-01-01T00:05:00Z"
# Simple loop (bash) for a handful of days
for d in 2024-01-01 2024-01-02 2024-01-03; do
  prefect deployment run my-flow/prod -p "ds=$d"
done
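If typing the dates by hand gets tedious, the same command lines can be generated for a range. This is a rough sketch using only the standard library; `backfill_commands` is a hypothetical helper, and it assumes your flow takes a `ds` parameter as in the examples above. It only prints the commands, so you can review them before piping them to a shell.

```python
import shlex
from datetime import date, timedelta


def backfill_commands(deployment: str, start: date, end: date) -> list[str]:
    """Return one `prefect deployment run` command line per day in [start, end]."""
    commands = []
    current = start
    while current <= end:
        argv = [
            "prefect", "deployment", "run", deployment,
            "-p", f"ds={current.isoformat()}",
        ]
        # shlex.join quotes any argument that needs it for a POSIX shell
        commands.append(shlex.join(argv))
        current += timedelta(days=1)
    return commands


commands = backfill_commands("my-flow/prod", date(2024, 1, 1), date(2024, 1, 3))
for line in commands:
    print(line)
```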
We verified the CLI structure:
- prefect deployment run: create a run
- prefect deployment schedule create: manage schedules
- prefect concurrency-limit ... and prefect work-pool ...: concurrency controls
3) Temporary schedule for a controlled set of timestamps
- If you want Prefect to generate the occurrences for you, you can create a temporary schedule (e.g., an RRule for a specific date range), let the scheduler create runs, then delete the schedule afterward.
- This is neat for precise, enumerated points-in-time, but programmatic submission (Pattern 1) tends to be more explicit and repeatable for backfills.
Example:
prefect deployment schedule create my-flow/prod \
--rrule "FREQ=DAILY;DTSTART=20240101T000000Z;UNTIL=20240131T000000Z"
Note: Schedules focus on upcoming occurrences. For past-heavy backfills with lots of points, the programmatic approach is typically simpler and more predictable.
Controlling concurrency and load
- Work pool concurrency: throttle total concurrent flow runs executed by a pool
prefect work-pool set-concurrency-limit <your-pool> 5
- Task-level concurrency limits: apply a tag to heavy tasks (e.g., db writes) and set a limit so backfills don’t overwhelm systems
prefect concurrency-limit create warehouse_writes 3
Then tag the relevant tasks in your flow with tags=["warehouse_writes"].
- You can also submit runs with a “backfill” tag (as in the Python example) to filter/monitor in the UI.
Operational tips
- Use idempotency keys to avoid duplicate submissions for the same partition.
- Name runs with the partition (e.g., backfill-YYYY-MM-DD) for traceability.
- Consider pausing the “regular” schedule during big backfills if you want to isolate the load.
Marvin
09/08/2025, 10:28 AM
prefect.deployments.flow_runs.run_deployment (we verified availability and signature)
If you can share a bit more about your backfill shape (e.g., date-partitioned daily/hourly? expected number of runs? deployment/work pool type?), I can tailor a concrete plan and code snippet for your environment.