    Leo Meyerovich (Graphistry)

    1 year ago
    Is there a way to get an IntervalSchedule to pass (a Parameter?) the interval to the resulting flow handler so it knows what time span it is responsible for?
    Chris White

    1 year ago
    You can specify parameters on the individual clocks that compose a schedule, would that work for your use case? https://docs.prefect.io/core/concepts/schedules.html#varying-parameter-values
    Leo Meyerovich (Graphistry)

    1 year ago
    afaict no, that lets me know which clock it came from (e.g., "daily"), but not which clock tick ("january 7th")
    maybe i'm thinking about this wrong. we want to do livefeed + backfill multiprocessing w/ some level of concurrency control (so not 100% fanout when doing months of backfill, but some)
    each span is of ~30s, w/ backfill length of say 1d - 1yr, and we have anywhere from 4 - 40 agents to handle them. live feed is actually multiple live feeds (think equiv of kafka topics or say automated TV watching)
    live feed feels like the scheduler would work -- if we could figure out param passing of the current interval / tick # / ... -- and something similar would probably work for backfill. this feels like a normal pattern, so curious how others do it..
    Chris White

    1 year ago
    I think I’m missing some nuance / subtlety in your description; why wouldn’t knowing the scheduled_start_time + what clock generated it be sufficient? That is the specific tick of the clock, and along with a parameter you could know both what clock and what tick it came from.
    Leo Meyerovich (Graphistry)

    1 year ago
    Hi @Chris White maybe we're reading different docs -- they do not appear to show a way for a tick's task to see what scheduled time span it is associated with, just the global clock name
    e.g., for
    clock1 = clocks.IntervalClock(start_date=now,
                                  interval=datetime.timedelta(minutes=1),
                                  parameter_defaults={"p": "CLOCK 1"})
    while the docs show how to read CLOCK 1, the task needs to see span 10:00 - 10:01
    Chris White

    1 year ago
    The main reference for what i’m describing is the doc I linked to above; clocks don’t really have first class names in Prefect, so I’m not sure what you’re referring to. To be precise, here’s what I’m recommending:
    clock1 = clocks.IntervalClock(start_date=now,
                                  interval=datetime.timedelta(minutes=1),
                                  parameter_defaults={"date-range": "json-representation-of-time-span"})
    ...
    etc. with however many clocks you need, configured for what you’re actually trying to accomplish
    additionally, tasks can reference scheduled_start_time from Prefect context which adds another layer of information
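A minimal sketch of how a task might turn the scheduled start time into the span it is responsible for. The helper name `span_for_tick` is hypothetical, and the convention that a tick at time T covers T up to T plus the interval is an assumption; a live feed that processes the preceding interval would subtract instead. Plain `datetime` values stand in for the value a task would read from Prefect context.

```python
import datetime


def span_for_tick(scheduled_start_time, interval):
    """Return the (start, end) span a tick is responsible for.

    Assumption: the tick at time T covers [T, T + interval).
    """
    return scheduled_start_time, scheduled_start_time + interval


# Inside a Prefect task the tick would be read from context, as described in
# the message above. Here a fixed timestamp stands in for it.
tick = datetime.datetime(2021, 1, 7, 10, 0, tzinfo=datetime.timezone.utc)
start, end = span_for_tick(tick, datetime.timedelta(minutes=1))
print(start.isoformat(), "->", end.isoformat())
```

Because the span is computed from the scheduled time and not from the wall clock, a run that starts late still sees the same span.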
    Leo Meyerovich (Graphistry)

    1 year ago
    ah, and that's guaranteed to line up to the clock's tick, e.g., no drift?
    also on the back of my mind here is it feels like I'm thinking about it wrong. e.g., if running a few backfill attempts, should have a canonically parameterized task somehow so prefect can try to skip already-filled tasks. or maybe that's too complicated so should be in our code..
    Chris White

    1 year ago
    yup yup, scheduled start time is the exact moment the flow was scheduled for regardless of when it ends up running. Also, two cents -- if you need more nuanced control of these time spans you might be better off with a Cron Clock?
    and re: backfills, I think you are correct, I wouldn’t recommend scheduling backfills through the scheduler but rather via ad-hoc runs with some canonical time information (whether that comes from context or a parameter is personal preference)
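One way to picture the "canonical time information" for ad-hoc backfill runs is to cut the backfill window into fixed-width spans and pass each one as a parameter. This is only a sketch: the function name `backfill_spans` and the choice of ISO-8601 strings as the canonical form are assumptions, not something stated in the thread.

```python
import datetime


def backfill_spans(start, end, width):
    """Yield (span_start, span_end) ISO-8601 string pairs covering [start, end)."""
    cursor = start
    while cursor < end:
        # Clamp the last span so it never runs past the end of the window.
        nxt = min(cursor + width, end)
        yield cursor.isoformat(), nxt.isoformat()
        cursor = nxt


# Example: one day of backfill in 30-second spans.
day_start = datetime.datetime(2021, 1, 7, tzinfo=datetime.timezone.utc)
spans = list(
    backfill_spans(
        day_start,
        day_start + datetime.timedelta(days=1),
        datetime.timedelta(seconds=30),
    )
)
print(len(spans), spans[0])
```

Since every attempt over the same window produces the same strings, a retried backfill can be matched against earlier attempts span by span.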
    Leo Meyerovich (Graphistry)

    1 year ago
    part of the reason i was thinking intervals for backfill was to avoid flooding prefect, this way it guarantees task generation at a limited rate. maybe there's a better way? (think 30s chunks to cover 1yr, and only running say 20 concurrently)
    Chris White

    1 year ago
    If you are using Cloud, you could use Flow label concurrency limiting to submit them all at once but only release 20 at a time; otherwise you could implement the batches client side, and only create 20 at a time until they complete (this would be a little annoying I think, but is doable)
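The client-side batching option could look roughly like the sketch below, which uses a thread pool from the standard library to keep at most 20 spans in flight. `run_bounded` and `fake_run_span` are hypothetical names; a real `run_span` would create one flow run for the span and block until it finishes, which is not shown here.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_bounded(spans, run_span, max_in_flight=20):
    """Call run_span(span) for every span, with at most max_in_flight active."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # map() queues all spans up front, but run_span is only invoked
        # when one of the max_in_flight workers is free.
        return list(pool.map(run_span, spans))


# Demo with a stand-in worker that records peak concurrency.
stats = {"active": 0, "peak": 0}
lock = threading.Lock()


def fake_run_span(span):
    # Hypothetical placeholder for "create a flow run and wait for it".
    with lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.01)
    with lock:
        stats["active"] -= 1
    return span


results = run_bounded(range(100), fake_run_span, max_in_flight=20)
print(len(results), "spans done, peak concurrency", stats["peak"])
```

The limit becomes a single argument, so it can be raised or lowered as agents are added or removed.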
    Leo Meyerovich (Graphistry)

    1 year ago
    thanks, this helps
    my takeaway for backfill is roughly:
    -- client manually orchestrates task generation for backfill
    -- to get caching/naming/ui/etc. benefits, can do some sort of two-level flow where an outer-level clock ticks at flow-control rate, and it spawns more canonically named/parameterized/tracked spans. we make sure the inner flow tasks are those that are consistent between backfill attempts + future flows. prefect can now have nice ui + maybe skip previously-succeeded inner tasks when we retry backfills.
    -- as we add/remove agents, can just tweak the outer scheduler to spawn more
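The outer-level step in this takeaway can be sketched as a function that, on each tick, picks the next few spans that have not already succeeded. Everything here is illustrative: `next_batch` is a hypothetical helper, and `done` stands in for whatever record of completed spans is available (for example a table the inner flow writes to).

```python
import itertools


def next_batch(all_spans, done, batch_size):
    """Return up to batch_size spans, in order, that are not recorded as done."""
    pending = (span for span in all_spans if span not in done)
    return list(itertools.islice(pending, batch_size))


# Example: ten canonically named spans, three of which succeeded earlier.
spans = ["span-%03d" % i for i in range(10)]
done = {"span-000", "span-001", "span-003"}
batch = next_batch(spans, done, batch_size=4)
print(batch)
```

Changing `batch_size` on the outer schedule is the knob for spawning more or fewer inner runs per tick.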