# ask-community
a
When I deploy a flow and my worker suddenly comes online, all the missed scheduled runs are executed. How do I make the schedule skip catching up and only run the next scheduled run? Here's my deployment file:
```yaml
# Welcome to your prefect.yaml file! You can use this file for storing and managing
# configuration for deploying your flows. We recommend committing this file to source
# control along with your flow code.

# Generic metadata about this project
name: fintech_api
prefect-version: 3.0.4

# build section allows you to manage and build docker images
build:

# push section allows you to manage if and how this project is uploaded to remote locations
push:

# pull section allows you to provide instructions for cloning this project in remote locations
pull:
- prefect.deployments.steps.set_working_directory:
    directory: /home/anvutrong/trong_an_personal/Promete/fintech_api

# the deployments section allows you to provide configuration for deploying flows
deployments:
- name: daily_update
  tags:
  - staging pipeline
  description: Fetch and store data from Wifeed API to MongoDB
  entrypoint: src/api/flow_deployment.py:daily_update
  parameters: {}
  work_pool:
    name: local-wp
    work_queue_name: primary-queue
    job_variables: {}
  version:
  concurrency_limit: 1
  collision_strategy: CANCEL_NEW
  enforce_parameter_schema: true
  schedules:
  - rrule: RRULE:FREQ=DAILY;INTERVAL=1;BYDAY=MO,TU,WE,TH,FR;BYHOUR=0;BYMINUTE=0;BYSECOND=0
    timezone: Asia/Bangkok
    active: true
    max_active_runs:
    catchup: false

- name: intraday_eod_auto
  tags:
  - stock price pipeline
  description: Automatically fetch intraday stock price into clickhouse DB
  entrypoint: src/api/flow_deployment.py:intraday_eod
  parameters: {
    check_first_time_migration: false
  }
  work_pool:
    name: local-wp
    work_queue_name: primary-queue
    job_variables: {}
  concurrency_limit: 1
  collision_strategy: CANCEL_NEW
  enforce_parameter_schema: true
  schedules:
  - rrule: RRULE:FREQ=MINUTELY;INTERVAL=1;BYDAY=MO,TU,WE,TH,FR;BYHOUR=9,10,11,13,14,15
    timezone: Asia/Bangkok
    active: true
    max_active_runs: 1
    catchup: false
  version:
```
a
Hi! 👋 I see you have a couple of fields in your schedules that we don't actually support: `max_active_runs` and `catchup`. We were working on these in a prototype that we didn't continue with, and they've now been removed. I see you're also using our deployment concurrency feature, which is the best way to control the kind of concurrency you described. There are two "collision strategies" for when the deployment is already running its maximum number of runs and more runs are scheduled to start: either enqueue the extras until some of the current runs finish (the "enqueue" strategy) or cancel the extras immediately (the "cancel" strategy). These are the two concurrency modes we support for deployments now, and "cancel" is probably the closest to what I hear you describing. You currently appear to be set to cancel new runs. Are you seeing runs get canceled correctly when they're scheduled to run while the deployment is already running its max concurrent runs?
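To make the two strategies concrete, here is a toy Python model of a deployment with a concurrency limit. It is an illustrative sketch under stated assumptions, not Prefect's implementation: `DeploymentSlots`, its `submit`/`finish` methods, and the `CollisionStrategy` enum are hypothetical names chosen for this example; only the `ENQUEUE` and `CANCEL_NEW` values mirror the strategies discussed here.

```python
from dataclasses import dataclass, field
from enum import Enum


class CollisionStrategy(Enum):
    # Hypothetical enum; the values mirror the two strategies described above.
    ENQUEUE = "ENQUEUE"
    CANCEL_NEW = "CANCEL_NEW"


@dataclass
class DeploymentSlots:
    """Toy model of deployment-level concurrency (not Prefect's implementation)."""

    limit: int
    strategy: CollisionStrategy
    running: list[str] = field(default_factory=list)
    queued: list[str] = field(default_factory=list)
    cancelled: list[str] = field(default_factory=list)

    def submit(self, run_id: str) -> str:
        # A free slot means the run starts right away under either strategy.
        if len(self.running) < self.limit:
            self.running.append(run_id)
            return "running"
        # All slots are busy: the strategy decides what happens to the newcomer.
        if self.strategy is CollisionStrategy.ENQUEUE:
            self.queued.append(run_id)
            return "queued"
        self.cancelled.append(run_id)
        return "cancelled"

    def finish(self, run_id: str) -> None:
        # Free the slot and, if anything is waiting (ENQUEUE only), start the oldest.
        self.running.remove(run_id)
        if self.queued:
            self.running.append(self.queued.pop(0))


slots = DeploymentSlots(limit=1, strategy=CollisionStrategy.CANCEL_NEW)
print([slots.submit(r) for r in ["run-1", "run-2", "run-3"]])
# ['running', 'cancelled', 'cancelled']
```

With `limit=1` and `CANCEL_NEW`, runs that arrive while the first one is still going are all canceled; switching the model to `ENQUEUE` holds them instead and starts the oldest one when `finish` frees the slot.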
I see one thing for you to double-check. If you're setting up deployment concurrency, you can use a CLI command like this:
```shell
prefect deploy ... --concurrency-limit 3 --collision-strategy CANCEL_NEW
```
If you're editing your YAML file directly and want to set a collision strategy, the syntax is a little different from how your YAML is configured now. Check this out:
```yaml
concurrency_limit:
  limit: 3
  collision_strategy: CANCEL_NEW
```
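If several deployments in `prefect.yaml` still use the flat layout (a bare `concurrency_limit` number with `collision_strategy` as a sibling key), a small dictionary transform shows how each one maps onto the nested form. `nest_concurrency` is a hypothetical helper, not part of Prefect, and it assumes each deployment entry has already been loaded into a plain Python dict (for example with a YAML parser of your choice).

```python
from typing import Any


def nest_concurrency(deployment: dict[str, Any]) -> dict[str, Any]:
    """Move a top-level `collision_strategy` under `concurrency_limit`.

    Hypothetical helper for dicts loaded from prefect.yaml; not part of Prefect.
    """
    result = dict(deployment)  # shallow copy so the input dict is left untouched
    limit = result.get("concurrency_limit")
    strategy = result.pop("collision_strategy", None)
    if isinstance(limit, int):
        # Flat form: turn the bare integer into the nested mapping.
        nested: dict[str, Any] = {"limit": limit}
        if strategy is not None:
            nested["collision_strategy"] = strategy
        result["concurrency_limit"] = nested
    elif isinstance(limit, dict) and strategy is not None:
        # Already nested: only fill in the strategy if the mapping lacks one.
        result["concurrency_limit"] = {"collision_strategy": strategy, **limit}
    elif strategy is not None:
        # No limit to attach it to: put the strategy key back where it was.
        result["collision_strategy"] = strategy
    return result
```

Applied to the `daily_update` entry from the first message, this yields `concurrency_limit: {limit: 1, collision_strategy: CANCEL_NEW}`, matching the nested YAML shape.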
a
Thank you, I've been looking for how to implement this correctly in the YAML file!
a
Ok, great! We can make the YAML setup clearer in our docs.
a
@Andrew Brookins Hi Andrew, hope you're doing well. May I ask you something else about this? I understand the catchup feature has been dropped, but I need something similar. I'm using the CANCEL strategy as you suggested, but when my worker comes online, it immediately executes the most recent late run. What I want is for it to wait for the next scheduled run instead. For example, my daily update flow is scheduled to run at 0:00 (midnight). If that run is missed, I want the flow to run at 0:00 the next day, but when I start a worker at 9 PM, it runs the late flow right then at 9 PM instead of at 0:00 the next day. How can I achieve that kind of schedule? Thank you!
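For the example above, the run that should execute is simply the next occurrence of the `daily_update` rule after the worker starts. A minimal stdlib sketch, assuming the rule is exactly `FREQ=DAILY;BYDAY=MO,TU,WE,TH,FR;BYHOUR=0;BYMINUTE=0;BYSECOND=0` in `Asia/Bangkok`; Bangkok is UTC+7 with no daylight saving, so a fixed offset stands in for the zone. `next_weekday_midnight` is a hypothetical name, and a real implementation would more likely evaluate the RRULE string itself.

```python
from datetime import datetime, timedelta, timezone

# Asia/Bangkok is UTC+7 with no daylight saving, so a fixed offset is enough here.
BANGKOK = timezone(timedelta(hours=7))
WEEKDAYS = {0, 1, 2, 3, 4}  # Monday..Friday, matching BYDAY=MO,TU,WE,TH,FR


def next_weekday_midnight(after: datetime) -> datetime:
    """Return the first Mon-Fri 00:00 Bangkok slot strictly after `after`.

    Hand-rolled equivalent of the daily_update RRULE, for illustration only.
    """
    local = after.astimezone(BANGKOK)
    candidate = local.replace(hour=0, minute=0, second=0, microsecond=0)
    # Step forward a day at a time until we are past `after` and on a weekday.
    while candidate <= local or candidate.weekday() not in WEEKDAYS:
        candidate += timedelta(days=1)
    return candidate


monday_9pm = datetime(2024, 10, 7, 21, 0, tzinfo=BANGKOK)  # worker comes online
print(next_weekday_midnight(monday_9pm))  # 2024-10-08 00:00:00+07:00
```

Starting the worker at 21:00 on Monday 7 October 2024 gives Tuesday 00:00, and starting at 21:00 on a Friday skips the weekend and gives Monday 00:00; the missed Monday 00:00 run is never selected.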
a
Gotcha, I think I follow. Without knowing for sure the best place for the setting, I imagine that you'd want to set `catchup=False` and have the worker ignore late runs and only run the next non-late run, or something like that?
To me, this isn't directly related to concurrency but is certainly adjacent, so it would be either a schedule-level or a deployment-level setting, like `deployment.catch_up_late_runs = False` or `deployment.schedules[0].catch_up_late_runs = False`. I think we'd probably start at the deployment level and then consider afterward whether we also want or need to let individual schedules override it.
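The behavior described here can be modeled as a filter the worker applies before starting runs. In this hypothetical sketch, `catch_up_late_runs` mirrors the setting name proposed above and `grace` is an assumed tolerance window; neither is an existing Prefect option, and `ScheduledRun`/`runs_to_execute` are illustrative names. With catch-up enabled, every run that is due is eligible; with it disabled, a run more than `grace` past its scheduled time is skipped, and the worker simply waits for the next occurrence.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ScheduledRun:
    name: str
    scheduled_start: datetime  # naive datetimes in one timezone, for simplicity


def runs_to_execute(
    runs: list[ScheduledRun],
    now: datetime,
    catch_up_late_runs: bool,
    grace: timedelta = timedelta(minutes=5),
) -> list[ScheduledRun]:
    """Decide which scheduled runs a worker should start at `now`.

    `catch_up_late_runs` and `grace` are hypothetical knobs, not Prefect settings.
    """
    if catch_up_late_runs:
        # Catch-up: everything already due, however late, is eligible.
        return [r for r in runs if r.scheduled_start <= now]
    # No catch-up: only runs due within the grace window; older ones are skipped
    # and future runs wait until their own scheduled time.
    return [r for r in runs if now - grace <= r.scheduled_start <= now]
```

Using the runs from the example: at 21:00 Monday, the catch-up variant picks up the missed Monday 00:00 run, while the no-catch-up variant returns nothing and starts the Tuesday 00:00 run once it comes due.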
a
Hi Andrew,
```yaml
- name: daily_update
  tags:
  - staging pipeline
  description: Fetch and store data from Wifeed API to MongoDB
  entrypoint: src/api/flow_controller.py:daily_update
  parameters: {}
  work_pool:
    name: local-wp
    work_queue_name: primary-queue
    job_variables: {}
  version:
  concurrency_limit:
    limit: 1
    collision_strategy: CANCEL_NEW
  enforce_parameter_schema: true
  catch_up_late_runs: false
  schedules:
  - rrule: RRULE:FREQ=DAILY;INTERVAL=1;BYDAY=MO,TU,WE,TH,FR;BYHOUR=0;BYMINUTE=0;BYSECOND=0
    timezone: Asia/Bangkok
    active: true
    max_active_runs:
    catchup: false
    catch_up_late_runs: false
```
@Andrew Brookins I set this up with `catch_up_late_runs: false` as you suggested, but the late runs still got executed. Any idea how to do this properly? Thank you very much!