David Elliott
02/22/2022, 4:53 PM
Deliveroo Prefect tenant; happy to send over some specifics/URLs if helpful.
Kevin Kho
David Elliott
02/22/2022, 5:19 PM
Kevin Kho
`set_schedule_active` flag with `flow.register()`?
David Elliott
02/22/2022, 5:22 PM
David Elliott
02/22/2022, 5:23 PM
David Elliott
02/22/2022, 5:30 PM
David Elliott
02/22/2022, 5:31 PM
`prefect[aws,kubernetes]==0.15.5`
David Elliott
02/22/2022, 5:51 PM
```graphql
# Auto-scheduled runs of bi_pipeline_v2 in the staging project that are still
# in the Scheduled state, with the flow version and its archived flag.
{
  flow_run(
    where: {
      auto_scheduled: {_eq: true},
      state: {_in: ["Scheduled"]},
      flow: {
        name: {_eq: "bi_pipeline_v2"},
        project: {name: {_eq: "bi-pipeline-v2-staging"}}
      }
    }
    order_by: {scheduled_start_time: asc_nulls_last}
  ) {
    id
    start_time
    flow {
      name
      version
      archived
      project {
        name
      }
    }
    version
    end_time
    updated
    state
    scheduled_start_time
  }
}
```
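The query above returns enough to separate the two symptoms discussed in this thread. A minimal Python sketch, assuming the standard GraphQL response body shape `{"data": {"flow_run": [...]}}` and only the fields the query selects; the helper names `archived_scheduled_runs` and `duplicate_slots` are hypothetical, not part of Prefect.

```python
from collections import defaultdict
from typing import Any, Dict, List


def _scheduled_runs(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the runs in the Scheduled state from a parsed response body.

    Assumes the {"data": {"flow_run": [...]}} shape; a null or missing
    "data" key (e.g. when the query errored) yields an empty list.
    """
    data = response.get("data") or {}
    runs = data.get("flow_run") or []
    return [run for run in runs if run.get("state") == "Scheduled"]


def archived_scheduled_runs(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Scheduled runs whose flow version is archived (the unexpected ones)."""
    stale = []
    for run in _scheduled_runs(response):
        flow = run.get("flow") or {}
        if flow.get("archived"):
            stale.append(
                {
                    "id": run["id"],
                    "flow_version": flow.get("version"),
                    "scheduled_start_time": run.get("scheduled_start_time"),
                }
            )
    return stale


def duplicate_slots(response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each scheduled_start_time that has more than one scheduled run
    (e.g. two flow versions both due at 4pm) to the IDs of those runs."""
    by_slot: Dict[str, List[str]] = defaultdict(list)
    for run in _scheduled_runs(response):
        by_slot[run.get("scheduled_start_time")].append(run["id"])
    return {slot: ids for slot, ids in by_slot.items() if len(ids) > 1}
```

Feeding the query's JSON response to both helpers distinguishes runs that belong to archived versions from several runs competing for the same scheduled start time.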
Kevin Kho
Anna Geller
David Elliott
02/22/2022, 6:49 PM
Anna Geller
Kevin Kho
Anna Geller
David Elliott
02/22/2022, 7:37 PM
`a2b2af53-aa99-40db-a05e-cd1ad54b9ce6`) was version 1066 and got to `Running` today before we cancelled it. The current version was 1071, which ran first (`7295ddff-69b4-4ef9-b638-14fa94e1ca7a`), and as soon as that finished, the archived one above started running, since they'd both been scheduled for 4pm (the flow concurrency limit of 1 stopped them from running simultaneously).
Kevin Kho
David Elliott
02/22/2022, 7:40 PM
David Elliott
02/22/2022, 7:47 PM
Run `successful-sloth` (`3c295d6f-16bf-4cf8-964c-4437c8891bfc`) of flow `bi_pipeline_v2` failed `SCHEDULED_NOT_STARTED` SLA (`4cb3b9a7-93e8-4353-8759-48601a357106`) after 300 seconds. See [the UI](<https://cloud.prefect.io/deliveroo/flow-run/3c295d6f-16bf-4cf8-964c-4437c8891bfc>) for more details.
What's weird is that the linked flow run doesn't exist. But it must have existed at some point in order to trigger the Automation? When you click the flow-run URL it provides, you get a blank page, and GraphQL returns nothing for that `flow_run`.
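When triaging alerts like the one above, a small helper can pull the flow-run ID out of the alert text and compare it with the IDs a GraphQL `flow_run` query actually returns. This is a hypothetical Python sketch: it assumes every alert contains a `/flow-run/<uuid>` URL as in the message above, and `known_run_ids` stands for whatever IDs your own query returned.

```python
import re
from typing import Iterable, Optional

# Matches the flow-run URL format seen in the alert, e.g.
# https://cloud.prefect.io/<tenant>/flow-run/<uuid>
FLOW_RUN_URL = re.compile(
    r"/flow-run/([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
)


def flow_run_id_from_alert(message: str) -> Optional[str]:
    """Extract the flow-run ID from the alert's UI link, if present."""
    match = FLOW_RUN_URL.search(message)
    return match.group(1) if match else None


def is_phantom(message: str, known_run_ids: Iterable[str]) -> bool:
    """True if the alert names a flow run the API did not return."""
    run_id = flow_run_id_from_alert(message)
    return run_id is not None and run_id not in set(known_run_ids)
```

Logging the result of `is_phantom` for each alert gives a running record of how often the automation fires for runs that can no longer be queried.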
i.e. we have two slightly different but related issues:
• on staging we have multiple flow runs scheduled, some of which belong to archived flow versions (you can see these scheduled runs in the UI), and they actually start running when given the chance
• on production we can only see one flow run scheduled per day (which is correct), but when the scheduled start time arrives, some phantom archived flow runs try to start and trigger the PagerDuty automation. But those flow runs don't seem to exist.
Anna Geller
Anna Geller
David Elliott
02/22/2022, 9:01 PM
`staging`
Yeah, we're on 1071. That's because our CI/CD builds and registers the flow each time we merge to `staging`, which happens multiple times per day.
This pipeline is the entirety of our company's SQL-based ETL: the flow has 1,500 tasks, and we have tonnes of people working on the SQL logic in it. As such, we merge legitimate changes to `staging` multiple times per day; then at 4pm whatever's in `staging` runs per this cron, and if it's successful, we merge `staging` to `master`, which happens once per day.
i.e. that's why we have so many versions of this flow. But equally, we're seeing similar issues on production with the Automation described above, and that flow has only ~270 versions.
I feel like regardless of how many flow versions there are, `flow.register()` ought to be able to handle this? If it is a scale issue with the number of flow versions, people will start running into it over time anyway. Maybe we've just hit it early because of the amount of development that happens on this flow.
David Elliott
02/22/2022, 9:05 PM
Anna Geller
David Elliott
02/22/2022, 11:00 PM
Anna Geller