# ask-community
j
Hey Folks 👋 Lately, some of my flow runs have a "Run Count" of 2 (instead of 1). Looking at the "Results" tab, I see that both runs started at pretty much the same time. We're using k8s infra. Under what conditions could a single flow run be started more than once? Having a flow launched twice can be quite problematic, especially when both runs have the same id.
z
Hey! Are you using Prefect Cloud or OSS?
j
Cloud Enterprise
z
Can you share the state transitions the run went through?
```python
# usage: python <file>.py <FLOW_RUN_ID>
import asyncio
import sys

from prefect import get_client


async def main(flow_run_id):
    async with get_client() as client:
        # Read the full state history for the given flow run
        states = await client.read_flow_run_states(flow_run_id)
        for state in states:
            print(state.timestamp, state.type.name, state.name)


asyncio.run(main(sys.argv[1]))
```
Additionally:
• What version are you using?
• Are retries configured for the run?
• Are any automations configured for the run?
• Is the run managed by a worker or an agent?
• Is it a subflow run?
• Was the run triggered by `run_deployment` or a schedule?
j
@Ton Steijvers
t
• Prefect version 2.10.6.
• No retries configured for the flow.
• Automations: yes, a Slack notification is sent if the flow run state enters Crashed or TimedOut.
• The run is managed by a Prefect agent, which starts a k8s job.
• Not a subflow.
• The flow is scheduled to run every 5 minutes.

We have seen a few cases in the last 2 weeks (beginning May 3rd) where the agent kicks off 2 k8s jobs almost simultaneously (within 10 seconds) with the same flow run id, which results in the flow run having a run count of 2. This should never happen, as the flow run performs database operations that cannot run in parallel.
```python
# Imports added so the snippet is runnable as shown
from datetime import timedelta

from prefect import flow


@flow(timeout_seconds=timedelta(minutes=30).seconds)
def my_flow():
    ...


if __name__ == "__main__":
    my_flow()
```
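As a stopgap for flows whose work must not run in parallel, one defensive pattern is to take an exclusive lock at flow start and skip the run if another holds it. A minimal sketch using an atomically created lock file (the path and helper names are illustrative, not Prefect API; a file lock only protects runs sharing a filesystem, so two separate k8s pods would need an external lock, such as a database advisory lock, instead):

```python
import os
import tempfile

# Illustrative lock location; real deployments would need shared, external state
LOCK_PATH = os.path.join(tempfile.gettempdir(), "my_flow.lock")


def try_acquire(path):
    """Atomically create the lock file.

    Returns a file descriptor on success, or None if another run holds the lock.
    """
    try:
        # O_CREAT | O_EXCL fails atomically if the file already exists
        return os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None


def release(fd, path):
    """Close the descriptor and remove the lock file."""
    os.close(fd)
    os.remove(path)


fd = try_acquire(LOCK_PATH)
if fd is None:
    print("another run holds the lock; skipping")
else:
    try:
        ...  # the non-parallel database work goes here
    finally:
        release(fd, LOCK_PATH)
```

The `O_CREAT | O_EXCL` combination is what makes the check-and-create step atomic; a plain `os.path.exists` check followed by a create would leave a race window between the two duplicate jobs.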
z
The state transitions are a very important part of this; can you get those, please?
t
```
2023-05-08T16:59:24.503703+00:00 SCHEDULED Scheduled
2023-05-08T17:20:11.137558+00:00 PENDING Pending
2023-05-08T17:20:11.262198+00:00 SCHEDULED Late
2023-05-08T17:20:23.517246+00:00 PENDING Pending
2023-05-08T17:21:17.567051+00:00 RUNNING Running
2023-05-08T17:21:18.140062+00:00 RUNNING Running
2023-05-08T17:21:42.109878+00:00 FAILED Failed
2023-05-08T17:21:47.848781+00:00 FAILED Failed
```
here's another one, on the same day but in a different workspace:
```
2023-05-08T14:40:09.160387+00:00 SCHEDULED Scheduled
2023-05-08T15:20:07.033267+00:00 PENDING Pending
2023-05-08T15:20:10.378419+00:00 SCHEDULED Late
2023-05-08T15:20:16.716348+00:00 PENDING Pending
2023-05-08T15:20:48.902959+00:00 RUNNING Running
2023-05-08T15:20:58.857477+00:00 RUNNING Running
2023-05-08T15:21:33.556404+00:00 FAILED Failed
2023-05-08T15:21:53.112003+00:00 FAILED Failed
```
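The duplicate submission is visible directly in those histories: each run enters RUNNING twice. A quick way to flag affected runs from the state-reading script's output (a sketch; the tuple shape is an assumption mirroring the printed columns):

```python
def started_more_than_once(states):
    """Given (timestamp, state_type, state_name) tuples, return True
    if the run entered RUNNING more than once."""
    return sum(1 for _, state_type, _ in states if state_type == "RUNNING") > 1


# Abbreviated version of the second history above
history = [
    ("2023-05-08T15:20:07", "PENDING", "Pending"),
    ("2023-05-08T15:20:10", "SCHEDULED", "Late"),
    ("2023-05-08T15:20:16", "PENDING", "Pending"),
    ("2023-05-08T15:20:48", "RUNNING", "Running"),
    ("2023-05-08T15:20:58", "RUNNING", "Running"),
    ("2023-05-08T15:21:33", "FAILED", "Failed"),
]
print(started_more_than_once(history))  # True
```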
z
This is from a bug where runs could be marked as LATE after they had started due to delays in our late runs service. We’ve resolved this issue in Cloud and you should not see that happen again.