tony
10/10/2025, 7:44 PM
id=UUID('068e92c0-b816-796f-8000-58ea45be7220') name='friendly-sheep' flow_id=UUID('301be4f1-ec40-4972-84a5-106b4fe39153') state_id=UUID('068e9512-b1cf-7af9-8000-26ed9f411fed') deployment_id=UUID('12326fbc-3954-4a6b-9e75-df2bca814b69') deployment_version='31ff4223' work_queue_name='default' flow_version='8be7733e18f3912d423d585b42d81d02' parameters={'part': 2} idempotency_key='scheduled 12326fbc-3954-4a6b-9e75-df2bca814b69 b6cda2b0-0363-4b4a-addf-2fbb9075d234 2025-10-10 12:00:00-04:00' context={} empirical_policy=FlowRunPolicy(max_retries=0, retry_delay_seconds=0.0, retries=0, retry_delay=0, pause_keys=set(), resuming=False, retry_type=None) tags=['auto-scheduled'] labels={'prefect.flow.id': '301be4f1-ec40-4972-84a5-106b4fe39153', 'prefect.deployment.id': '12326fbc-3954-4a6b-9e75-df2bca814b69'} parent_task_run_id=None run_count=2 expected_start_time=DateTime(2025, 10, 10, 16, 0, 0, tzinfo=Timezone('UTC')) next_scheduled_start_time=None start_time=DateTime(2025, 10, 10, 16, 2, 54, 491333, tzinfo=Timezone('UTC')) end_time=None total_run_time=datetime.timedelta(seconds=8956, microseconds=621941) estimated_run_time=datetime.timedelta(seconds=13238, microseconds=229302) estimated_start_time_delta=datetime.timedelta(seconds=174, microseconds=491333) auto_scheduled=True infrastructure_document_id=None infrastructure_pid='038775ed-3011-4148-829d-3220b345264b:prefect:friendly-sheep-zzqhs' created_by=CreatedBy(id=UUID('b6cda2b0-0363-4b4a-addf-2fbb9075d234'), type='SCHEDULE', display_value='CronSchedule') work_queue_id=UUID('e955b6e6-1380-46b8-b503-2d966130b63b') work_pool_id=UUID('517425c7-20f9-459f-be62-75b36b6bf590') work_pool_name='analysis-pool' state=Running(message='', type=RUNNING, result=None) job_variables={} state_type=StateType.RUNNING state_name='Running'
Marvin
10/10/2025, 7:50 PM
run_count increments each time the same flow run is (re)started after leaving a Running state. This can happen for reasons other than “retries,” such as crash recovery, a manual resume/restart, an infrastructure restart, or a worker re-submission after a missed heartbeat. To find the exact reason for your run’s run_count=2, check the run’s state history, logs, and worker/infrastructure events around the time it restarted.
Here’s a fast way to investigate:
1) Inspect the flow run’s state history
- This will show every state transition and the message that explains it (e.g., “crash detected,” “resumed by <user>,” etc.)
- CLI:
prefect flow-run inspect 068e92c0-b816-796f-8000-58ea45be7220
- In the UI: open the flow run page → State history / Timeline
Docs:
- Flow run states API
2) Review the flow run logs
- Look for a second “Starting flow run...” line or messages like “Resubmitting after crash,” “Lost heartbeat,” “Infrastructure exited,” or “Resumed by …” (a sketch after this step shows one way to filter for these from Python)
- CLI:
prefect flow-run logs 068e92c0-b816-796f-8000-58ea45be7220 --reverse
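If you’d rather do that filtering from Python, here is a minimal sketch. It reuses the client calls shown in step 5 below; the function name and the keyword list are only illustrative, since the exact log phrasing varies by Prefect version and worker type.
import asyncio

from prefect.client.orchestration import get_client
from prefect.client.schemas.filters import LogFilter, LogFilterFlowRunId
from prefect.client.schemas.sorting import LogSort

FLOW_RUN_ID = "068e92c0-b816-796f-8000-58ea45be7220"
# Illustrative phrases only; adjust to what your workers actually emit
KEYWORDS = ("starting flow run", "crash", "heartbeat", "resubmit", "resumed", "exited")

async def show_restart_hints():
    async with get_client() as client:
        # Pull the run's logs oldest-first and print only restart-looking lines
        logs = await client.read_logs(
            LogFilter(flow_run_id=LogFilterFlowRunId(any_=[FLOW_RUN_ID])),
            limit=2000,
            sort=LogSort.TIMESTAMP_ASC,
        )
        for log in logs:
            if any(k in log.message.lower() for k in KEYWORDS):
                print(log.timestamp, log.level, log.message)

asyncio.run(show_restart_hints())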
3) Check worker logs (most telling for crash recovery/resubmission)
- Identify the worker handling your pool/queue (work_pool_name='analysis-pool', work_queue_name='default')
- On that worker’s host/cluster logs, grep the run id:
# Example on the worker host
grep 068e92c0-b816-796f-8000-58ea45be7220 /path/to/worker/logs/*
- Look for messages indicating re-attachment/resubmission, infrastructure restarts, or missed heartbeats.
4) Check events/timeline
- In the UI timeline, look for events like “Worker claimed,” “Infrastructure started,” “Run crashed,” “Resubmitted,” “Resumed by <user>”
- CLI stream (you can filter with grep):
prefect events stream --format text | grep 068e92c0-b816-796f-8000-58ea45be7220
5) Optional: Query via Python to dump states and logs with messages
import asyncio

from prefect.client.orchestration import get_client
from prefect.client.schemas.filters import LogFilter, LogFilterFlowRunId
from prefect.client.schemas.sorting import LogSort

flow_run_id = "068e92c0-b816-796f-8000-58ea45be7220"

async def debug():
    async with get_client() as client:
        # Current run metadata
        fr = await client.read_flow_run(flow_run_id)
        print("run_count:", fr.run_count, "infrastructure_pid:", fr.infrastructure_pid)

        # Full state history, including the message attached to each transition
        states = await client.read_flow_run_states(flow_run_id)
        for s in states:
            print(s.timestamp, s.name, s.type, "|", s.message or "")

        # All logs for the run, oldest first
        logs = await client.read_logs(
            LogFilter(flow_run_id=LogFilterFlowRunId(any_=[flow_run_id])),
            limit=2000,
            sort=LogSort.TIMESTAMP_ASC,
        )
        for log in logs:
            print(log.timestamp, log.level, log.message)

asyncio.run(debug())
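Running that script prints the full state history (timestamp, name, type, message) followed by the run’s logs in order; those two lists are exactly what you’ll match against the patterns below.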
How to interpret what you find (a sketch after this list shows one way to spot these patterns from the state history)
- Running → Crashed → Scheduled/AwaitingRun → Running
- Infrastructure or process crash/restart; worker likely resubmitted. Check worker logs and your infra (e.g., container/pod restarts).
- Running → Paused → Running
- Manual resume or automation resume; the message usually says who/what resumed it.
- Running → Cancelling/Cancelled → Running
- A manual restart (UI/CLI) of the same run.
- Running → AwaitingRetry → Running
- Would indicate an internal retry; not your case since retries=0.
- Running → NotReady/MissingConcurrency → Running
- Resumed after a concurrency gate; depending on conditions, this can increment run_count. The state message will indicate waiting for a slot.
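To make that mapping concrete, here’s a small sketch that walks the state history and shows which state immediately preceded each entry into Running. It reuses read_flow_run_states from step 5; the function name is just illustrative, and treat it as a starting point rather than an exhaustive classifier.
import asyncio

from prefect.client.orchestration import get_client

FLOW_RUN_ID = "068e92c0-b816-796f-8000-58ea45be7220"

async def explain_run_count():
    async with get_client() as client:
        states = await client.read_flow_run_states(FLOW_RUN_ID)
        # Sort by timestamp before walking the transitions
        states = sorted(states, key=lambda s: s.timestamp)
        for prev, curr in zip(states, states[1:]):
            if curr.type.value == "RUNNING":
                # Each (re)entry into Running lines up with a run_count increment;
                # the preceding state (Crashed, Paused, Cancelled, AwaitingRetry, ...)
                # tells you which pattern above applies.
                print(
                    f"{curr.timestamp}: entered Running after {prev.name} "
                    f"({prev.type.value}) | {prev.message or ''}"
                )

asyncio.run(explain_run_count())
For the run above, the line printed just before the second Running entry is the transition that bumped run_count to 2.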
A few notes on your snippet
- The idempotency_key you see ties the run to its schedule slot; it doesn’t cause new runs, and it won’t explain run_count.
- infrastructure_pid changes can indicate a restart; you can often infer a restart if the pid in earlier logs differs from the current one.
If you want, paste the output of:
- prefect flow-run inspect 068e92c0-b816-796f-8000-58ea45be7220
- prefect flow-run logs 068e92c0-b816-796f-8000-58ea45be7220 --tail
and I can help pinpoint the exact transition that bumped run_count to 2.