Thread
#prefect-community
    Mark McDonald

    1 year ago
    Hi - we have a flow that occasionally gets stuck in a running state until we manually cancel it. We tried using the automations feature, but we encountered this situation last night and it didn't work
    the automation should cancel the flow if it does not finish within 65 minutes
    as you can see, it ran past 65 minutes, so we manually cancelled it - the automation didn't seem to do anything
    Kevin Kho

    1 year ago
    Hey @Mark McDonald, is the flow expected to get stuck in a running state sometimes?
    About the automation, I’ll bring that up to the team
    Mark McDonald

    1 year ago
    thanks @Kevin Kho - it's not expected to get caught in a running state. I wish I could explain why it happens. I run this flow every hour, 24 hours a day, 7 days a week. About once or twice a week, a flow run just seems to get stuck in the running state, inexplicably.
    Kevin Kho

    1 year ago
    I see this happen when Dask itself freezes due to being resource constrained. Any signs of that going on?
    Mark McDonald

    1 year ago
    interesting - from our container insights, it looks like there is plenty of available cpu and memory. I will try to dig into this deeper today
    Michael Adkins

    1 year ago
    Hey @Mark McDonald -- Could you do me a favor and give me the ID of the flow run that was not cancelled? Also, could you query for the hook in the interactive API and share the one for that automation? e.g.
    query {
      hook {
        action_id
        id
        event_tags
        event_type
      }
    }
    Mark McDonald

    1 year ago
    sure - @Michael Adkins this is the flow run that was not cancelled by the automation 7f066f3b-7e0e-453a-a925-6a5f5d1ee485
    this is the response for the automation
    {
      "data": {
        "hook": [
          {
            "action_id": "e9e1ef7a-3b1a-4526-88de-c4318021194a",
            "id": "066c67bb-f3cf-465a-93e0-2fa39e37a2e7",
            "event_tags": {
              "flow_sla_config_id": [
                "24426ad8-c36a-41f0-a97f-3e0f1ea25efe"
              ]
            },
            "event_type": "FlowSLAFailedEvent"
          }
        ]
      }
    }
    Michael Adkins

    1 year ago
    Great thanks, I'll look into some logs and get back to you
    I'm continuing to investigate this, just fyi
    Hey @Mark McDonald -- just to confirm, this run started after you created the automation, right?
    Could you also show me:
    query {
      flow_sla_config {
        id
        kind
        flow_groups {
          id
        }
        duration_seconds
      }
    }
    Mark McDonald

    1 year ago
    @Michael Adkins confirmed
    {
      "data": {
        "flow_sla_config": [
          {
            "id": "4766b0e8-f8bb-46b3-9bbe-2aac4ec14082",
            "kind": "STARTED_NOT_FINISHED",
            "flow_groups": [
              {
                "id": "0fb8a078-86f5-4812-b7df-f86d462feb9d"
              }
            ],
            "duration_seconds": 3600
          },
          {
            "id": "24426ad8-c36a-41f0-a97f-3e0f1ea25efe",
            "kind": "STARTED_NOT_FINISHED",
            "flow_groups": [
              {
                "id": "49876534-8f63-45e6-96cd-b09ba1344fc8"
              }
            ],
            "duration_seconds": 3900
          }
        ]
      }
    }
    I think the one with a duration of 3600 is what I initially created, and then I updated it to a duration of 3900
    again, this was all done before the flow that should have been cancelled ran
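A minimal sketch, assuming the two responses above are pasted in as Python dicts (trimmed to the fields used here), of how the hook can be matched to its SLA config to confirm which duration the automation actually uses. The helper name `sla_configs_for_hook` is hypothetical and not part of any Prefect API.

```python
# Responses copied from the thread, trimmed to the fields used below.
hook_response = {
    "data": {
        "hook": [
            {
                "id": "066c67bb-f3cf-465a-93e0-2fa39e37a2e7",
                "event_tags": {
                    "flow_sla_config_id": ["24426ad8-c36a-41f0-a97f-3e0f1ea25efe"]
                },
                "event_type": "FlowSLAFailedEvent",
            }
        ]
    }
}

sla_response = {
    "data": {
        "flow_sla_config": [
            {
                "id": "4766b0e8-f8bb-46b3-9bbe-2aac4ec14082",
                "kind": "STARTED_NOT_FINISHED",
                "duration_seconds": 3600,
            },
            {
                "id": "24426ad8-c36a-41f0-a97f-3e0f1ea25efe",
                "kind": "STARTED_NOT_FINISHED",
                "duration_seconds": 3900,
            },
        ]
    }
}


def sla_configs_for_hook(hook, sla_configs):
    """Return the SLA configs whose ids appear in the hook's event tags (hypothetical helper)."""
    wanted = set(hook["event_tags"].get("flow_sla_config_id", []))
    return [config for config in sla_configs if config["id"] in wanted]


matched = sla_configs_for_hook(
    hook_response["data"]["hook"][0],
    sla_response["data"]["flow_sla_config"],
)
for config in matched:
    minutes = config["duration_seconds"] / 60
    print(config["id"], config["kind"], f"{minutes:g} minutes")
```

With the data above, the hook resolves to the single config with `duration_seconds` of 3900, which is the 65-minute limit described at the start of the thread.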
    Michael Adkins

    1 year ago
    Hey @Mark McDonald -- so it looks like the flow group 49876534-8f63-45e6-96cd-b09ba1344fc8 does not exist, which would be why your SLA was not enforced. Your flow run belongs to the flow group 22f322dd-0201-4769-9246-2a1b6551527c -- did you delete the flow group after creating the automation and register a new one?
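The mismatch described here can be checked mechanically. A minimal sketch, assuming the `flow_sla_config` response from earlier in the thread is available as a Python list of dicts; `configs_covering` is a hypothetical helper, not a Prefect function.

```python
# SLA configs copied from the flow_sla_config response earlier in the thread.
sla_configs = [
    {
        "id": "4766b0e8-f8bb-46b3-9bbe-2aac4ec14082",
        "flow_groups": [{"id": "0fb8a078-86f5-4812-b7df-f86d462feb9d"}],
        "duration_seconds": 3600,
    },
    {
        "id": "24426ad8-c36a-41f0-a97f-3e0f1ea25efe",
        "flow_groups": [{"id": "49876534-8f63-45e6-96cd-b09ba1344fc8"}],
        "duration_seconds": 3900,
    },
]

# Flow group the stuck flow run belongs to, as reported in the thread.
run_flow_group_id = "22f322dd-0201-4769-9246-2a1b6551527c"


def configs_covering(flow_group_id, configs):
    """Return the SLA configs attached to the given flow group (hypothetical helper)."""
    return [
        config
        for config in configs
        if any(group["id"] == flow_group_id for group in config["flow_groups"])
    ]


covering = configs_covering(run_flow_group_id, sla_configs)
if not covering:
    print(f"No SLA config is attached to flow group {run_flow_group_id}")
```

An empty result means no SLA applies to the run's flow group, so the automation has nothing to trigger on for that run.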
    Mark McDonald

    1 year ago
    so - that's an interesting point, we have multiple projects for our different environments. We have a uat and a prod project that have separate flows with the same name
    so, because the UI doesn't allow you to select the project, I assumed you were querying based on flow name
    afaik, we never deleted a flow group for this particular flow. We redeploy the flow approximately 1x per week, but the flow group never changes
    ahh - ok, I think I had the wrong project @Michael Adkins - sorry about that, I didn't see the "project name" in the top right corner when I set this up
    consider this issue closed - I will let you know if the automation is successful the next time this flow runs past the sla
    thank you for the help, and apologies for the false alarm
    Michael Adkins

    1 year ago
    No problem! Glad we got it sorted out.