Thread
#prefect-community
    Jeremy Phelps
    1 year ago
    Hi all, is there a way to completely nuke a Prefect account? Our account is in a bad state - nothing will run. Previous suggestions have not worked. There are a bunch of tasks stuck in the `Queued` state and I can't change them at all (the `set_task_run_states` mutation claims "SUCCESS" for each of the task runs, but the same task run IDs are still returned when I look for `Queued` task runs). If I could just delete everything, my deployment scripts would be able to recreate the flows, projects, etc. and everything would finally work.
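    A hedged sketch of what per-object deletion could look like through the GraphQL API, assuming the Prefect 1.x `delete_flow` and `delete_project` mutations and a `success` payload field (the IDs below are placeholders; each flow and project would need to be looked up and deleted individually):
    mutation {
      # placeholder IDs - look these up first with a flow/project query
      delete_flow(input: { flow_id: "<flow-id>" }) {
        success
      }
      delete_project(input: { project_id: "<project-id>" }) {
        success
      }
    }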
    I tried using `setTaskRunState` instead, on a single ID:
    mutation {
      setTaskRunState(input: {
        taskRunId: "d615f3e8-701e-4e9e-afc7-aba1ae4ace15",
        version: 10,
        state: "{\"type\":\"Failed\",\"message\":\"Manually cancelled.\"}"
      }) {
        id
        task_run {
          state
          id
          task_id
        }
      }
    }
    The return value shows that it failed (the task run is still `Queued`), without showing any reason:
    {
      "data": {
        "setTaskRunState": {
          "id": "d615f3e8-701e-4e9e-afc7-aba1ae4ace15",
          "task_run": {
            "state": "Queued",
            "id": "d615f3e8-701e-4e9e-afc7-aba1ae4ace15",
            "task_id": "8dccd04d-0df3-4ac9-9ef9-8a73581d1a0d"
          }
        }
      }
    }
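    One thing worth checking when a state-change mutation appears to silently no-op is whether the `version` being passed still matches the task run's current version. A minimal query sketch, assuming the standard Prefect 1.x `task_run` fields:
    query {
      task_run(where: { id: { _eq: "d615f3e8-701e-4e9e-afc7-aba1ae4ace15" } }) {
        id
        version          # the version a state-change mutation must match
        state
        serialized_state
      }
    }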

    nicholas
    1 year ago
    Hi @Jeremy Phelps - can you try that mutation again with `set_task_run_states` instead, and the mutation example I sent you the other day?
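    The exact example referenced above isn't reproduced in the thread; as a rough sketch, the bulk mutation generally takes a list of states, with field names assumed from the Prefect 1.x GraphQL API:
    mutation {
      set_task_run_states(input: {
        states: [{
          task_run_id: "d615f3e8-701e-4e9e-afc7-aba1ae4ace15",
          version: 10,
          state: "{\"type\": \"Failed\", \"message\": \"Manually cancelled.\"}"
        }]
      }) {
        states {
          id
          status
          message
        }
      }
    }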

    Jeremy Phelps
    1 year ago
    I started with `set_task_run_states`. It reports "SUCCESS" for every task_run_id I pass to it, but in reality nothing happens.

    nicholas
    1 year ago
    How are you querying for those task runs?

    Jeremy Phelps
    1 year ago
    Two ways:
    1. With the Prefect UI
    2. With the following GraphQL query:
    query {
      task {
        flow {
          id
          name
        }
        id
        name
        tags
        task_runs(where: { state: { _eq: "Queued" } }) {
          id
          name
          state
        }
      }
    }
    The query returns the same task_run_ids that were "SUCCESS"fully set to `Failed` with `set_task_run_states`.
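    Sketched against the same schema, a variant of that query that goes through the top-level `task_run` field and also pulls each run's current `version` and the parent flow run's state:
    query {
      task_run(where: { state: { _eq: "Queued" } }) {
        id
        name
        state
        version
        flow_run {
          id
          name
          state
        }
      }
    }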

    nicholas
    1 year ago
    Are the flow runs that contain those tasks still running by any chance?

    Jeremy Phelps
    1 year ago
    They're marked as `Running`. But I'd expect Prefect to switch them to `Failed`, because task failure should trigger flow run failure.
    I'm trying to get the account to be totally inactive so that new flow runs can be started.
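    If the goal is to quiesce everything, the parent flow runs can also be forced out of `Running` directly; a hedged sketch using the bulk `set_flow_run_states` mutation (the ID and version below are placeholders and would need to be looked up first):
    mutation {
      set_flow_run_states(input: {
        states: [{
          flow_run_id: "<flow-run-id>",  # placeholder
          version: 1,                    # must match the flow run's current version
          state: "{\"type\": \"Cancelled\", \"message\": \"Manually cancelled.\"}"
        }]
      }) {
        states {
          id
          status
          message
        }
      }
    }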

    nicholas
    1 year ago
    Can you try using the global cancel all from the dashboard? It's on the `Runs in Progress` tile here:

    Jeremy Phelps
    1 year ago
    Some flow runs reached the `Cancelled` state, while others are stuck in `Cancelling`.

    nicholas
    1 year ago
    Those might take some time to cancel if they're running in k8s or similar infra; you can always manually remove those jobs as well, as needed.

    Jeremy Phelps
    1 year ago
    I've been trying to manually remove the jobs all along. That doesn't seem to work.
    I took down the k8s pods that were running those tasks.
    (although those pods were really only running the agents that were overseeing the tasks)
    The flow runs that were "Cancelling" are now "Failed".

    nicholas
    1 year ago
    That sounds correct. If the infra was torn down, Prefect doesn't have a way of knowing that things were cancelled correctly, so they were likely failed due to unresponsiveness.
    But if those have all been failed/cancelled, it sounds like you should be able to move work again.

    Jeremy Phelps
    1 year ago
    Strange that they have to be cancelled before they'll fail for that reason. The infrastructure was torn down days ago.

    nicholas
    1 year ago
    Do you have heartbeats turned off for those flows?
    (and to be fair, heartbeats can be wonky)
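    One hedged way to check that, assuming the Prefect 1.x schema exposes a flow-level `settings` JSON field (where the heartbeat toggle is believed to be stored):
    query {
      flow(where: { id: { _eq: "<flow-id>" } }) {  # placeholder flow ID
        id
        name
        settings  # expected to include a heartbeat_enabled key
      }
    }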

    Jeremy Phelps
    1 year ago
    I have heartbeats set to whatever the default is. (I presume that's "on", because the failure message mentioned a lack of heartbeat.)

    nicholas
    1 year ago
    In that case something was still sending a heartbeat; if you only killed the pods running the agent(s) and not the jobs running the flows, that could be it.
    Are you unblocked though?

    Jeremy Phelps
    1 year ago
    Yes, I'm unblocked, until this problem crops up again.