Fina Silva-Santisteban

    10 months ago
    Hi everyone! Is there a way to get/download the timeline summary of a flow run? E.g. a table and/or bar chart with the completion time for each task? That would help with benchmarking and identifying bottlenecks. If I take one of our examples, you can see that there’s a task in the middle that takes a long time to run, and some that take less. It gives me an idea of potential bottlenecks and how the completion times relate to each other, so that’s good, but I don’t know how long each task actually takes unless I hover over each value. The first impression of the task in the middle being a bottleneck is still valid, but since it only takes 1 minute to run we might be ok with that bottleneck. A summary, at least in table form, would help out a lot!
    Anna Geller

    10 months ago
    @Fina Silva-Santisteban afaik, there is no such option atm. But you can use GraphQL queries to create that. It would involve some work, especially to calculate the durations. Maybe this can help you get started: • query showing fields that may be helpful to create this dashboard:
    query {
      flow_run(
        where: { state: { _eq: "Success" } }
        order_by: { state_timestamp: desc }
        limit: 100
      ) {
        name
        flow_id
        scheduled_start_time
        start_time
        end_time
        task_runs {
          name
          id
          end_time
          start_time
        }
        version
        agent {
          type
        }
      }
    }
    • using client to execute the query
    from prefect import Client
    
    client = Client()
    query = """
    paste the above
    """
    client.graphql(query)
    Fina Silva-Santisteban

    10 months ago
    @Anna Geller This looks interesting! But I’m afraid I can’t quite follow: how is sending a flow run request manually using graphql going to provide me with completion time stats? 🤔
    Anna Geller

    10 months ago
    That’s what I meant: you would have to do some work on your end in Python to calculate the durations. What the GraphQL query gives you is raw data, incl. the start and end times of both flow runs and task runs. You could then use this data in Python to make calculations and visualize it, e.g. with matplotlib. I would be super interested to hear if anyone from the community has done this in the past and can share their approach.
    Fina Silva-Santisteban

    10 months ago
    @Anna Geller I use the Prefect API to trigger flow runs and don’t send GraphQL requests myself, which is why I don’t know what I would get back if I sent a GraphQL query myself. Does it make sense now why I don’t understand your suggestions? 🙂 Can you pls post links to docs that show exactly what GraphQL returns, or whether I need to query GraphQL myself to get that info?
    Anna Geller

    10 months ago
    @Fina Silva-Santisteban sure, here is the documentation you may want to check: https://docs.prefect.io/orchestration/concepts/api.html#getting-started As for what you get as a result: it is a dictionary. You get the same response whether you run the query from Python or from the API playground in the UI:
    from prefect import Client
    
    client = Client()
    
    query = """
    query {
      flow_run(
        where: { state: { _eq: "Success" } }
        order_by: { state_timestamp: desc }
        limit: 100
      ) {
        name
        flow_id
        flow {
          name
          project {name}
        }
        scheduled_start_time
        start_time
        end_time
        task_runs {
          name
          id
          end_time
          start_time
        }
        version
        agent {
          type
        }
      }
    }
    """
    response = client.graphql(query)
    print(response)
    Output:
    {'data': {'flow_run': [a long list of your flow runs here]}}
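A minimal sketch of the duration calculation, working on a response of the shape shown above. It assumes the timestamps come back as ISO 8601 strings and that `start_time` / `end_time` can be empty for task runs that never started or finished; both are assumptions to check against your own output. The helper names `parse_ts` and `task_durations` and the sample data are made up for illustration. If `name` is empty for your task runs, the row falls back to the task run `id`.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp string; return None for missing values."""
    if value is None:
        return None
    # Assumption: timestamps look like "2021-09-01T12:00:05+00:00".
    # A trailing "Z" is normalized because older Python versions reject it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def task_durations(response):
    """Flatten the response into one row per task run, with seconds elapsed."""
    rows = []
    for flow_run in response["data"]["flow_run"]:
        for task_run in flow_run["task_runs"]:
            start = parse_ts(task_run["start_time"])
            end = parse_ts(task_run["end_time"])
            if start is None or end is None:
                continue  # skip task runs that never started or finished
            rows.append({
                "flow_run": flow_run["name"],
                # fall back to the id when the task run has no name
                "task_run": task_run.get("name") or task_run["id"],
                "seconds": (end - start).total_seconds(),
            })
    return rows


# Made-up sample in the same shape as the output above
sample = {"data": {"flow_run": [{
    "name": "example-run",
    "task_runs": [
        {"name": "extract", "id": "1",
         "start_time": "2021-09-01T12:00:00+00:00",
         "end_time": "2021-09-01T12:00:05+00:00"},
        {"name": "transform", "id": "2",
         "start_time": "2021-09-01T12:00:05+00:00",
         "end_time": "2021-09-01T12:01:05+00:00"},
        {"name": "load", "id": "3", "start_time": None, "end_time": None},
    ],
}]}}

rows = task_durations(sample)
for row in rows:
    print(row["flow_run"], row["task_run"], row["seconds"])
```

To use it on real data, pass the result of `client.graphql(query)` to `task_durations` instead of `sample`.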
    Kevin Kho

    10 months ago
    Hey @Fina Silva-Santisteban, I went over this, and the query Anna gave does not create a flow run. It retrieves the flow run statistics after the flow run has already happened. You would need the GraphQL API to get the info and then manipulate it yourself. We don’t calculate the duration either (the UI does that on the fly), so you need to do that manually using the start time and end time from the info you receive from the API.
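Once the durations are calculated, a summary table can be built without extra libraries. The sketch below is one possible way, not something Prefect provides: `format_summary` is a hypothetical helper, and the task names and durations are made up. It sorts the tasks from slowest to fastest and draws a text bar scaled to the slowest one.

```python
def format_summary(durations, width=30):
    """Render (task name, seconds) pairs as a text table, slowest task first."""
    if not durations:
        return "no task runs"
    ordered = sorted(durations, key=lambda item: item[1], reverse=True)
    longest = ordered[0][1]
    name_width = max(len(name) for name, _ in ordered)
    lines = []
    for name, seconds in ordered:
        # Scale each bar relative to the slowest task
        bar_len = round(width * seconds / longest) if longest > 0 else 0
        lines.append(f"{name:<{name_width}}  {seconds:>8.1f}s  {'#' * bar_len}")
    return "\n".join(lines)


# Made-up durations in seconds, e.g. end_time minus start_time per task run
durations = [("extract", 5.0), ("transform", 60.0), ("load", 12.5)]
summary = format_summary(durations)
print(summary)
```

The same list of pairs could be handed to a plotting library such as matplotlib if a real bar chart is preferred.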
    Fina Silva-Santisteban

    10 months ago
    Thank you both, that’s very helpful! 💪