Hi guys, I’m trying to set up Prometheus on Prefect | Prefect Community #prefect-server

Bruno Kuasney

12/01/2021, 8:59 AM

Hi guys! I’m trying to set up Prometheus on Prefect (inside the flow, so that each time the flow runs I send something to a Prometheus endpoint). I tried the example you provide in pushgateway.py (in the docstring of the run function). The app is not breaking (meaning it is probably working), but apparently it is not doing anything either: I cannot hit the endpoint given in the pushgateway_url parameter. How is that example supposed to work if I’m not able to hit the endpoint? I also kind of forced the endpoint to be created by using start_http_server(8000) from prometheus_client. Like I said, the code is not breaking, so maybe it is working and I’m just not able to see it somehow.

Anna Geller

12/01/2021, 10:08 AM

@Bruno Kuasney what is your end goal here: to send flow run logs to Prometheus? What agent type do you use?

Anna Geller

12/01/2021, 10:11 AM

@Bruno Kuasney we do have two Prometheus tasks in the task library: https://docs.prefect.io/api/latest/tasks/prometheus.html#pushgaugetogateway

Bruno Kuasney

12/01/2021, 10:12 AM

My goal is just to have the count of runs at the endpoint, like the image below (for monitoring purposes and a Grafana dashboard): every time my flow runs, it increments the count, for example. I’m running locally at the moment with my_project.run().

[reference image]

Anna Geller

12/01/2021, 10:14 AM

The count of task runs of a specific task? I wonder whether you could get this information directly with a GraphQL query against the Prefect backend. It would be great if you could describe the problem a bit more; perhaps we don’t need Prometheus to solve it.

Bruno Kuasney

12/01/2021, 10:22 AM

I tried to use the one you sent, but I cannot see anything like the image I sent above. --- My “workaround” at the moment is to use state_handlers. So, on my flow I have state_handlers=[send_notification, prometheus_test]. Basically, every time my task runs I increment my Prometheus gauge, and I can see this working on my endpoint on port 8000 (image attached). --- In short, what I’m doing is this:

with Flow(f"{project_name}", schedule=schedule, state_handlers=[send_notification, prometheus_test]) as protect:
...
...
...

where my prometheus_test is something like this (it is not finished, of course; it is just a POC):

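from prometheus_client import Gauge

# declared once at module level (outside the function)
g = Gauge('my_inprogress_requests_protect', 'Count requests to process Protect')

# state handler: called on each state change of the object it is attached to
def prometheus_test(obj, old_state, new_state):
    g.inc()
    return new_state  # pass the state through unchanged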

So when I hit localhost:8000 I can see how many times this task ran.

Bruno Kuasney

12/01/2021, 10:22 AM

I’m having success with this workaround at the moment, but honestly it sounds like a “hack”.
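A state handler in Prefect 1.x is called on each state transition (for example Scheduled to Running to Success), so a bare g.inc() counts transitions rather than runs. Below is a minimal sketch of a handler that fires only when a run enters the Running state. The names make_run_counter and FakeState are hypothetical; FakeState is only a stand-in so the handler can be tried without Prefect installed, and the sketch assumes the real state objects expose is_running() as Prefect 1.x states do.

```python
from typing import Callable


def make_run_counter(on_run_started: Callable[[], None]):
    """Build a Prefect-1.x-style state handler that fires once per run.

    `on_run_started` is any zero-argument callable, e.g. a gauge's `inc`.
    """

    def handler(obj, old_state, new_state):
        # Count only the transition into Running, not every state change.
        if new_state.is_running() and not old_state.is_running():
            on_run_started()
        # Return the state unchanged so the run proceeds as usual.
        return new_state

    return handler


class FakeState:
    """Hypothetical stand-in for a Prefect state (only is_running is modelled)."""

    def __init__(self, running: bool):
        self._running = running

    def is_running(self) -> bool:
        return self._running


# Small demonstration without Prefect: two transitions, one counted.
calls = []
handler = make_run_counter(lambda: calls.append(1))
handler(None, FakeState(False), FakeState(True))   # Scheduled -> Running: counted
handler(None, FakeState(True), FakeState(False))   # Running -> Success: ignored
```

With the gauge from the snippet above, this would be wired up as prometheus_test = make_run_counter(g.inc) and passed in state_handlers as before.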

Anna Geller

12/01/2021, 10:26 AM

If that works for you, then great. But if your problem is to get the count of task runs for a specific task, you could get it with GraphQL without any extra tools like Prometheus.

Bruno Kuasney

12/01/2021, 10:29 AM

hmmm, gonna test both methods, thx 🙂

🙌 1

davzucky

12/01/2021, 12:30 PM

Hi @Bruno Kuasney, I added this task. We are using it to send our flow metrics to the Pushgateway. However, it only pushes metrics; it does not expose them itself. I’m looking to Orion to add that. Hope that makes sense. Happy to talk more about it.

👍 2
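To illustrate the difference between pushing and exposing, here is a minimal sketch of a push to a Pushgateway at the HTTP level, using only the standard library. It assumes a Pushgateway is already running and reachable; the address http://localhost:9091, the job name, the metric name, and the helpers render_gauge and push_gauge are all hypothetical. After a push, the metric appears on the gateway's own /metrics page, which Prometheus scrapes; nothing is served from the flow process. prometheus_client's push_to_gateway covers the same step when working with a registry.

```python
import urllib.parse
import urllib.request


def render_gauge(name: str, help_text: str, value: float) -> bytes:
    """Render a single gauge sample in the Prometheus text exposition format."""
    lines = [
        f"# HELP {name} {help_text}",
        f"# TYPE {name} gauge",
        f"{name} {value}",
    ]
    # The text format requires a trailing newline.
    return ("\n".join(lines) + "\n").encode("utf-8")


def push_gauge(gateway_url: str, job: str, name: str, help_text: str,
               value: float, timeout: float = 5.0) -> int:
    """PUT one gauge to a Pushgateway; returns the HTTP status code.

    PUT replaces every metric previously pushed for this job grouping,
    so the value is overwritten on each push, not incremented.
    """
    url = f"{gateway_url.rstrip('/')}/metrics/job/{urllib.parse.quote(job, safe='')}"
    request = urllib.request.Request(
        url,
        data=render_gauge(name, help_text, value),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example call (assumed gateway address and names, needs a running Pushgateway):
# push_gauge("http://localhost:9091", "my_flow", "flow_last_run_ok",
#            "1 if the last flow run succeeded", 1)
```

Because each push overwrites the previous value, a running count of flow runs would have to be kept somewhere else (or derived in Prometheus) rather than in the pushed gauge itself.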

Bruno Kuasney

12/01/2021, 2:21 PM

cheers! thx guys

Bruno Kuasney

12/01/2021, 2:25 PM

I don’t want to be that kind of guy, but do you have an example of how to get the count of runs with GraphQL, as you mentioned in this thread? 😐 This would work for me at the moment. @Anna Geller @davzucky

Anna Geller

12/01/2021, 2:31 PM

@Bruno Kuasney you could use this query as a template; you would likely need to pass the task name into the where clause: https://github.com/PrefectHQ/ui/blob/master/src/graphql/TaskRun/task-run-states-count.gql

Bruno Kuasney

12/01/2021, 2:33 PM

hmm that’s cool, nice! thx again guys

Anna Geller

12/01/2021, 2:41 PM

this should work:

{
  task_run_aggregate(
    where: {task_id: {_eq: "5f3c4446-f7f3-423c-b8ad-59005b497a2b"}}
  ) {
    nodes {
      task {
        name
      }
      end_time
    }
    aggregate {
      count
    }
  }
}

👌 1
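For anyone who wants to run that query from a script, here is a minimal sketch using only the standard library. It sends a trimmed version of the query above (just the aggregate count) as a standard GraphQL POST. The endpoint http://localhost:4200/graphql is an assumption about a local Prefect Server setup, and build_count_query and fetch_count are hypothetical helper names.

```python
import json
import urllib.request
import uuid


def build_count_query(task_id: str) -> str:
    """Return the aggregate query for one task id.

    The id is parsed as a UUID first, so a malformed value raises
    ValueError instead of being pasted into the query text.
    """
    clean_id = str(uuid.UUID(task_id))
    return (
        '{ task_run_aggregate(where: {task_id: {_eq: "%s"}}) '
        "{ aggregate { count } } }" % clean_id
    )


def fetch_count(task_id: str,
                endpoint: str = "http://localhost:4200/graphql",  # assumed URL
                timeout: float = 10.0) -> int:
    """POST the query and return the task run count from the response."""
    payload = json.dumps({"query": build_count_query(task_id)}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["data"]["task_run_aggregate"]["aggregate"]["count"]


# Example call (needs a reachable backend):
# fetch_count("5f3c4446-f7f3-423c-b8ad-59005b497a2b")
```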

davzucky

12/01/2021, 4:08 PM

We had a thread before about writing a Prometheus exporter for Prefect. I had a POC working; however, when Orion was announced I didn't push it forward, and I will look to integrate with Orion only, which should be easier. Does that make sense?

davzucky

12/01/2021, 4:09 PM

The POC was querying GraphQL every 10 s and exposing the result to Prometheus, like what you want to do.
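The POC itself is not shown in the thread, so the following is only a rough sketch of the pattern described: poll a count on a fixed interval and expose the latest value for Prometheus to scrape. It uses the standard library only. RunCountExporter, serve, the metric name prefect_task_run_count, and port 8000 are all hypothetical, and the function that actually queries GraphQL is passed in as fetch_count.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


class RunCountExporter:
    """Keeps the most recently polled count and renders it as a gauge."""

    def __init__(self, fetch_count: Callable[[], int],
                 metric_name: str = "prefect_task_run_count"):
        self._fetch = fetch_count
        self._name = metric_name
        self._value = 0
        self._lock = threading.Lock()

    def poll_once(self) -> None:
        value = self._fetch()
        with self._lock:
            self._value = value

    def poll_forever(self, interval: float = 10.0) -> None:
        while True:
            try:
                self.poll_once()
            except Exception as exc:  # keep serving the last known value
                print(f"poll failed: {exc}")
            time.sleep(interval)

    def render(self) -> bytes:
        with self._lock:
            value = self._value
        return f"# TYPE {self._name} gauge\n{self._name} {value}\n".encode("utf-8")


def serve(exporter: RunCountExporter, port: int = 8000) -> None:
    """Start the background poller and serve the metric over HTTP (blocks)."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = exporter.render()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    threading.Thread(target=exporter.poll_forever, daemon=True).start()
    HTTPServer(("", port), Handler).serve_forever()


# Example (blocks until interrupted); my_graphql_count is a placeholder for
# whatever function returns the count from the backend:
# serve(RunCountExporter(my_graphql_count))
```

Running this as a separate long-lived process keeps the scrape endpoint independent of individual flow runs.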

Bruno Kuasney

12/02/2021, 8:29 AM

Regarding Orion, it is still not recommended for production environments, right? I think we are going to keep Core until we get the green light from Prefect’s side. The solution you mentioned is exactly what I’m trying to replicate hahaha, so it would be perfect. --- By the way, for future searches (in case someone runs into the same issue): the workaround I mentioned, “forcing an exposure (8000)” and using state_handlers on the flow, did not work when deployed. It worked locally, but not when running with prefect-server.

Tyler Wanner

12/21/2021, 5:25 PM

fwiw I really like what you are doing with pushing telemetry from your flow rather than querying the graphql API. Both of these solutions would be awesome to see in practice though! I believe both of these are worth exploring without waiting on production Orion
