Hi guys! I’m trying to set up prometheus on prefec...
# prefect-server
b
Hi guys! I'm trying to set up Prometheus on Prefect (inside the flow, so that each time the flow runs I send something to a Prometheus endpoint). I tried the example you provide in pushgateway.py (in the docstring of the run function). The app is not breaking (so it is probably working), but it also doesn't appear to do anything: I cannot hit the endpoint provided in the pushgateway_url parameter. How is that example supposed to work if I'm not able to hit the endpoint? I also kind of forced the endpoint to be created by using start_http_server(8000) from prometheus_client. Like I said, it is not breaking the code, so maybe it is working and I'm just not able to see it somehow.
a
@Bruno Kuasney what is your end goal here: to send flow run logs to Prometheus? What agent type do you use?
@Bruno Kuasney we do have two Prometheus tasks in the task library: https://docs.prefect.io/api/latest/tasks/prometheus.html#pushgaugetogateway
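For anyone reading later: a Pushgateway is a separate service that has to be running already, and pushgateway_url should point at it. The push itself does not create an endpoint. Below is a minimal stdlib-only sketch of what such a push looks like over the Pushgateway HTTP API. The helper names (render_gauge, pushgateway_endpoint, push_gauge) and the metric name are hypothetical, and port 9091 is only the usual default, so treat it as an assumption about your setup. prometheus_client offers push_to_gateway for the same purpose.

```python
import urllib.request
from urllib.parse import quote


def render_gauge(name: str, help_text: str, value: float) -> bytes:
    """Render a single gauge in the Prometheus text exposition format."""
    lines = [
        f"# HELP {name} {help_text}",
        f"# TYPE {name} gauge",
        f"{name} {value}",
    ]
    # The body must end with a newline.
    return ("\n".join(lines) + "\n").encode("utf-8")


def pushgateway_endpoint(pushgateway_url: str, job: str) -> str:
    """Build the push URL for a job on an already-running Pushgateway."""
    return f"{pushgateway_url.rstrip('/')}/metrics/job/{quote(job, safe='')}"


def push_gauge(pushgateway_url: str, job: str, name: str,
               help_text: str, value: float, timeout: float = 5.0) -> int:
    """PUT the gauge to the gateway and return the HTTP status code."""
    request = urllib.request.Request(
        pushgateway_endpoint(pushgateway_url, job),
        data=render_gauge(name, help_text, value),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Usage (assumes a gateway is listening on localhost:9091):
#   push_gauge("http://localhost:9091", "my_flow", "flow_last_run_ok",
#              "1 if the last flow run succeeded", 1)
```

Prometheus then scrapes the gateway's own /metrics page, which is where the pushed value should show up.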
b
My goal is just to have the count of runs available at the endpoint, like the image below (for monitoring purposes and a Grafana dashboard). For example: every time my flow runs, it increments the count. I'm running locally atm with my_project.run()
reference image:
a
The count of task runs of a specific task? I wonder whether you could get this information directly with a GraphQL query against the Prefect backend. It would be great if you could describe the problem a bit more; perhaps we don't need Prometheus to solve it.
b
I tried to use the task you sent, but I cannot see output like the image I sent above. --- My "workaround" atm is to use state handlers. So, on my flow I have state_handlers=[send_notification, prometheus_test]. Basically, every time my task runs I increment my Prometheus gauge, and I can see this working on my endpoint on port 8000 (image attached). --- In short, what I'm doing is like this:
with Flow(f"{project_name}", schedule=schedule, state_handlers=[send_notification, prometheus_test]) as protect:
...
...
...
where my prometheus_test is something like this (it is not finished of course, it is just a POC):
from prometheus_client import Gauge

# declared once at module level (outside the function)
g = Gauge('my_inprogress_requests_protect', 'Count requests to process Protect')

# state handler: called on every state change of the object it is attached to
def prometheus_test(obj, old_state, new_state):
    g.inc()
    return new_state
So when I hit localhost:8000 I can see how many times this task ran:
I'm having success with this workaround atm, but honestly it sounds like a "hack".
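A note on the handler above: a state handler attached to a flow is called on every state change, so a plain inc() counts transitions rather than runs. A hedged sketch of one way to count runs only is to increment on the transition into Running. The count_runs factory is a hypothetical helper, and it assumes the state objects provide is_running(), as Prefect 1.x states do.

```python
from typing import Any, Callable


def count_runs(increment: Callable[[], None]) -> Callable[[Any, Any, Any], Any]:
    """Build a state handler that calls `increment` once per run start."""

    def handler(obj: Any, old_state: Any, new_state: Any) -> Any:
        # Count only the transition into Running, not every state change.
        if new_state.is_running() and not old_state.is_running():
            increment()
        # Returning the new state leaves the transition unchanged.
        return new_state

    return handler


# Usage sketch (assumes prometheus_client and Prefect 1.x are installed;
# the metric name is made up for illustration):
#   from prometheus_client import Counter
#   runs = Counter("protect_flow_runs", "Number of Protect flow runs")
#   Flow(project_name, state_handlers=[send_notification, count_runs(runs.inc)])
```

Passing the increment callable in keeps the handler independent of the metrics library, so it can be tried out without Prometheus running.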
a
If that works for you, then great. But if your problem is to get the count of task runs for a specific task, you could get it with GraphQL without any extra tools like Prometheus.
b
hmmm, gonna test both methods, thx 🙂
d
Hi @Bruno Kuasney, I added this task. We use it to send our flow metrics to the Pushgateway. However, it does not expose metrics itself. I'm looking to Orion to add that. Hope that makes sense. Happy to talk more about it.
b
cheers! thx guys
I don't want to be that kind of guy, but do you have an example of how to get the count of runs with GraphQL, as you guys mentioned in this thread? 😐 That would work for me atm. @Anna Geller @davzucky
a
@Bruno Kuasney you could use this query as a template; you would likely need to pass the task name into the where clause: https://github.com/PrefectHQ/ui/blob/master/src/graphql/TaskRun/task-run-states-count.gql
b
hmm that’s cool, nice! thx again guys
a
this should work:
{
  task_run_aggregate(
    where: {task_id: {_eq: "5f3c4446-f7f3-423c-b8ad-59005b497a2b"}}
  ) {
    nodes {
      task {
        name
      }
      end_time
    }
    aggregate {
      count
    }
  }
}
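For running that query from a script, here is a minimal stdlib-only sketch. The function names are hypothetical, and the default api_url (http://localhost:4200/graphql) is an assumption about a local Prefect Server setup, so adjust it to wherever your GraphQL API is reachable. Only the aggregate count is requested here.

```python
import json
import urllib.request
import uuid


def build_count_query(task_id: str) -> str:
    """Return a task_run_aggregate query that counts runs of one task."""
    # Validate first: raises ValueError if task_id is not a UUID.
    task_id = str(uuid.UUID(task_id))
    return (
        '{ task_run_aggregate(where: {task_id: {_eq: "%s"}}) '
        "{ aggregate { count } } }" % task_id
    )


def extract_count(response: dict) -> int:
    """Pull the count out of a decoded GraphQL response."""
    return response["data"]["task_run_aggregate"]["aggregate"]["count"]


def fetch_task_run_count(task_id: str,
                         api_url: str = "http://localhost:4200/graphql",
                         timeout: float = 10.0) -> int:
    """POST the query to the GraphQL endpoint and return the run count."""
    body = json.dumps({"query": build_count_query(task_id)}).encode("utf-8")
    request = urllib.request.Request(
        api_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_count(json.load(response))


# Usage (assumes the API is reachable at api_url):
#   fetch_task_run_count("5f3c4446-f7f3-423c-b8ad-59005b497a2b")
```

If Prefect is installed, its client can send the same query string, which also takes care of authentication where that applies.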
d
We had a thread before about writing a Prometheus exporter for Prefect. I had a POC working; however, when Orion was announced I didn't push it forward, and I will look to integrate with Orion only, which should be easier. Does that make sense?
The POC was querying the GraphQL API every 10 s and exposing the result to Prometheus, like what you want to do.
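The POC itself was not shared in this thread, so the following is only a rough sketch of the polling idea under stated assumptions: fetch a count on a fixed interval and hand it to whatever publishes the metric. run_exporter_loop and its parameters are hypothetical names. The fetch and publish callables are injected so the loop runs without Prefect or Prometheus.

```python
import time
from typing import Callable, Optional


def run_exporter_loop(fetch_count: Callable[[], float],
                      publish: Callable[[float], None],
                      interval_seconds: float = 10.0,
                      max_polls: Optional[int] = None,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll `fetch_count` on an interval and pass each value to `publish`.

    Runs forever when max_polls is None; returns the number of polls made.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        try:
            publish(fetch_count())
        except Exception as exc:
            # Keep the last published value and retry on the next tick.
            print(f"poll failed: {exc}")
        polls += 1
        if max_polls is None or polls < max_polls:
            sleep(interval_seconds)
    return polls


# Usage sketch (assumes prometheus_client is installed and that
# get_count is your own function returning the current run count):
#   from prometheus_client import Gauge, start_http_server
#   gauge = Gauge("prefect_task_runs", "Task runs reported by the GraphQL API")
#   start_http_server(8000)           # serves /metrics for Prometheus to scrape
#   run_exporter_loop(get_count, gauge.set, interval_seconds=10)
```

Because this runs as its own long-lived process, the metrics endpoint stays up between flow runs, unlike a server started from inside the flow.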
b
Regarding Orion, it is still not recommended for production environments, right? I think we're going to keep Core until we get the green light from Prefect's side. The solution you mentioned is exactly what I'm trying to replicate hahaha, so it would be perfect. --- Btw, just for future searches (if someone starts to have the same issue): the workaround I mentioned, "forcing an exposure (8000)" and using state_handlers on the flow, did not work when deployed. It did work locally, but did not work when running with prefect-server.
t
fwiw I really like what you are doing with pushing telemetry from your flow rather than querying the graphql API. Both of these solutions would be awesome to see in practice though! I believe both of these are worth exploring without waiting on production Orion