Verun Rahimtoola

02/02/2021, 5:32 PM
hi, is there a quick way to check the health and status of the lazarus process/service? perhaps a REST endpoint we could query or a command line utility we can invoke?

Jim Crist-Harif

02/02/2021, 5:34 PM
I assume you're asking about a health check for that service when running your own version of Prefect Server (not one for pinging cloud)?

Verun Rahimtoola

02/02/2021, 5:34 PM
yes @Jim Crist-Harif that is correct!

Jim Crist-Harif

02/02/2021, 5:35 PM
Sure, I'll open an issue

Verun Rahimtoola

02/02/2021, 5:36 PM
excellent thanks 👍

Jim Crist-Harif

02/02/2021, 5:36 PM
@Marvin open "Ensure health check available for all services (where it makes sense)" in server

Verun Rahimtoola

02/02/2021, 5:36 PM
@Jim Crist-Harif in the meantime is there anything we can do to sort of "manually" figure out if everything is running fine?
as of right now some of our flows are having tasks stuck in the pending state forever

Jim Crist-Harif

02/02/2021, 5:38 PM
I don't believe any check is currently done for pending tasks (pending only means the task id is created, not that it ever started), but zombie running tasks should be reaped periodically in a healthy system.

Verun Rahimtoola

02/02/2021, 5:38 PM
although the flow still shows as running on the prefect server UI, and the agent at least is able to talk to graphql fine. i'm trying to work with our ops team to figure out if anything is off with our lazarus process so it would be helpful to know if there are any specific steps for them to take to figure this out. thank you!

Jim Crist-Harif

02/02/2021, 5:39 PM
cc @Zanie, our resident server expert ^^

Verun Rahimtoola

02/02/2021, 5:40 PM
i see. i thought lazarus would check that "hey this flow shows as running but its tasks have been stuck as pending for a while, let me do something"... but i guess that's not what it does

Jim Crist-Harif

02/02/2021, 5:40 PM
No, as long as the main flow run process is still healthy the flow is marked as active. What executor are you using?
We're working on some features that would let you detect the above though, not released yet.

Zanie

02/02/2021, 5:42 PM
(I do not believe those features will be available in Server though)
You could probably write a loop service that checks for the case you’re interested in if this is a significant issue for you.

Verun Rahimtoola

02/02/2021, 5:55 PM
@Jim Crist-Harif we actually have our own executor we've implemented to run tasks on our HPC
@Zanie i think this might be an issue for us, we could write the GQL query to detect the issue but how would we get prefect to handle it?
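The detection half of this could look roughly like the sketch below. It is only a sketch under assumptions: the query shape (a `flow_run` table with a nested `task_runs` relation and a `state_timestamp` field) is what a Prefect Server GraphQL schema is assumed to expose and should be checked against your own deployment, and `find_stuck_task_runs` and `max_pending` are hypothetical names. The function only filters an already-fetched response, so it does not talk to the server itself.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List

# ASSUMPTION: table and field names follow the Prefect Server GraphQL schema
# as remembered here; verify them in your server's interactive API first.
STUCK_QUERY = """
query {
  flow_run(where: {state: {_eq: "Running"}}) {
    id
    task_runs(where: {state: {_eq: "Pending"}}) {
      id
      state_timestamp
    }
  }
}
"""


def find_stuck_task_runs(
    flow_runs: List[Dict[str, Any]],
    now: datetime,
    max_pending: timedelta,
) -> List[Dict[str, Any]]:
    """Return Pending task runs of Running flow runs older than max_pending.

    `flow_runs` is the parsed `flow_run` list from a response to STUCK_QUERY.
    Timestamps are assumed to be ISO 8601 strings with a UTC offset.
    """
    stuck = []
    for flow_run in flow_runs:
        for task_run in flow_run.get("task_runs", []):
            raw = task_run.get("state_timestamp")
            if raw is None:
                # No timestamp to compare against, so skip rather than guess.
                continue
            pending_for = now - datetime.fromisoformat(raw)
            if pending_for > max_pending:
                stuck.append(
                    {
                        "flow_run_id": flow_run["id"],
                        "task_run_id": task_run["id"],
                        "pending_for": pending_for,
                    }
                )
    return stuck
```

With a Prefect 0.x client installed, the response could presumably be fetched with `prefect.Client().graphql(STUCK_QUERY)` and its `flow_run` list passed in together with `datetime.now(timezone.utc)`. What to do with the result (alerting, or setting a state) is a separate decision.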

Zanie

02/02/2021, 5:57 PM
You’d have to write and run a service like Lazarus
although it seems likely this is a bug in your executor?
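A service of that kind is, at its core, a periodic loop. The sketch below is a generic skeleton and not how Lazarus or any Prefect service is actually implemented. `run_loop_service`, `check`, and `handle` are hypothetical names: `check` would return whatever items look unhealthy, and `handle` would decide what to do with each one.

```python
import time
from typing import Any, Callable, Iterable, Optional


def run_loop_service(
    check: Callable[[], Iterable[Any]],
    handle: Callable[[Any], None],
    interval_seconds: float = 60.0,
    max_iterations: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call check() every interval_seconds and pass each result to handle().

    Runs forever when max_iterations is None. Returns the number of
    iterations completed, which is mainly useful for testing.
    """
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        try:
            for item in check():
                handle(item)
        except Exception as exc:
            # One failed check should not kill a long-running service.
            print(f"loop service iteration failed: {exc!r}")
        iterations += 1
        # Skip the final sleep so a bounded run returns promptly.
        if max_iterations is None or iterations < max_iterations:
            sleep(interval_seconds)
    return iterations
```

Injecting `sleep` and `max_iterations` keeps the loop easy to exercise in a test. In production both would be left at their defaults and the process run under whatever supervisor the ops team already uses.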

Verun Rahimtoola

02/02/2021, 5:58 PM
it could be a bug, i just don't know where to look though. because when we directly run on our HPC using flow.run() everything runs fine
as i mentioned in a thread yesterday, in this particular case, what's happening is that:
1. the flow is registered to the server
2. an agent is brought up on our HPC
3. the flow is triggered to run from the prefect server UI
4. the "top level" tasks (parameters in the case of this flow) execute and return quickly, but then
5. the "next" level of tasks in the dag remain stuck in the pending state
the difference between flow.run vs the agent picking up a flow run is that flow.run runs a loop over the tasks of the flow, until flow.is_finished, whereas when the agent deploys a flow, it uses the flow runner's run() which only submits the flow's tasks a single time
so something somewhere i think is off, i could also be totally misunderstanding 😛
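The hypothesis above can be made concrete with a toy model. This is not Prefect's flow runner or executor code. Every name here (`DAG`, `single_pass`, `run_until_finished`) is hypothetical, and the model assumes an executor that evaluates each task against the states known at the start of a pass instead of waiting for upstream results. Under that assumption a single pass finishes only the top-level task, while a repeat-until-finished loop hides the problem.

```python
from typing import Dict, List, Tuple

# Toy DAG: each task maps to the upstream tasks it depends on.
DAG: Dict[str, List[str]] = {"param": [], "load": ["param"], "report": ["load"]}
ORDER: List[str] = ["param", "load", "report"]  # a topological order of DAG


def single_pass(states: Dict[str, str]) -> Dict[str, str]:
    """Visit every task once, using only the states known when the pass began.

    This mimics an executor that does not wait for upstream results: a task
    whose upstream finished during this same pass is still left Pending.
    """
    snapshot = dict(states)
    new_states = dict(states)
    for task in ORDER:
        if snapshot[task] == "Success":
            continue
        if all(snapshot[up] == "Success" for up in DAG[task]):
            new_states[task] = "Success"
    return new_states


def run_until_finished(
    states: Dict[str, str], max_passes: int = 10
) -> Tuple[Dict[str, str], int]:
    """Repeat single_pass until every task is Success or max_passes is hit."""
    passes = 0
    while any(s != "Success" for s in states.values()) and passes < max_passes:
        states = single_pass(states)
        passes += 1
    return states, passes
```

Starting from all tasks Pending, one call to `single_pass` leaves `load` and `report` Pending, which resembles the symptom described. `run_until_finished` needs three passes to complete the same DAG. If the real executor behaves like this model, the fix belongs in how it waits on upstream results, not in adding a loop around it.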

Jim Crist-Harif

02/02/2021, 6:11 PM
If the flow is still reporting as active, this sounds like an issue with your custom executor (which isn't a supported feature of prefect). If you can reproduce the issue with a supported builtin executor we could investigate further.

Verun Rahimtoola

02/02/2021, 6:14 PM
i see, thanks. any pointers about where we should look in our custom exec implementation?

Jim Crist-Harif

02/02/2021, 6:15 PM
Nope. I'd check whether you have any tasks pending internally, or any locking going on. Figuring out what the flow runner is waiting on would be useful here.
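One generic way to see what a Python process is waiting on is to dump the stack of every thread. The sketch below uses only the standard library and is not a Prefect feature. `dump_thread_stacks` is a hypothetical helper that would have to be called from inside the flow runner process, for example from a debugging hook in the custom executor.

```python
import sys
import threading
import traceback


def dump_thread_stacks() -> str:
    """Return a text report with the current stack of every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    # sys._current_frames() maps thread ids to their topmost stack frame.
    for ident, frame in sys._current_frames().items():
        chunks.append(f"--- thread {names.get(ident, 'unknown')} ({ident}) ---")
        chunks.append("".join(traceback.format_stack(frame)))
    return "\n".join(chunks)
```

A thread blocked on a lock, a queue, or a future shows up with that wait call at the bottom of its stack. On POSIX systems, `faulthandler.register(signal.SIGUSR1)` from the standard library gives a similar dump on demand without modifying the executor's code paths.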

Verun Rahimtoola

02/02/2021, 6:25 PM
so, curiously the flow runner does run through all the tasks of the flow, but it's only doing this once as far as i can tell...