
Grant

07/06/2023, 3:52 PM
Hey all, I am running a Prefect agent from Command Prompt on my company's on-prem server through a remote session. Is there a way to programmatically restart the agent on that on-prem machine? For example, I would like to create an automation that runs the command "prefect agent start" in the remote session when the work pool has been in "unhealthy" status for more than 10 minutes. How would I go about doing that?

Will Raphaelson

07/06/2023, 3:54 PM
Hey Grant, thanks for that question. I can think of some tape-and-glue methods here, none of which feel great. If your agent needs to be restarted frequently, that sounds like an upstream problem we should actually look into. Under what circumstances do you find that the agent process has died or needs a restart?
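One possible tape-and-glue approach (not necessarily what Will had in mind) is a small watchdog script, run from the remote session in place of the bare command, that relaunches the agent whenever its process exits. This is a minimal sketch using only the Python standard library. The `supervise` helper, the restart limit, the backoff and the pool name `my-pool` are hypothetical. The script reacts to the agent process exiting, not to the work pool's "unhealthy" status in Prefect Cloud.

```python
import subprocess
import sys
import time
from typing import Sequence

# Hypothetical: adjust to match how you currently start the agent.
AGENT_CMD = ["prefect", "agent", "start", "--pool", "my-pool"]


def supervise(cmd: Sequence[str], max_restarts: int = 5, backoff_seconds: float = 10.0) -> int:
    """Run `cmd`, relaunching it each time it exits, up to `max_restarts` restarts.

    Returns the total number of times the command was launched.
    """
    launches = 0
    while True:
        launches += 1
        # Blocks until the child process exits (e.g. the agent crashes).
        result = subprocess.run(list(cmd))
        print(
            f"[supervise] {cmd[0]} exited with code {result.returncode} (launch {launches})",
            file=sys.stderr,
        )
        if launches > max_restarts:
            return launches
        # Pause before restarting so a crash loop does not spin the CPU.
        time.sleep(backoff_seconds)
```

You would then call `supervise(AGENT_CMD)` instead of typing `prefect agent start` directly. A hard restart cap keeps a persistent failure from looping forever. The cap also means the underlying crash still needs investigating, which is Will's point.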

Grant

07/06/2023, 4:01 PM
Yeah, a couple of times now I've seen the work pool stuck in "unhealthy" status. When I checked the cmd window in my remote session, the agent had stopped, and the error message in cmd said "an exception has occurred". My logs don't capture agent issues, so I don't have the full error message. I have noticed, though, that errors appear in the cmd output while flows are running, leading up to the exception and the agent exiting. I will follow up with the full error message the next time this happens.
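One way to make sure the full error message survives the next crash is to launch the agent through a small wrapper. The wrapper copies everything the agent prints, including tracebacks on stderr, into a timestamped log file, and still shows it in the cmd window. This is a sketch under assumptions: the `run_with_log` helper, the `agent.log` file name and the pool name are hypothetical. The two environment variables are standard CPython settings that make the child flush its output promptly and write UTF-8.

```python
import os
import subprocess
import sys
from datetime import datetime
from pathlib import Path
from typing import Sequence


def run_with_log(cmd: Sequence[str], log_path: Path) -> int:
    """Run `cmd`, echo its combined stdout/stderr to the console, and append each
    line, timestamped, to `log_path`. Returns the process exit code."""
    env = {
        **os.environ,
        # Ask Python-based children (such as the prefect CLI) not to buffer output,
        # so the last lines before a crash still reach the log.
        "PYTHONUNBUFFERED": "1",
        # Make the child write UTF-8, matching how we decode the pipe below.
        "PYTHONIOENCODING": "utf-8",
    }
    with open(log_path, "a", encoding="utf-8") as log, subprocess.Popen(
        list(cmd),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so tracebacks land in the same file
        text=True,
        encoding="utf-8",
        errors="replace",
        env=env,
    ) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)  # keep the output visible in the cmd window
            log.write(f"{datetime.now().isoformat(timespec='seconds')} {line}")
            log.flush()  # persist each line immediately
        return proc.wait()
```

A hypothetical invocation is `run_with_log(["prefect", "agent", "start", "--pool", "my-pool"], Path("agent.log"))`. The returned exit code tells you whether the agent stopped cleanly or crashed.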

Jeff Hale

07/06/2023, 4:33 PM
You should be able to safely ignore that warning. Will is the person for automation questions, so you’re in good hands.

Will Raphaelson

07/06/2023, 4:52 PM
Yes, please do capture the error message if possible. Even better, check out the process worker, the next generation of the agent, which sends worker logs to the server.

Jeff Hale

07/06/2023, 4:55 PM
The process worker is coming up in about 10 minutes in PACC, @Grant. 🙂

Grant

07/10/2023, 12:43 PM
@Will Raphaelson wanted to follow up with the error log from this weekend where my agent went down (see attached). I have tried to implement a process worker, but had issues getting my flow runs to run successfully using a worker rather than an agent. Do you have any insight? See that thread: https://prefect-community.slack.com/archives/CM28LL405/p1688752320734569

Will Raphaelson

07/10/2023, 5:21 PM
Thanks, Grant. Is the bonus results work pool of the agent type or the process type? You'll have to create a work pool of the process type to start a worker that can pick up work from it. Not sure why you're getting that 408, though; let me ask some folks internally.
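For reference, the setup Will describes corresponds to two Prefect CLI calls: `prefect work-pool create <name> --type process`, then `prefect worker start --pool <name>`. This matches the Prefect 2.10-era docs; check `--help` on your installed version. The sketch below simply wraps them in Python. The helper names and the pool name `bonus-results-process` are hypothetical. Deployments must also be assigned to the new pool, because a worker only picks up flow runs from its own pool.

```python
import shutil
import subprocess
from typing import List


def worker_commands(pool_name: str) -> List[List[str]]:
    """Return the CLI calls that create a process-type work pool and start a worker on it."""
    return [
        # One-time setup: a work pool of *process* type (not the agent type).
        ["prefect", "work-pool", "create", pool_name, "--type", "process"],
        # Long-running: a worker that polls this pool and runs flows as local subprocesses.
        ["prefect", "worker", "start", "--pool", pool_name],
    ]


def main(pool_name: str = "bonus-results-process") -> None:
    """Hypothetical entry point; not called automatically."""
    if shutil.which("prefect") is None:
        raise SystemExit("prefect CLI not found on PATH")
    create_cmd, start_cmd = worker_commands(pool_name)
    # Not treated as fatal, e.g. if the pool was already created earlier.
    subprocess.run(create_cmd, check=False)
    # Blocks for as long as the worker runs.
    subprocess.run(start_cmd, check=True)
```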

Grant

07/10/2023, 5:29 PM
Thanks, Will. Yep, I created a Process type work pool as mentioned in that thread. What was weird was that the flow kicked off successfully but kept getting hung up on the final task. Could it be a matter of resource allocation?

Will Raphaelson

07/10/2023, 5:32 PM
I don't think so, but I've asked engineering for some support on this. One question: are you on Cloud or open source?

Grant

07/10/2023, 5:38 PM
I am on Prefect Cloud 2!
I'm really getting the feeling that this has something to do with resource allocation. I tried running a flow that usually takes 10s using a process worker, and now the last 2 tasks are each taking several minutes, with the last task still running... I can't query the table in SSMS that the flow is running an update statement against.

Will Raphaelson

07/10/2023, 9:10 PM
Trying to work through what could be going on here. Do you mean the resources of the execution environment? Can I ask where you are running these things? Thanks.
A set of minimal reproduction steps would be useful here. If you can write those up in an issue here, I can get it looked at more deeply.

Grant

07/11/2023, 12:35 PM
@Will Raphaelson I am running the worker and agent on an on-prem server from Command Prompt. Yes, I was referring to the resources of the execution environment; sorry for the lack of clarity. I will submit an issue as soon as I have a chance, thank you!

Will Raphaelson

07/11/2023, 12:53 PM
Okay. One way to test whether this is a resource issue would be to either increase the size of the box you're running on or profile the code.
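For the profiling route, Python's built-in `cProfile` can show where a slow step spends its time. In particular, it can show whether most of the time is inside a single blocking call, such as a database driver waiting on a lock, or spread across CPU-bound work. This is a minimal sketch: `profile_call` is a hypothetical helper, and `slow_update_step` is a hypothetical stand-in for the slow final task. It profiles only the current thread of the process it runs in.

```python
import cProfile
import io
import pstats
import time


def profile_call(func, *args, top: int = 10, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, report), where report lists the `top` entries by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


def slow_update_step(n: int) -> int:
    """Hypothetical stand-in for the slow final task (e.g. the SQL UPDATE)."""
    time.sleep(0.2)  # simulates time spent waiting on the database
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, report = profile_call(slow_update_step, 100_000)
    print(report)
```

In Prefect 2, a `@task`-decorated function keeps the original callable on its `.fn` attribute. You could therefore pass `my_task.fn` here to profile the task body outside the orchestration engine, where `my_task` is a placeholder for one of your tasks. If nearly all cumulative time sits in one blocking call, a bigger box is unlikely to help.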