r

Richard Alexander

03/13/2023, 12:48 PM
Quick question about `agents`, `work_pools` and `storage_blocks`. I have several agents (each polling a different work pool) on the same server trying to poll jobs from the same `s3` storage block, but only one of them is working. Can multiple agents/work pools be connected to the same storage block?
r

redsquare

03/13/2023, 12:55 PM
How do you mean poll jobs from s3? As in prefect jobs, or your internal jobs, so you're using s3 like a queue?
r

Richard Alexander

03/13/2023, 1:02 PM
I created an `s3 storage block`. Can multiple agents / work pools use the same storage block? Or, do you have ideas on how to troubleshoot why a Work Pool agent isn't working? The two that aren't working say that they are polling for work, but their jobs are sitting in a `late` status, untouched by the agents that should be picking them up.
r

redsquare

03/13/2023, 1:08 PM
so if I read this right, you're using s3 as storage for the deployment. That should not impact flows being started; once started, the flow is downloaded from s3. Any logs from the agent?
r

Richard Alexander

03/13/2023, 1:11 PM
No logs from the agent. I have restarted each, and the log stays static on the first entry "Polling for work from work pool 'worker'".
And yes... I'm using S3 as storage for the deployment. Sorry for my inaccurate wording 😉
😁 1
r

redsquare

03/13/2023, 1:12 PM
what happens if you invoke a 'run' from the UI
r

Richard Alexander

03/13/2023, 1:12 PM
I'll try...
r

redsquare

03/13/2023, 1:14 PM
I would double check the agent+pool defined on the deployment too
r

Richard Alexander

03/13/2023, 1:14 PM
OK, just started it via "Quick Run" in the UI. I expect that it will sit here in a late status as well, but I'll let you know in a few minutes.
Right, I'll do that again right now as well
r

redsquare

03/13/2023, 1:15 PM
image.png
r

Richard Alexander

03/13/2023, 1:21 PM
It's still sitting in a late status. Ah... the work pool is correct, but it's pointing to an old storage that no longer exists. I tried re-deploying, but it still didn't pick up the new storage. I'll try deleting the deployment altogether and see if I can get it to start....
👍 1
Hmm... I deleted the deployment and re-deployed. It now looks like everything is right (work pool and storage) but the new run is still not being picked up. It's about 5 minutes late. Any ideas? 🧐
I haven't explicitly specified a `work_queue` in the deployment. It shows as `default` in the UI. But that shouldn't matter, right? It should run in the default work queue.
r

redsquare

03/13/2023, 1:41 PM
I specify the queue even if there is only a single queue
r

Richard Alexander

03/13/2023, 1:52 PM
I specified the work queue as `default`, but still no luck. I'll try creating a work queue instead of using default.
I have tried several things, but cannot get the agent to pick up work:
• Created a new `main` work queue and re-deployed without changing the agent
• Re-deployed with `work_queue_name = 'default'`, but without changing the agent
• Changed the agent to explicitly poll from the `default` work queue
• Changed the agent to explicitly poll from the `main` work queue
One interesting note: when I explicitly set the agent's work queue to `default`, the log gave some output/error messages about canceled flows that I had already deleted. But it still didn't pick up the new flow run.
Related question: If we start an agent in a work pool, but don't specify the work queue, will it pick up work from all queues? Or no queues?
I'm not sure what else to try at this point. Any other suggestions?
r

redsquare

03/13/2023, 2:26 PM
how are you starting your agents
r

Richard Alexander

03/13/2023, 2:27 PM
command line
r

redsquare

03/13/2023, 2:27 PM
assume you have the correct env vars set and are connecting to a queue that exists
r

Richard Alexander

03/13/2023, 2:27 PM
For example:
prefect agent start -p worker -q main
r

redsquare

03/13/2023, 2:27 PM
can you list deployments from the command line
prefect deployments ls
r

Richard Alexander

03/13/2023, 2:29 PM
Yes I do. As I mentioned in the original question, one of the deployments on this machine is working (from a different pool) but pulling from the same storage. So I know that all of the environment variables are correct.
r

redsquare

03/13/2023, 2:29 PM
ah I forgot that, long time ago 😁
r

Richard Alexander

03/13/2023, 2:30 PM
image.png
lots more deployments, but the command works
(No worries, that was a long time ago 😉)
And thanks for the help so far, by the way! This one really has me stumped...
r

redsquare

03/13/2023, 2:41 PM
no problem - tbh I run an agent per pod in k8s to isolate and avoid crossing streams
what do you see in the ui when you click the work pool -> queues
r

Richard Alexander

03/13/2023, 2:45 PM
I see both the `default` and the new `main` queues. The main queue has `1 late run` as well since that is the last test I tried.
r

redsquare

03/13/2023, 2:46 PM
are the queues listed as healthy - meaning an agent has polled recently
r

Richard Alexander

03/13/2023, 2:48 PM
No. But I haven't been paying attention to that... since I upgraded to 2.8.0, all agents have been listed as unhealthy, even for pools and queues that are working.
r

redsquare

03/13/2023, 2:50 PM
prefect config set PREFECT_LOGGING_LEVEL=DEBUG
up the logging level and see if it gives any noise
can you move to the latest, 2.8.4
r

Richard Alexander

03/13/2023, 2:52 PM
I have two servers running, one main server and the other is for specific "worker" agents. Should I increase the log level on one or both?
(main server runs prefect UI)
r

redsquare

03/13/2023, 2:53 PM
i'd try the agent first to see if that gives any clues about comms to the server
r

Richard Alexander

03/13/2023, 2:54 PM
Got it. I'm about to hop on a call, but will report back in a bit :-)
r

redsquare

03/13/2023, 2:54 PM
kk
r

Richard Alexander

03/14/2023, 12:37 PM
I followed your suggestions @redsquare:
• I upgraded to 2.8.5
• I set logging level to debug
Debug shows no extra info on the agent side. They output their initial log entry then sit silently. Also no change regarding the health status of my work queues. All are listed as `unhealthy`, even those that are working properly. What should we try next?
r

redsquare

03/14/2023, 12:41 PM
Is this with a single agent started up too?
r

Richard Alexander

03/14/2023, 12:50 PM
No. I can disable the other agents on this server... one moment
No luck. I only have one agent on that server that is polling, but it's not picking up the flows in the queue.
Is there some other way to see which agents are polling the database since the logs aren't helping? Is there some kind of `last-polled` column in the database somewhere that I can check?
(We have a Postgres database set up)
Also, a question about the logging. I started the agent via this command:
prefect agent start -p worker -q default
And get this output from the log:
Agent started! Looking for work from work pool 'worker'...
The log mentions the pool, but it doesn't say anything about the queue. Is that to be expected? Or does that indicate a problem?
r

Richard Alexander

03/14/2023, 2:34 PM
Hmmm... Maybe the debug setting got overwritten when I upgraded. I'll try that again.
r

redsquare

03/14/2023, 2:34 PM
queues feed the pool, the queue is optional
r

Richard Alexander

03/14/2023, 2:46 PM
Still no logs. OK, let's back up. Here's the setup:
• Server MAIN has prefect UI and agents
• Server WORKER has only agents
• `prefect server` is started on MAIN, not WORKER
I assumed that agents can be started and poll for work from the database by themselves. Is that correct or not?
r

redsquare

03/14/2023, 2:53 PM
yes correct
you sure the agent url is correct
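One way to check the agent URL from the machine where the agent runs is to request the server's health endpoint. The snippet below is a minimal sketch, not an official tool: it assumes the URL in question is the value of the `PREFECT_API_URL` setting (something like `http://<main-host>:4200/api`) and that the server exposes a `health` route under that base path. The function names `health_url` and `api_is_reachable` are made up for illustration.

```python
import urllib.error
import urllib.request


def health_url(api_url: str) -> str:
    """Build the health-check URL from a base API URL such as http://host:4200/api.

    Assumption: the server answers on <api_url>/health.
    """
    return api_url.rstrip("/") + "/health"


def api_is_reachable(api_url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET on the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(api_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, HTTP error, or malformed URL.
        return False
```

Running `api_is_reachable(os.environ["PREFECT_API_URL"])` on the WORKER machine (after `import os`) would show whether an agent started there can reach the API on MAIN at all. A `False` result points at the URL, the network, or the server rather than at work pools or queues.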
r

Richard Alexander

03/14/2023, 3:15 PM
I set the `PREFECT_API_DATABASE_CONNECTION_URL`. I assumed that agents poll the database directly. Is that not correct? I know that flows have run on the WORKER server... I watched the processes get picked up with `htop` and almost completely fill the available memory.
r

redsquare

03/14/2023, 3:17 PM
no the agents hit the api
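Since agents talk to the API rather than to the database, a quick sanity check of the agent machine's environment can catch the mix-up described above. The following is a rough sketch under stated assumptions: it only inspects environment variables (settings stored in a Prefect profile via `prefect config set` are not visible to it), and the helper name `agent_config_warnings` is hypothetical.

```python
import os
from typing import List, Mapping


def agent_config_warnings(env: Mapping[str, str]) -> List[str]:
    """Return human-readable warnings about settings relevant to an agent host."""
    warnings = []
    if not env.get("PREFECT_API_URL"):
        # Without this, the agent is not pointed at the server's API
        # (unless the value is stored in a profile instead, which is not checked here).
        warnings.append(
            "PREFECT_API_URL is not set in the environment; "
            "the agent may not be talking to the server on the other machine."
        )
    if env.get("PREFECT_API_DATABASE_CONNECTION_URL"):
        # This setting is read by the server process; it does not tell an agent
        # where the API lives.
        warnings.append(
            "PREFECT_API_DATABASE_CONNECTION_URL is set; this configures the "
            "server's database and does not point an agent at the API."
        )
    return warnings


if __name__ == "__main__":
    for line in agent_config_warnings(os.environ) or ["No obvious problems found."]:
        print(line)
```

In the setup discussed here, the WORKER machine would be expected to have `PREFECT_API_URL` pointing at the server on MAIN, while the database connection URL matters only where `prefect server` runs.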
r

Richard Alexander

03/14/2023, 3:21 PM
Hmm... that's crazy. I wonder if I DID have prefect server started on the WORKER machine until I rebooted. Out of curiosity, what kind of issues would that cause? Having two servers, both with `prefect server` running and connected to the same database. Any chance that I did harm to the system?
r

redsquare

03/14/2023, 3:23 PM
Will be fine, otherwise you could never scale the solution!
I doubt prefect cloud runs on a single server node