Quick question about `agents`, `work_pools` and `s...
# ask-community
r
Quick question about `agents`, `work_pools` and `storage_blocks`. I have several agents (each polling a different work pool) on the same server trying to poll jobs from the same `s3` storage block, but only one of them is working. Can multiple agents/work pools be connected to the same storage block?
r
What do you mean by polling jobs from S3? Are these Prefect jobs, or your own internal jobs, so that you're using S3 like a queue?
r
I created an `s3 storage block`. Can multiple agents / work pools use the same storage block? Or, do you have ideas on how to troubleshoot why a Work Pool agent isn't working? The two that aren't working say that they are polling for work, but their jobs are sitting in a `late` status, untouched by the agents that should be picking them up.
r
So if I read this right, you're using S3 as storage for the deployment. That should not impact flows being started; once a run starts, the flow is downloaded from S3. Any logs from the agent?
r
No logs from the agent. I have restarted each, and the log stays static on the first entry "Polling for work from work pool 'worker'".
And yes... I'm using S3 as storage for the deployment. Sorry for my inaccurate wording 😉
😁 1
r
what happens if you invoke a 'run' from the UI
r
I'll try...
r
I would double check the agent+pool defined on the deployment too
r
OK, just started it via "Quick Run" in the UI. I expect that it will sit here in a late status as well, but I'll let you know in a few minutes.
Right, I'll do that again right now as well
r
It's still sitting in a late status. Ah... the work pool is correct, but it's pointing to an old storage that no longer exists. I tried re-deploying, but it still didn't pick up the new storage. I'll try deleting the deployment altogether and see if I can get it to start....
👍 1
Hmm... I deleted the deployment and re-deployed. It now looks like everything is right (work pool and storage) but the new run is still not being picked up. It's about 5 minutes late. Any ideas? 🧐
I haven't explicitly specified a `work_queue` in the deployment. It shows as `default` in the UI. But that shouldn't matter, right? It should run in the default work queue.
r
I specify the queue even if there is only a single queue
r
I specified the work queue as `default`, but still no luck. I'll try creating a work queue instead of using default.
I have tried several things, but cannot get the agent to pick up work:
• Created a new `main` work queue and re-deployed without changing the agent
• Re-deployed with `work_queue_name = 'default'`, but not changing the agent
• Changed the agent to explicitly poll from the `default` work queue
• Changed the agent to explicitly poll from the `main` work queue
One interesting note: when I explicitly set the agent's work queue to `default`, the log gave some output/error messages about canceled flows that I had already deleted. But it still didn't pick up the new flow run.
Related question: If we start an agent in a work pool, but don't specify the work queue, will it pick up work from all queues? Or no queues? I'm not sure what else to try at this point. Any other suggestions?
r
how are you starting your agents
r
command line
r
assume you have the correct env vars set and are connecting to a queue that exists
r
For example:
prefect agent start -p worker -q main
r
can you list deployments from the command line
prefect deployments ls
r
Yes I do. As I mentioned in the original question, one of the deployments on this machine is working (from a different pool) but pulling from the same storage. So I know that all of the environmental variables are correct.
r
ah I forgot that, long time ago 😁
r
lots more deployments, but the command works
(No worries, that was a long time ago 😉)
And thanks for the help so far, by the way! This one really has me stumped...
r
no problem - tbh I run an agent per pod in k8s to isolate and avoid crossing streams
what do you see in the ui when you click the work pool -> queues
r
I see both the `default` and the new `main` queues. The main queue has `1 late run` as well since that is the last test I tried.
r
are the queues listed as healthy - meaning an agent has polled recently
r
No. But I haven't been paying attention to that... since I upgraded to 2.8.0, all agents have been listed as unhealthy, even for pools and queues that are working.
r
prefect config set PREFECT_LOGGING_LEVEL=DEBUG
up the logging level and see if it gives any noise
can you move to latest 2.8.4
r
I have two servers running, one main server and the other is for specific "worker" agents. Should I increase the log level on one or both?
(main server runs prefect UI)
r
i'd try the agent first to see if that gives any clues about comms to the server
r
Got it. I'm about to hop on a call, but will report back in a bit :-)
r
kk
r
I followed your suggestions @redsquare:
• I upgraded to 2.8.5
• I set logging level to debug
Debug shows no extra info on the agent side. They output their initial log entry then sit silently. Also no change regarding the health status of my work queues. All are listed as `unhealthy`, even those that are working properly. What should we try next?
r
Is this with a single agent started up too?
r
No. I can disable the other agents on this server... one moment
No luck. I only have one agent on that server that is polling, but it's not picking up the flows in the queue.
Is there some other way to see which agents are polling the database since the logs aren't helping? Is there some kind of `last-polled` column in the database somewhere that I can check?
(We have a Postgres database set up)
Also, a question about the logging. I started the agent via this command:
prefect agent start -p worker -q default
And get this output from the log:
Agent started! Looking for work from work pool 'worker'...
The log mentions the pool, but it doesn't say anything about the queue. Is that to be expected? Or does that indicate a problem?
r
Hmmm... Maybe the debug setting got overwritten when I upgraded. I'll try that again.
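One way to rule out a lost profile setting is to pass the log level as an environment variable to the agent process itself, since Prefect settings given through the environment take precedence over the stored profile. A minimal sketch, assuming the `prefect` CLI is on the PATH; the helper names `agent_command`, `debug_env` and `start_agent` are hypothetical, while the command and the `PREFECT_LOGGING_LEVEL` setting are the ones used in this thread:

```python
from __future__ import annotations

import os
import subprocess


def agent_command(pool: str, queue: str | None = None) -> list[str]:
    """Build the same CLI call used in this thread: prefect agent start -p POOL [-q QUEUE]."""
    cmd = ["prefect", "agent", "start", "-p", pool]
    if queue is not None:
        cmd += ["-q", queue]
    return cmd


def debug_env(base: dict[str, str] | None = None) -> dict[str, str]:
    """Copy the given (or current) environment and force DEBUG logging for the child process."""
    env = dict(os.environ if base is None else base)
    env["PREFECT_LOGGING_LEVEL"] = "DEBUG"
    return env


def start_agent(pool: str, queue: str | None = None) -> subprocess.CompletedProcess:
    """Run the agent in the foreground with DEBUG logging (blocks until the agent exits)."""
    return subprocess.run(agent_command(pool, queue), env=debug_env(), check=False)
```

Calling `start_agent("worker", "default")` would then start the agent from the earlier example with debug logging, regardless of what the active profile contains.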
r
queues feed the pool, the queue is optional
r
Still no logs. OK, let's back up. Here's the setup:
• Server MAIN has prefect UI and agents
• Server WORKER has only agents
• `prefect server` is started on MAIN, not WORKER
I assumed that agents can be started and poll for work from the database by themselves. Is that correct or not?
r
yes correct
you sure the agent url is correct
r
I set the `PREFECT_API_DATABASE_CONNECTION_URL`. I assumed that agents poll the database directly. Is that not correct? I know that flows have run on the WORKER server... I watched the processes get picked up with `htop` and almost completely fill the available memory.
r
no the agents hit the api
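Because agents talk to the API rather than to the database, a useful check from the WORKER machine is whether the API URL it is configured with (the `PREFECT_API_URL` setting) actually answers. A minimal sketch using only the standard library; it assumes the server exposes a `/health` route under the API base URL, which should be verified against the installed version, and the helper names are hypothetical:

```python
import http.client
import urllib.error
import urllib.request


def health_url(api_url: str) -> str:
    """Build the health-check URL from an API base URL such as http://HOST:4200/api."""
    return api_url.rstrip("/") + "/health"


def api_is_reachable(api_url: str, timeout: float = 5.0) -> bool:
    """Return True only if the health route answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(health_url(api_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Covers refused connections, DNS failures, timeouts, HTTP errors and malformed URLs.
        return False
```

Running `api_is_reachable(os.environ["PREFECT_API_URL"])` on WORKER (after `import os`) would show whether that machine can reach the API served from MAIN. If the variable is not set at all there, that is already a strong hint about why the agent never reports in.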
r
Hmm... that's crazy. I wonder if I DID have prefect server started on the WORKER machine until I rebooted. Out of curiosity, what kind of issues would that cause? Having two servers, both with `prefect server` running and connected to the same database. Any chance that I did harm to the system?
r
Will be fine, otherwise you could never scale the solution!
I doubt prefect cloud runs on a single server node