Quick question about `agents`, `work_pools` and `storage_blocks` | Prefect Community #ask-community

Richard Alexander

03/13/2023, 12:48 PM

Quick question about `agents`, `work_pools` and `storage_blocks`. I have several agents (each polling a different work pool) on the same server trying to poll jobs from the same `s3` storage block, but only one of them is working. Can multiple agents/work pools be connected to the same storage block?

redsquare

03/13/2023, 12:55 PM

How do you mean "poll jobs from s3"? As in Prefect jobs, or your own internal jobs, so that you're using s3 like a queue?

Richard Alexander

03/13/2023, 1:02 PM

I created an `s3 storage block`. Can multiple agents / work pools use the same storage block? Or, do you have ideas on how to troubleshoot why a Work Pool agent isn't working? The two that aren't working say that they are polling for work, but their jobs are sitting in a `late` status, untouched by the agents that should be picking them up.

redsquare

03/13/2023, 1:08 PM

So if I read this right, you're using s3 as storage for the deployment. That should not impact flows being started; once started, the flow is downloaded from s3. Any logs from the agent?

Richard Alexander

03/13/2023, 1:11 PM

No logs from the agent. I have restarted each, and the log stays static on the first entry "Polling for work from work pool 'worker'".

Richard Alexander

03/13/2023, 1:12 PM

And yes... I'm using S3 as storage for the deployment. Sorry for my inaccurate wording 😉

😁 1

redsquare

03/13/2023, 1:12 PM

what happens if you invoke a 'run' from the UI

Richard Alexander

03/13/2023, 1:12 PM

I'll try...

redsquare

03/13/2023, 1:14 PM

I would double check the agent+pool defined on the deployment too

Richard Alexander

03/13/2023, 1:14 PM

OK, just started it via "Quick Run" in the UI. I expect that it will sit here in a late status as well, but I'll let you know in a few minutes.

Richard Alexander

03/13/2023, 1:15 PM

Right, I'll do that again right now as well

Richard Alexander

03/13/2023, 1:21 PM

It's still sitting in a late status. Ah... the work pool is correct, but it's pointing to an old storage that no longer exists. I tried re-deploying, but it still didn't pick up the new storage. I'll try deleting the deployment altogether and see if I can get it to start....

👍 1

Richard Alexander

03/13/2023, 1:33 PM

Hmm... I deleted the deployment and re-deployed. It now looks like everything is right (work pool and storage) but the new run is still not being picked up. It's about 5 minutes late. Any ideas? 🧐

Richard Alexander

03/13/2023, 1:36 PM

I haven't explicitly specified a `work_queue` in the deployment. It shows as `default` in the UI. But that shouldn't matter, right? It should run in the default work queue.

redsquare

03/13/2023, 1:41 PM

I specify the queue even if there is only a single queue
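
Pinning both the pool and the queue on the deployment, as suggested here, could look roughly like the sketch below. It assumes Prefect 2.8-style deployments built with `Deployment.build_from_flow`, where `work_pool_name` and `work_queue_name` are deployment fields. The pool and queue names come from this thread, the deployment name is hypothetical, and `prefect` is imported lazily so the settings can be inspected without it installed.

```python
# Settings shared by every deployment that the 'worker' pool should run.
# 'worker' and 'main' come from this thread; the deployment name is made up.
DEPLOYMENT_SETTINGS = {
    "name": "example-deployment",
    "work_pool_name": "worker",
    "work_queue_name": "main",
}


def build_deployment(flow, storage=None):
    """Build and register a deployment pinned to an explicit pool and queue."""
    from prefect.deployments import Deployment  # requires Prefect 2.x

    return Deployment.build_from_flow(
        flow=flow,
        storage=storage,  # e.g. a loaded S3 storage block (assumption)
        apply=True,       # register the deployment with the API right away
        **DEPLOYMENT_SETTINGS,
    )
```

After applying, the deployment page in the UI should show the same pool and queue that the agent is started with.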

Richard Alexander

03/13/2023, 1:52 PM

I specified the work queue as `default`, but still no luck. I'll try creating a work queue instead of using default.

Richard Alexander

03/13/2023, 2:21 PM

I have tried several things, but cannot get the agent to pick up work:
• Created a new `main` work queue and re-deployed without changing the agent
• Re-deployed with `work_queue_name = 'default'`, but not changing the agent
• Changed the agent to explicitly poll from the `default` work queue
• Changed the agent to explicitly poll from the `main` work queue

One interesting note: when I explicitly set the agent's work queue to `default`, the log gave some output/error messages about canceled flows that I had already deleted. But it still didn't pick up the new flow run.

Related question: If we start an agent in a work pool, but don't specify the work queue, will it pick up work from all queues? Or no queues? I'm not sure what else to try at this point. Any other suggestions?

redsquare

03/13/2023, 2:26 PM

how are you starting your agents

Richard Alexander

03/13/2023, 2:27 PM

command line

redsquare

03/13/2023, 2:27 PM

assume you have the correct env vars set and are connecting to a queue that exists

Richard Alexander

03/13/2023, 2:27 PM

For example: `prefect agent start -p worker -q main`

redsquare

03/13/2023, 2:27 PM

can you list deployments from the command line

redsquare

03/13/2023, 2:28 PM

prefect deployments ls

Richard Alexander

03/13/2023, 2:29 PM

Yes I do. As I mentioned in the original question, one of the deployments on this machine is working (from a different pool) but pulling from the same storage. So I know that all of the environmental variables are correct.

redsquare

03/13/2023, 2:29 PM

ah I forgot that, long time ago 😁

Richard Alexander

03/13/2023, 2:30 PM

lots more deployments, but the command works

Richard Alexander

03/13/2023, 2:31 PM

(No worries, that was a long time ago 😉)

Richard Alexander

03/13/2023, 2:33 PM

And thanks for the help so far, by the way! This one really has me stumped...

redsquare

03/13/2023, 2:41 PM

no problem - tbh I run an agent per pod in k8s to isolate and avoid crossing streams

redsquare

03/13/2023, 2:44 PM

what do you see in the ui when you click the work pool -> queues

Richard Alexander

03/13/2023, 2:45 PM

I see both the `default` and the new `main` queues. The main queue has `1 late run` as well since that is the last test I tried.

redsquare

03/13/2023, 2:46 PM

are the queues listed as healthy - meaning an agent has polled recently

Richard Alexander

03/13/2023, 2:48 PM

No. But I haven't been paying attention to that... since I upgraded to 2.8.0, all agents have been listed as unhealthy, even for pools and queues that are working.

redsquare

03/13/2023, 2:50 PM

prefect config set PREFECT_LOGGING_LEVEL=DEBUG

redsquare

03/13/2023, 2:50 PM

up the logging level and see if it gives any noise
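
One way to be sure the debug level applies to a specific agent process is to pass it through the environment when launching that process, since Prefect settings can also be supplied as environment variables. The sketch below only builds the command and the environment. The `-p` and `-q` flags are the ones used earlier in this thread; the helper names are hypothetical, and repeating `-q` for several queues is an assumption worth checking with `prefect agent start --help`.

```python
import os
from typing import Dict, List, Mapping, Optional


def agent_command(pool: str, queues: Optional[List[str]] = None) -> List[str]:
    """Build the `prefect agent start` argv for one pool and optional queues."""
    cmd = ["prefect", "agent", "start", "-p", pool]
    for queue in queues or []:
        cmd += ["-q", queue]
    return cmd


def debug_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and force debug-level Prefect logging in the copy."""
    env = dict(os.environ if base is None else base)
    env["PREFECT_LOGGING_LEVEL"] = "DEBUG"
    return env
```

To launch the agent, both pieces can be handed to `subprocess.run(agent_command("worker", ["main"]), env=debug_env())`.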

redsquare

03/13/2023, 2:51 PM

can you move to the latest, 2.8.4?

Richard Alexander

03/13/2023, 2:52 PM

I have two servers running, one main server and the other is for specific "worker" agents. Should I increase the log level on one or both?

Richard Alexander

03/13/2023, 2:52 PM

(main server runs prefect UI)

redsquare

03/13/2023, 2:53 PM

i'd try the agent first to see if that gives any clues about comms to the server

Richard Alexander

03/13/2023, 2:54 PM

Got it. I'm about to hop on a call, but will report back in a bit :-)

Richard Alexander

03/14/2023, 12:37 PM

I followed your suggestions @redsquare:
• I upgraded to 2.8.5
• I set logging level to debug

Debug shows no extra info on the agent side. They output their initial log entry then sit silently. Also no change regarding the health status of my work queues. All are listed as `unhealthy`, even those that are working properly. What should we try next?

redsquare

03/14/2023, 12:41 PM

Is this with a single agent started up too?

Richard Alexander

03/14/2023, 12:50 PM

No. I can disable the other agents on this server... one moment

Richard Alexander

03/14/2023, 2:20 PM

No luck. I only have one agent on that server that is polling, but it's not picking up the flows in the queue.

Richard Alexander

03/14/2023, 2:24 PM

Is there some other way to see which agents are polling the database since the logs aren't helping? Is there some kind of `last-polled` column in the database somewhere that I can check?

Richard Alexander

03/14/2023, 2:25 PM

(We have a Postgres database set up)

Richard Alexander

03/14/2023, 2:29 PM

Also, a question about the logging. I started the agent via this command:

prefect agent start -p worker -q default

And get this output from the log:

Agent started! Looking for work from work pool 'worker'...

The log mentions the pool, but it doesn't say anything about the queue. Is that to be expected? Or does that indicate a problem?

redsquare

03/14/2023, 2:32 PM

it should be logging out on each poll https://github.com/PrefectHQ/prefect/blob/23a643d7f6ce788c1651a29dbe69ebe05d2bb033/src/prefect/agent.py#L185

Richard Alexander

03/14/2023, 2:34 PM

Hmmm... Maybe the debug setting got overwritten when I upgraded. I'll try that again.

redsquare

03/14/2023, 2:34 PM

queues feed the pool, the queue is optional

Richard Alexander

03/14/2023, 2:46 PM

Still no logs. OK, let's back up. Here's the setup:
• Server MAIN has prefect UI and agents
• Server WORKER has only agents
• `prefect server` is started on MAIN, not WORKER

I assumed that agents can be started and poll for work from the database by themselves. Is that correct or not?

redsquare

03/14/2023, 2:53 PM

yes correct

redsquare

03/14/2023, 2:54 PM

you sure the agent url is correct

Richard Alexander

03/14/2023, 3:15 PM

I set the `PREFECT_API_DATABASE_CONNECTION_URL`. I assumed that agents poll the database directly. Is that not correct? I know that flows have run on the WORKER server... I watched the processes get picked up with `htop` and almost completely fill the available memory.

redsquare

03/14/2023, 3:17 PM

no the agents hit the api
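
Since agents talk to the Prefect API over HTTP rather than to Postgres, a quick check from the WORKER machine is whether the API URL it is configured with actually answers. The following is a minimal sketch using only the Python standard library. It assumes the URL is exported as the `PREFECT_API_URL` environment variable (it may instead live in a Prefect profile, in which case pass it explicitly) and that the server exposes a `/health` route under the API root, which is worth confirming against your Prefect version.

```python
import os
import urllib.error
import urllib.request
from typing import Optional


def health_url(api_url: str) -> str:
    """Append the (assumed) health route to an API root such as http://host:4200/api."""
    return api_url.rstrip("/") + "/health"


def check_api(api_url: Optional[str] = None, timeout: float = 5.0) -> str:
    """Return a short, human-readable diagnosis of the API an agent would use."""
    api_url = api_url or os.environ.get("PREFECT_API_URL")
    if not api_url:
        return "PREFECT_API_URL is not set in this environment"
    try:
        with urllib.request.urlopen(health_url(api_url), timeout=timeout) as resp:
            return f"API reachable at {api_url} (HTTP {resp.status})"
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # Covers DNS failures, refused connections, HTTP errors and bad URLs.
        return f"API not reachable at {api_url}: {exc}"
```

Running `print(check_api())` on WORKER, or `print(check_api("http://main-server:4200/api"))` with a placeholder host replaced by the MAIN machine's address, shows whether an agent there has an API to poll at all.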

Richard Alexander

03/14/2023, 3:21 PM

Hmm... that's crazy. I wonder if I DID have prefect server started on the WORKER machine until I rebooted. Out of curiosity, what kind of issues would that cause? Having two servers, both with `prefect server` running and connected to the same database. Any chance that I did harm to the system?

redsquare

03/14/2023, 3:23 PM

Will be fine, otherwise you could never scale the solution!

redsquare

03/14/2023, 3:23 PM

I doubt prefect cloud runs on a single server node
