# ask-community
w
Has anyone seen issues like this?
Copy code
$ prefect run -n example --watch
Looking up flow metadata...Done
Creating run for flow 'example'...Done
└── Name: eccentric-groundhog
└── UUID: bc79f3ec-5525-4be6-b327-375624abb387
└── Labels: ['caretaker', 'input', 'output', 'prefect-agent-556bd57fdf-v74zj']
└── Parameters: {}
└── Context: {}
└── URL: <http://localhost:8080/default/flow-run/bc79f3ec-5525-4be6-b327-375624abb387>
Watching flow run execution...
└── 20:33:26 | INFO    | Entered state <Scheduled>: Flow run scheduled.
└── 20:33:43 | WARNING | It has been 15 seconds and your flow run has not been submitted by an agent. Agent 93e9ff4d-5fce-4b1d-ad1b-59925fd32f92 (agent) has matching labels and last queried a few seconds ago. It should deploy your flow run.
└── 20:34:16 | WARNING | It has been 50 seconds and your flow run has not been submitted by an agent. Agent 93e9ff4d-5fce-4b1d-ad1b-59925fd32f92 (agent) has matching labels and last queried a few seconds ago. It should deploy your flow run.
No agent is picking up any of our flows, and flow runs just stay in the "scheduled" state even though on a CLI run, it states that there is an agent with matching labels.
k
It seems like there is an agent, but it’s not healthy enough to pick things up. Would you be able to check that agent?
a
You would need to have an agent that has all the labels you specified on your flow. Put differently: the labels on the agent must be a superset of those set on the run_config. Alternatively, you can run it agentless using --execute:
Copy code
prefect run -n example --execute --watch
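For example, the label matching would look something like this. This is just a rough sketch: the flow name, task, and labels here are made up for illustration, and it assumes Prefect 1.x imports.
Copy code
# Rough sketch of the label-matching rule (Prefect 1.x imports assumed;
# the flow and task names here are made up for illustration).
from prefect import Flow, task
from prefect.run_configs import LocalRun

@task
def say_hello():
    print("hello")

with Flow(
    "label-example",
    # This run will only be picked up by an agent whose labels are a
    # superset of these:
    run_config=LocalRun(labels=["input", "output"]),
) as flow:
    say_hello()

# A matching agent would be started with at least the same labels, e.g.:
#   prefect agent local start -l input -l output -l anything-extra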
w
Yeah, the
example
flow has an
input
label, and the agents have that label and more.
a
it looks like your flow has more labels than just input. It has: ['caretaker', 'input', 'output', 'prefect-agent-556bd57fdf-v74zj'], and all those labels would also need to be set on the agent
w
You're right, I thought it just had input, but it only has the first 3. The last label must be auto-added somewhere.
So our agents have all those labels. I think the "hostname" label is auto-added to all local prefect agents?
a
Yes exactly, it’s added by the registration process when you use local storage.
You can avoid including it by using:
Copy code
flow = Flow("local-flow", storage=Local(add_default_labels=False))
and on the agent:
Copy code
prefect agent local start --no-hostname-label
w
Ohhh. I'm going to try that. That brings me to a different question: we haven't gotten to the point of having "kubernetes agents" yet, and we want to run multiple local agents (5 right now). If we do what you just stated there, would any of the 5 agents pick up a flow?
a
If you really want to run multiple local agents, then you need to assign unique labels to each of those agents and to your flow runs to manually “load-balance” the flows across agents. This page explains the issue in more detail: https://docs.prefect.io/orchestration/agents/local.html#multiple-local-agents-with-the-same-label
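As a rough sketch of what that manual pinning could look like (the agent-1/agent-2 labels and flow names are made up for illustration):
Copy code
# Rough sketch of manual "load-balancing" across local agents (Prefect 1.x);
# the labels and flow names are placeholders.
from prefect import Flow
from prefect.run_configs import LocalRun

# Pinned to the agent started with:  prefect agent local start -l agent-1
flow_a = Flow("flow-a", run_config=LocalRun(labels=["agent-1"]))

# Pinned to the agent started with:  prefect agent local start -l agent-2
flow_b = Flow("flow-b", run_config=LocalRun(labels=["agent-2"]))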
w
Yeah, okay, I've read that.
👍 1
I have modified that example flow with your suggestion, and now it's saying things like
└── 21:01:25 | WARNING | It has been 50 seconds and your flow run has not been submitted by an agent. Found 5 healthy agents with matching labels. One of them should pick up your flow.
I do understand that we aren't load-balancing properly, but the labels do match and one agent should pick this up. This is actually a bug we only recently started encountering, and there weren't any code changes that would seem related.
Is there another way to force an agent to start picking up flow runs?
a
The only way to force it is to assign exactly the same labels on the agent and on the run config 🙂 can you share your exact storage + run_config and the command used to start the agent?
also the command you used to start the flow run
w
Yeah, the cmd I'm using on the agents is (and I know we aren't load balancing, but we're working on getting there 🙂):
/usr/local/bin/python /usr/local/bin/prefect agent local start -l company_name -l input -l output -l caretaker --show-flow-logs
The "example" flow's code is:
Copy code
with Flow(
    "example",
    state_handlers=[save_activity],
    run_config=LocalRun(labels=["input", "output", "caretaker"]),
    storage=storage.Local(add_default_labels=False),
) as flow:
I added the storage param after your input.
a
ok, so the problem is: if you don’t add the hostname label on the storage, then you also shouldn’t set it on the agent. So your agent command should be:
Copy code
prefect agent local start -l company_name -l input -l output -l caretaker --show-flow-logs --no-hostname-label
w
gotcha. but earlier, all labels were set in the flow and a superset was in the agent.
It's just very weird b/c up until a few days ago this was all working. It could be environmental, but I can't figure it out without something in the agent logs saying why it won't pick up a flow run
a
does it work now?
w
no, it does not
a
agent:
Copy code
prefect agent local start -l company_name -l input -l output -l caretaker --show-flow-logs --no-hostname-label
flow:
Copy code
with Flow(
    "example",
    state_handlers=[save_activity],
    run_config=LocalRun(labels=["input", "output", "caretaker"]),
    storage=storage.Local(add_default_labels=False),
) as flow:
and you need to make sure that you register this flow from the same machine the agent runs on; otherwise this won’t work, because your storage is currently local to the agent. If you run agents on a remote VM, you can explore other storage options like the Git storage classes or cloud storage classes
🙏 1
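For example, a rough sketch with GitHub storage. The repo and path are placeholders, I kept your run_config labels, and the state_handlers are omitted for brevity:
Copy code
# Hypothetical sketch of non-local storage so the agent machine does not need
# the flow file registered locally (repo and path are placeholders).
from prefect import Flow
from prefect.run_configs import LocalRun
from prefect.storage import GitHub

with Flow(
    "example",
    run_config=LocalRun(labels=["input", "output", "caretaker"]),
    storage=GitHub(repo="your-org/your-repo", path="flows/example.py"),
) as flow:
    ...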
w
i undid the no-hostname-label stuff
is there a way to see in a log somewhere the reason a flow won't be picked up? the logs coming out of
prefect run
say explicitly that an agent should pick up the flow
a
can you first explain your infrastructure? where is your agent process running? where are you registering your flow? does it all run on your laptop or on some VM?
w
k8s in aws
apollo, towel, db, agent, etc are in their own pods
a
you can’t use a local agent on Kubernetes; this won’t work
w
it does work
it's like a pod that's a VM
a
it will have side effects. I would encourage you to set up a KubernetesAgent if you are running this on Kubernetes.
w
ok. but are there some logs where i can see what's going on right now?
and i do agree with what you're saying, but this is already in prod and not something that can be hotfixed in that way
we've been running in this style setup for at least a year
a
you should be able to see it in the UI and in the relevant pods, and if you have access to your DB, you can also query it directly, e.g. query the flow runs. You can also do the same with a GraphQL query.
w
graphql i'm not too familiar with, unfortunately 😞 i only started hearing about it when i came onto our prefect codebase. but does it just store everything in the postgres backend somewhere? i do know how to work with that
a
yes - the flow_run table
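If it helps, here is a rough sketch of checking the same thing through the GraphQL API instead of SQL. The endpoint URL and the field names are assumptions based on a default Prefect Server deployment, so adjust them to your setup:
Copy code
# Rough sketch: compare agent labels against the labels required by
# scheduled flow runs, via the Server GraphQL API.
import requests

# Assumption: default Apollo/GraphQL endpoint; adjust host/port to your deployment.
GRAPHQL_URL = "http://localhost:4200/graphql"

# Field names are assumptions based on a default Prefect Server schema.
query = """
query {
  agent { id name labels last_queried }
  flow_run(where: {state: {_eq: "Scheduled"}}) { id name state labels }
}
"""

resp = requests.post(GRAPHQL_URL, json={"query": query})
resp.raise_for_status()
data = resp.json()["data"]

print("Agents and their labels:")
for agent in data["agent"]:
    print(agent)

print("Scheduled flow runs and the labels they require:")
for run in data["flow_run"]:
    print(run)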
w
oh, awesome! i'll start browsing in there and see what i can find. i'd also try the UI, but it's not easily accessible from prod
👍 1
a
good luck. sorry if I can’t be too helpful here, but it’s hard to remotely debug such infrastructure issues, especially with a non-typical setup where a local agent runs on Kubernetes 😅 hope you understand. I can try to help next week if you haven’t figured it out by then
w
I understand. Try not to focus on the k8s stuff 🙂 it's just a pod that runs a local agent as the cmd. but hopefully we'll have it sorted very soon
👍 1
kubernetes agents are what i was working on, but i've been sidetracked with some stuff
a
it’s possible that it’s easier to just set up a new Kubernetes agent, set new labels on your flows to match that K8s agent, and reregister the flows, than to figure out how to debug this agent
w
if we had devops, then yes, perhaps 🙂
I'm not actually a real devops person
I just play one on tv
a
it’s 2 commands really:
Copy code
prefect agent kubernetes install > agent.yaml # adjust yaml and apply
kubectl apply -f agent.yaml
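And on the flow side, matching a flow to that new agent would look roughly like this (the k8s label and the image name are placeholders):
Copy code
# Rough sketch of pointing a flow at a new Kubernetes agent (Prefect 1.x);
# the "k8s" label and the image name are placeholders.
from prefect import Flow
from prefect.run_configs import KubernetesRun

with Flow(
    "example",
    # "k8s" must match a label set on the Kubernetes agent (e.g. installed with -l k8s).
    # With a Kubernetes agent you'd also want non-local storage,
    # like the GitHub storage sketched earlier.
    run_config=KubernetesRun(labels=["k8s"], image="your-registry/your-image:latest"),
) as flow:
    ...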
w
we've got an atypical k8s setup at that; hard to explain
anyway, i'm just not familiar enough to set that up right now
a
hard to help then 😄
w
word
@Anna Geller Following up. I did finally get our prefect ui tunneled properly. All the old agents were still registered, and we noticed a lot of stuff that needed to be cleaned up. Once we removed the dead agents and things, it all started working again. I don't know exactly what was happening, but we think it had something to do with that.
Now we can continue the work we were doing to get to Kubernetes agents. We have been aiming for that, but normal stuff keeps coming up. 🙂
a
Nice work! Keep us posted if you have any questions along the way while migrating to a Kubernetes agent