# ask-community
w
Has anyone seen issues like this?
Copy code
$ prefect run -n example --watch
Looking up flow metadata...Done
Creating run for flow 'example'...Done
└── Name: eccentric-groundhog
└── UUID: bc79f3ec-5525-4be6-b327-375624abb387
└── Labels: ['caretaker', 'input', 'output', 'prefect-agent-556bd57fdf-v74zj']
└── Parameters: {}
└── Context: {}
└── URL: <http://localhost:8080/default/flow-run/bc79f3ec-5525-4be6-b327-375624abb387>
Watching flow run execution...
└── 20:33:26 | INFO    | Entered state <Scheduled>: Flow run scheduled.
└── 20:33:43 | WARNING | It has been 15 seconds and your flow run has not been submitted by an agent. Agent 93e9ff4d-5fce-4b1d-ad1b-59925fd32f92 (agent) has matching labels and last queried a few seconds ago. It should deploy your flow run.
└── 20:34:16 | WARNING | It has been 50 seconds and your flow run has not been submitted by an agent. Agent 93e9ff4d-5fce-4b1d-ad1b-59925fd32f92 (agent) has matching labels and last queried a few seconds ago. It should deploy your flow run.
No agent is picking up any of our flows, and flow runs just stay in the "scheduled" state even though on a CLI run, it states that there is an agent with matching labels.
k
It seems like there is an agent, but it’s not healthy enough to pick things up. Would you be able to check that agent?
a
You would need to have an agent that has all the labels you specified on your flow. Put differently: the labels on the agent must be a superset of those set on the run_config. Alternatively, you can run it agentless using --execute:
Copy code
prefect run -n example --execute --watch
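For example, the label matching would look something like this. This is just a rough sketch: the flow name, task, and labels here are made up for illustration, and it assumes Prefect 1.x imports.
Copy code
# Rough sketch of the label-matching rule (Prefect 1.x imports assumed;
# the flow and task names here are made up for illustration).
from prefect import Flow, task
from prefect.run_configs import LocalRun

@task
def say_hello():
    print("hello")

with Flow(
    "label-example",
    # This run will only be picked up by an agent whose labels are a
    # superset of these:
    run_config=LocalRun(labels=["input", "output"]),
) as flow:
    say_hello()

# A matching agent would be started with at least the same labels, e.g.:
#   prefect agent local start -l input -l output -l anything-extra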
w
Yeah, the
example
flow has an
input
label, and the agents have that label and more.
a
it looks like your flow has more labels than just input. It has: ['caretaker', 'input', 'output', 'prefect-agent-556bd57fdf-v74zj'], and all those labels would also need to be set on the agent
w
You're right, I thought it just had input, but it only has the first 3. The last label must be auto-added somewhere.
So our agents have all those labels. I think the "hostname" label is auto-added to all local prefect agents?
a
Yes exactly, it’s added by the registration process when you use local storage.
You can avoid including it by using:
Copy code
flow = Flow("local-flow", storage=Local(add_default_labels=False))
and on the agent:
Copy code
prefect agent local start --no-hostname-label
w
Ohhh. I'm going to try that. That brings me to a different question: we haven't gotten to the point of having "kubernetes agents" yet, and we want to run multiple local agents (5 right now). If we do what you just stated there, would any of the 5 agents pick up a flow?
a
If you really want to run multiple local agents, then you need to assign unique labels to each of those agents and to your flow runs to manually “load-balance” the flows across agents. This page explains the issue in more detail: https://docs.prefect.io/orchestration/agents/local.html#multiple-local-agents-with-the-same-label
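As a rough sketch of what that manual pinning could look like (the agent-1/agent-2 labels and flow names are made up for illustration):
Copy code
# Rough sketch of manual "load-balancing" across local agents (Prefect 1.x);
# the labels and flow names are placeholders.
from prefect import Flow
from prefect.run_configs import LocalRun

# Pinned to the agent started with:  prefect agent local start -l agent-1
flow_a = Flow("flow-a", run_config=LocalRun(labels=["agent-1"]))

# Pinned to the agent started with:  prefect agent local start -l agent-2
flow_b = Flow("flow-b", run_config=LocalRun(labels=["agent-2"]))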
w
Yeah, okay, I've read that.
👍 1
I have modified that example flow with your suggestion, and now it's saying things like
└── 21:01:25 | WARNING | It has been 50 seconds and your flow run has not been submitted by an agent. Found 5 healthy agents with matching labels. One of them should pick up your flow.
I do understand that we aren't load-balancing properly, but the labels do match and one agent should pick this up. This is actually a bug we only recently started encountering, and there weren't any code changes that would seem related.
Is there another way to force an agent to start picking up flow runs?
a
The only way to force it is to assign exactly the same labels on the agent and on the run config 🙂 can you share your exact storage + run_config and the command used to start the agent?
also the command you used to start the flow run
w
Yeah, the cmd I'm using on the agents is (and I know we aren't load balancing, but we're working on getting there 🙂):
/usr/local/bin/python /usr/local/bin/prefect agent local start -l company_name -l input -l output -l caretaker --show-flow-logs
The "example" flow's code is:
Copy code
with Flow(
    "example",
    state_handlers=[save_activity],
    run_config=LocalRun(labels=["input", "output", "caretaker"]),
    storage=storage.Local(add_default_labels=False),
) as flow:
I added the storage param after your input.
a
ok, so the problem is: if you don’t add the hostname label on the storage, then you also shouldn’t set it on the agent. So your agent command should be:
Copy code
prefect agent local start -l company_name -l input -l output -l caretaker --show-flow-logs --no-hostname-label
w
gotcha. but earlier, all labels were set in the flow and a superset was in the agent.
It's just very weird b/c up until a few days ago this was all working. It could be environmental, but I can't figure it out without something in the agent logs saying why it won't pick up a flow run
a
does it work now?
w
no, it does not
a
agent:
Copy code
prefect agent local start -l company_name -l input -l output -l caretaker --show-flow-logs --no-hostname-label
flow:
Copy code
with Flow(
    "example",
    state_handlers=[save_activity],
    run_config=LocalRun(labels=["input", "output", "caretaker"]),
    storage=storage.Local(add_default_labels=False),
) as flow:
and you need to make sure that you register this flow from the same machine the agent runs on; otherwise this won’t work, because your storage is currently local to the agent. If you run agents on a remote VM, you can explore other storage options like the Git storage classes or cloud storage classes
🙏 1
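For example, a rough sketch with GitHub storage. The repo and path are placeholders, I kept your run_config labels, and the state_handlers are omitted for brevity:
Copy code
# Hypothetical sketch of non-local storage so the agent machine does not need
# the flow file registered locally (repo and path are placeholders).
from prefect import Flow
from prefect.run_configs import LocalRun
from prefect.storage import GitHub

with Flow(
    "example",
    run_config=LocalRun(labels=["input", "output", "caretaker"]),
    storage=GitHub(repo="your-org/your-repo", path="flows/example.py"),
) as flow:
    ...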
w
i undid the no-hostname-label stuff
is there a way to see in a log somewhere the reason a flow won't be picked up? the logs coming out of
prefect run
say explicitly that an agent should pick up the flow
a
can you first explain your infrastructure? where is your agent process running? where are you registering your flow? does it all run on your laptop or on some VM?
w
k8s in aws
apollo, towel, db, agent, etc are in their own pods
a
you can’t use a local agent on Kubernetes; this won’t work
w
it does work
it's like a pod that's a VM
a
it will have side effects. I would encourage you to set up a KubernetesAgent if you are running this on Kubernetes.
w
ok. but are there some logs where i can see what's going on right now?
and i do agree with what you're saying, but this is already in prod and not something that can be hotfixed in that way
we've been running in this style setup for at least a year
a
you should be able to see it in the UI and in the relevant pods, and if you have access to your DB, you can also query it directly, e.g. query the flow runs. You can also do the same with a GraphQL query.
w
graphql i'm not too familiar with, unfortunately 😞 i only started hearing about it when i came onto our prefect codebase. but does it just store everything in the postgres backend somewhere? i do know how to work with that
a
yes - the flow_run table
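If it helps, here is a rough sketch of checking the same thing through the GraphQL API instead of SQL. The endpoint URL and the field names are assumptions based on a default Prefect Server deployment, so adjust them to your setup:
Copy code
# Rough sketch: compare agent labels against the labels required by
# scheduled flow runs, via the Server GraphQL API.
import requests

# Assumption: default Apollo/GraphQL endpoint; adjust host/port to your deployment.
GRAPHQL_URL = "http://localhost:4200/graphql"

# Field names are assumptions based on a default Prefect Server schema.
query = """
query {
  agent { id name labels last_queried }
  flow_run(where: {state: {_eq: "Scheduled"}}) { id name state labels }
}
"""

resp = requests.post(GRAPHQL_URL, json={"query": query})
resp.raise_for_status()
data = resp.json()["data"]

print("Agents and their labels:")
for agent in data["agent"]:
    print(agent)

print("Scheduled flow runs and the labels they require:")
for run in data["flow_run"]:
    print(run)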
w
oh, awesome! i'll start browsing in there and see what i can find. i'd also try the UI, but it's not easily accessible from prod
👍 1
a
good luck. sorry if I can’t be too helpful here, but it’s hard to remotely debug such infrastructure issues, especially with a non-typical setup where a local agent runs on Kubernetes 😅 hope you understand. I can try to help next week if you haven’t figured it out by then
w
I understand. Try not to focus on the k8s stuff 🙂 it's just a pod that runs a local agent as the cmd. but hopefully we'll have it sorted very soon
👍 1
kubernetes agents are what i was working on, but i've been sidetracked with some stuff
a
it’s possible that it’s easier to just set up a new Kubernetes agent, set new labels on your flows to match that K8s agent, and reregister the flows, than to figure out how to debug this agent
w
if we had devops, then yes, perhaps 🙂
I'm not actually a real devops person
I just play one on tv
a
it’s 2 commands really:
Copy code
prefect agent kubernetes install > agent.yaml # adjust yaml and apply
kubectl apply -f agent.yaml
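And on the flow side, matching a flow to that new agent would look roughly like this (the k8s label and the image name are placeholders):
Copy code
# Rough sketch of pointing a flow at a new Kubernetes agent (Prefect 1.x);
# the "k8s" label and the image name are placeholders.
from prefect import Flow
from prefect.run_configs import KubernetesRun

with Flow(
    "example",
    # "k8s" must match a label set on the Kubernetes agent (e.g. installed with -l k8s).
    # With a Kubernetes agent you'd also want non-local storage,
    # like the GitHub storage sketched earlier.
    run_config=KubernetesRun(labels=["k8s"], image="your-registry/your-image:latest"),
) as flow:
    ...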
w
we've got an atypical k8s setup at that; hard to explain
anyway, i'm just not familiar enough to set that up right now
a
hard to help then 😄
w
word
@Anna Geller Following up. I did finally get our prefect ui tunneled properly. All the old agents were still registered, and we noticed a lot of stuff that needed to be cleaned up. Once we removed the dead agents and things, it all started working again. I don't know exactly what was happening, but we think it had something to do with that.
Now we can continue the work we were doing to get to Kubernetes agents. We have been aiming for that, but normal stuff keeps coming up. 🙂
a
Nice work! Keep us posted if you have any questions along the way while migrating to a Kubernetes agent