w

William Grim

02/04/2022, 8:41 PM
Has anyone seen issues like this?
$ prefect run -n example --watch
Looking up flow metadata...Done
Creating run for flow 'example'...Done
└── Name: eccentric-groundhog
└── UUID: bc79f3ec-5525-4be6-b327-375624abb387
└── Labels: ['caretaker', 'input', 'output', 'prefect-agent-556bd57fdf-v74zj']
└── Parameters: {}
└── Context: {}
└── URL: <http://localhost:8080/default/flow-run/bc79f3ec-5525-4be6-b327-375624abb387>
Watching flow run execution...
└── 20:33:26 | INFO    | Entered state <Scheduled>: Flow run scheduled.
└── 20:33:43 | WARNING | It has been 15 seconds and your flow run has not been submitted by an agent. Agent 93e9ff4d-5fce-4b1d-ad1b-59925fd32f92 (agent) has matching labels and last queried a few seconds ago. It should deploy your flow run.
└── 20:34:16 | WARNING | It has been 50 seconds and your flow run has not been submitted by an agent. Agent 93e9ff4d-5fce-4b1d-ad1b-59925fd32f92 (agent) has matching labels and last queried a few seconds ago. It should deploy your flow run.
No agent is picking up any of our flows, and flow runs just stay in the "scheduled" state even though on a CLI run, it states that there is an agent with matching labels.
k

Kevin Kho

02/04/2022, 8:44 PM
It seems like there is an agent but it’s not healthy to pick things up. Would you be able to check that agent?
a

Anna Geller

02/04/2022, 8:45 PM
You would need to have an agent that has all those labels that you specified on your flow. Or the other way around: the labels on the agent must be a superset of those set on the run_config. Alternatively, you can run it agentless using --execute:
prefect run -n example --execute --watch
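The superset rule described above can be sketched in plain Python. This is only an illustration of the rule as stated in this thread, not Prefect's actual matching code; the function name and the example agent labels are hypothetical, and the flow run labels are the ones printed by the `prefect run` output earlier.

```python
def agent_can_pick_up(agent_labels, flow_run_labels):
    """Return True if the agent's labels cover every label on the flow run."""
    return set(agent_labels).issuperset(flow_run_labels)


# Labels shown in the `prefect run` output (the last one is the hostname label).
flow_run_labels = ["caretaker", "input", "output", "prefect-agent-556bd57fdf-v74zj"]

# Hypothetical agent that was started without the hostname label.
agent_labels = ["company_name", "input", "output", "caretaker"]

# Labels the flow run requires that the agent does not offer.
missing = sorted(set(flow_run_labels) - set(agent_labels))

print(agent_can_pick_up(agent_labels, flow_run_labels), missing)
```

Listing the missing labels this way is a quick check for whether an auto-added label (such as a hostname) is what keeps an agent from matching.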
w

William Grim

02/04/2022, 8:46 PM
Yeah, the example flow has an input label, and the agents have that label and more.
a

Anna Geller

02/04/2022, 8:48 PM
it looks like your flow has more labels than just input. It has: ['caretaker', 'input', 'output', 'prefect-agent-556bd57fdf-v74zj'] and all those labels would also need to be set on the agent
w

William Grim

02/04/2022, 8:49 PM
You're right, I thought it just had input, but it only has the first 3. The last label must be auto-added somewhere.
So our agents have all those labels. I think the "hostname" label is auto-added to all local prefect agents?
a

Anna Geller

02/04/2022, 8:50 PM
Yes exactly, it’s added by the registration process when you use local storage.
You can exclude it on the flow using:
flow = Flow("local-flow", storage=Local(add_default_labels=False))
and on the agent:
prefect agent local start --no-hostname-label
w

William Grim

02/04/2022, 8:54 PM
Ohhh. I'm going to try that. That brings me to a different question: we haven't gotten to the point of having "kubernetes agents" yet, and we want to run X local agents. If we do what you just stated there, would any of the 5 agents pick up a flow?
a

Anna Geller

02/04/2022, 8:56 PM
If you really want to run multiple local agents, then you need to assign unique labels to each of those agents and to your flow runs to manually "load-balance" the flows across agents. This page explains the issue in more detail: https://docs.prefect.io/orchestration/agents/local.html#multiple-local-agents-with-the-same-label
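One way to picture the manual "load-balancing" described here is a round-robin assignment of flows to uniquely labeled agents. The sketch below is plain Python and does not call Prefect; the agent labels (`local-agent-1` and so on) and all flow names other than `example` are made up for illustration. Each agent would then be started with its own `-l` label, and each flow's run_config would carry the label assigned to it.

```python
from itertools import cycle


def assign_agent_labels(flow_names, agent_labels):
    """Pair each flow with one agent label, cycling through the agents in order."""
    return dict(zip(flow_names, cycle(agent_labels)))


# Hypothetical unique labels, one per local agent.
agent_labels = [f"local-agent-{i}" for i in range(1, 6)]

# Hypothetical flow names; only "example" appears in this thread.
flow_names = ["example", "ingest", "report", "cleanup", "export", "archive", "notify"]

assignment = assign_agent_labels(flow_names, agent_labels)

for flow_name, label in assignment.items():
    print(f"{flow_name} -> {label}")
```

Round-robin is only one possible policy; the point is that every flow ends up tied to exactly one agent, so two agents never compete for the same run.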
w

William Grim

02/04/2022, 8:56 PM
Yeah, okay, I've read that.
👍 1
I have modified that example flow with your suggestion, and now it's saying things like
└── 21:01:25 | WARNING | It has been 50 seconds and your flow run has not been submitted by an agent. Found 5 healthy agents with matching labels. One of them should pick up your flow.
I do understand that we aren't load-balancing properly, but the labels do match and one agent should pick this up. This is actually a bug we only recently started encountering, and there weren't any code changes that would seem related.
Is there another way to force an agent to start picking up flow runs?
a

Anna Geller

02/04/2022, 9:04 PM
The only way to force it is to assign exactly the same labels on the agent and on the run config 🙂 can you share your exact storage + run_config and the command used to start the agent?
also the command you used to start the flow run
w

William Grim

02/04/2022, 9:08 PM
Yeah, the cmd I'm using on the agents is (and I know we aren't load balancing, but we're working on getting there 🙂):
/usr/local/bin/python /usr/local/bin/prefect agent local start -l company_name -l input -l output -l caretaker --show-flow-logs
The "example" flow's code is:
with Flow(
    "example",
    state_handlers=[save_activity],
    run_config=LocalRun(labels=["input", "output", "caretaker"]),
    storage=storage.Local(add_default_labels=False),
) as flow:
I added the storage param after your input.
a

Anna Geller

02/04/2022, 9:16 PM
ok, so the problem is: if you don’t add the hostname label on the storage, then you also shouldn’t set it on the agent. So your agent command should be:
prefect agent local start -l company_name -l input -l output -l caretaker --show-flow-logs --no-hostname-label
w

William Grim

02/04/2022, 9:17 PM
gotcha. but earlier, all labels were set in the flow and a superset was in the agent.
It's just very weird b/c up until a few days ago, this was all working. it could be environmental, but i can't figure it out without something in agent logs or something saying why it won't pick up a flow run
a

Anna Geller

02/04/2022, 9:18 PM
does it work now?
w

William Grim

02/04/2022, 9:18 PM
no, it does not
a

Anna Geller

02/04/2022, 9:21 PM
agent:
prefect agent local start -l company_name -l input -l output -l caretaker --show-flow-logs --no-hostname-label
flow:
with Flow(
    "example",
    state_handlers=[save_activity],
    run_config=LocalRun(labels=["input", "output", "caretaker"]),
    storage=storage.Local(add_default_labels=False),
) as flow:
and you need to make sure that you register this flow from the same machine as the agent, otherwise this won’t work because you specified that your storage is local to the agent atm. If you run agents on a remote VM, you can explore other storage options like Git storage classes or cloud storage classes
🙏 1
w

William Grim

02/04/2022, 9:21 PM
i undid the no-hostname-label stuff
is there a way to see in a log somewhere the reason a flow won't be picked up? the logs coming out of prefect run say explicitly that an agent should pick up the flow
a

Anna Geller

02/04/2022, 9:23 PM
can you first explain your infrastructure? where is your agent process running? where are you registering your flow? does it all run on your laptop or on some VM?
w

William Grim

02/04/2022, 9:23 PM
k8s in aws
apollo, towel, db, agent, etc are in their own pods
a

Anna Geller

02/04/2022, 9:24 PM
you can’t use a local agent on Kubernetes, this won’t work
w

William Grim

02/04/2022, 9:24 PM
it does work
it's like a pod that's a VM
a

Anna Geller

02/04/2022, 9:25 PM
it will have side effects. I would encourage you to set up a KubernetesAgent if you are running this on Kubernetes.
w

William Grim

02/04/2022, 9:25 PM
ok. but are there some logs where i can see what's going on right now?
and i do agree with what you're saying, but this is already in prod and not something that can be hotfixed in that way
we've been running in this style setup for at least a year
a

Anna Geller

02/04/2022, 9:28 PM
you should be able to see it in the UI + in the logs of the relevant pods, and if you have access to your DB, you can also query it directly - e.g. query flow runs. You can also do the same with a GraphQL query.
w

William Grim

02/04/2022, 9:29 PM
graphql i'm not too familiar with, unfortunately 😞 i only started hearing about it when i came onto our prefect codebase. but does it just store everything in the postgres backend somewhere? i do know how to work with that
a

Anna Geller

02/04/2022, 9:29 PM
yes - flow run table
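For readers who have not used GraphQL before, a query for flow runs stuck in the Scheduled state might look roughly like the sketch below. It uses only the Python standard library. The endpoint URL and the field names in the query (`flow_run`, `state`, `labels`, `scheduled_start_time`) are assumptions about a Prefect Server deployment and should be checked against your own schema; the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: the GraphQL endpoint of your Prefect Server; adjust as needed.
GRAPHQL_URL = "http://localhost:4200/graphql"

# Assumption: table and field names follow your server's schema.
QUERY = """
query ($state: String!) {
  flow_run(where: {state: {_eq: $state}}, order_by: {scheduled_start_time: asc}) {
    id
    name
    state
    labels
    scheduled_start_time
  }
}
"""


def build_request(state="Scheduled", url=GRAPHQL_URL):
    """Build (but do not send) a POST request carrying the query and its variable."""
    payload = json.dumps({"query": QUERY, "variables": {"state": state}})
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_flow_runs(state="Scheduled", url=GRAPHQL_URL):
    """Send the request and return the list of matching flow runs."""
    with urllib.request.urlopen(build_request(state, url)) as response:
        return json.load(response)["data"]["flow_run"]
```

Calling `fetch_flow_runs()` from a machine that can reach the API would return the stuck runs together with their labels, which can then be compared against the labels on the agents.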
w

William Grim

02/04/2022, 9:29 PM
oh, awesome! i'll start browsing in there and see what i can find. i'd also try the UI, but it's not easily accessible from prod
👍 1
a

Anna Geller

02/04/2022, 9:31 PM
good luck. sorry I can't be more helpful here, but it's hard to remotely debug such infrastructure issues, especially with a non-typical setup where a local agent runs on Kubernetes 😅 hope you understand. I can try to help next week if you haven't figured it out by then
w

William Grim

02/04/2022, 9:32 PM
I understand. Try not to focus on the k8s stuff 🙂 it's just a pod that runs a local agent as the cmd. but hopefully we'll have it sorted very soon
👍 1
kubernetes agents are what i was working on, but been sidetracked with some stuff
a

Anna Geller

02/04/2022, 9:33 PM
it’s possible that it’s easier to just set up a new Kubernetes agent, set up new labels on your flow to match it with this K8s agent + reregister flows than to figure out how to debug this agent
w

William Grim

02/04/2022, 9:34 PM
if we had devops, then yes, perhaps 🙂
I'm not actually a real devops person
I just play one on tv
a

Anna Geller

02/04/2022, 9:34 PM
it’s 2 commands really:
prefect agent kubernetes install > agent.yaml # adjust yaml and apply
kubectl apply -f agent.yaml
w

William Grim

02/04/2022, 9:35 PM
we've got an atypical k8s setup at that; hard to explain
anyway, i'm just not familiar enough to set that up right now
a

Anna Geller

02/04/2022, 9:35 PM
hard to help then 😄
w

William Grim

02/04/2022, 9:35 PM
word
@Anna Geller Following up. I did finally get our prefect ui tunneled properly. All the old agents were still registered, and we noticed a lot of stuff that needed to be cleaned up. Once we removed the dead agents and things, it all started working again. I don't know exactly what was happening, but we think it had something to do with that.
Now we can continue the work we were doing to get towards kubernetes agents. We have been aiming for that but have normal stuff that comes up. 🙂
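The thread does not say how the dead agents were identified or removed. As a rough illustration only, stale agents could be spotted by how long ago each one last queried the API. The record layout, the agent names, and the five-minute threshold below are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def find_stale_agents(agents, now, max_age=timedelta(minutes=5)):
    """Return agents whose last query is missing or older than max_age."""
    return [
        agent
        for agent in agents
        if agent["last_queried"] is None or now - agent["last_queried"] > max_age
    ]


# Fixed reference time so the example is reproducible.
now = datetime(2022, 2, 5, 0, 0, tzinfo=timezone.utc)

# Hypothetical agent records; real ones would come from the UI, API, or database.
agents = [
    {"name": "agent-live", "last_queried": now - timedelta(seconds=10)},
    {"name": "agent-dead", "last_queried": now - timedelta(days=3)},
    {"name": "agent-never", "last_queried": None},
]

stale = [agent["name"] for agent in find_stale_agents(agents, now)]
print(stale)
```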
a

Anna Geller

02/05/2022, 10:25 AM
Nice work! Keep us posted if you have any questions along the way while migrating to a Kubernetes agent