Thread
#prefect-community
    Richard Hughes

    11 months ago
    Good morning. I am experiencing an outage on my end with self-hosted agents: something is not allowing my agents to pick up flows. Is anyone able to help me? Are ports 8080 and 4200 the firewall rules I should check? Not sure where to begin.
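A quick way to test the firewall question is a plain TCP connection attempt from the agent machine. This is a minimal sketch, not a Prefect tool, and the helper name `port_open` is hypothetical. Ports 4200 (API) and 8080 (UI) are the Prefect Server defaults. A Prefect Cloud agent instead makes outbound HTTPS calls, so in that case the assumed target to check is the Cloud API host on port 443.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and tries each returned address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures are all OSError subclasses
        return False
```

For example, running `port_open("api.prefect.io", 443)` on the agent host (assuming the Prefect Cloud 1.x API endpoint) tells you whether outbound HTTPS is allowed. Against a Server deployment, `port_open("<server-host>", 4200)` does the same for the API port.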
    Kevin Kho

    11 months ago
    Are your agents visible in the UI when you check the Agents tab?
    Richard Hughes

    11 months ago
    Yes.
    My agents look like they are responding to queries; however, the agents haven't picked up any flows since Friday around 6:30 pm CT.
    Kevin Kho

    11 months ago
    So you have flows stuck in the Scheduled state?
    Richard Hughes

    11 months ago
    The flows are all late.
    Kevin Kho

    11 months ago
    So it picked up flows before and then just stopped all of a sudden? How many total late flow runs do you have?
    Richard Hughes

    11 months ago
    Yes, it stopped working all of a sudden.
    Was there something in the last release that could have caused our agents to stop working?
    Kevin Kho

    11 months ago
    Not really, but at 725 total late flows, I think the scheduler will stop working. Could you try deleting some flow runs to go down to like 500?
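Kevin's suggestion comes down to simple arithmetic. For example, with 725 late runs and a target of 500, the 225 oldest late runs would be cleared. Below is a minimal sketch of choosing which ones. It assumes you have already exported each scheduled run's ID and scheduled start time. The `select_runs_to_clear` helper and the `(run_id, scheduled_start)` tuple format are hypothetical, not a Prefect API. Actually deleting the runs is left to the UI or your own client code.

```python
from datetime import datetime, timedelta, timezone


def select_runs_to_clear(runs, now, target=500):
    """Return IDs of the oldest late runs to clear so that at most `target` late runs remain.

    `runs` is a list of (run_id, scheduled_start) pairs with timezone-aware datetimes;
    a run counts as late when its scheduled start is already in the past.
    """
    late = [(run_id, start) for run_id, start in runs if start < now]
    excess = len(late) - target
    if excess <= 0:
        return []
    late.sort(key=lambda run: run[1])  # oldest scheduled start first
    return [run_id for run_id, _ in late[:excess]]


# Hypothetical backlog: 725 runs, each scheduled one minute earlier than the last
now = datetime(2021, 10, 25, 13, 30, tzinfo=timezone.utc)
runs = [(f"run-{i}", now - timedelta(minutes=i + 1)) for i in range(725)]
to_clear = select_runs_to_clear(runs, now, target=500)
print(len(to_clear))  # 225
```

Clearing the oldest runs first is one design choice among several. You could instead keep the oldest and drop duplicates of frequent schedules, depending on which runs still matter.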
    Anna Geller

    11 months ago
    @Richard Hughes did you upgrade the Prefect version on your agents? If so, from which version to which did you migrate? Also, can you confirm that labels match between flows and agents? When flow runs are stuck in a Scheduled state (causing late runs), it’s usually because of a label misconfiguration.
    Richard Hughes

    11 months ago
    I have not upgraded my agents. I was wondering whether you upgraded the Cloud service in a way that could have caused my on-premise agent to stop picking up flows.
    Anna Geller

    11 months ago
    Do you use Prefect Server or Prefect Cloud?
    Richard Hughes

    11 months ago
    cloud
    Anna Geller

    11 months ago
    @Richard Hughes can you access agent logs? If so, can you see anything suspicious there? Alternatively, could you start a new agent with the --show-flow-logs option so that we can see all logs on the agent for debugging?
    Richard Hughes

    11 months ago
    Yes, I can see logs:
    2021-10-25 13:31:07.323: [2021-10-25 13:31:07,323] INFO - agent | Waiting for flow runs...
    Anna Geller

    11 months ago
    Can you show what labels are configured on that agent vs. what labels are configured on the flows stuck in a Scheduled state? Do those labels match 100%?
    Since you mentioned you use Prefect Cloud, could you check whether all your API keys are still valid? If an API key has expired, flows get stuck in a Scheduled state as well.
    Richard Hughes

    11 months ago
    Give me just a minute,
    looking into some of these items.
    API tokens: they show as deprecated. We are using these, and they show no expiration.
    Anna Geller

    11 months ago
    Don’t worry about the deprecation; it’s just a warning for now. But if you happen to restart any agents, it would be great to switch to using API keys instead of RUNNER tokens. There’s more about this in this blog post.
    Richard Hughes

    11 months ago
    I started a new agent with an extra label, but both flows and agents have the same labels, for example "PROD" vs. "PROD".
    Anna Geller

    11 months ago
    How do you start your agent? Unless you include the --no-hostname-label flag, the agent also gets its hostname as a label, and it’s likely that this label is missing on your flow. So your flow should have both the “PROD” and the hostname labels attached, unless you disabled this default label.
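Anna's description of the default hostname label can be sketched in a few lines. This is an illustration under stated assumptions, not Prefect's code. The `local_agent_labels` helper is hypothetical. It assumes the hostname label is the value of `socket.gethostname()` on the agent machine, and it uses `no_hostname_label=True` to stand in for the `--no-hostname-label` flag.

```python
import socket


def local_agent_labels(labels, no_hostname_label=False):
    """Sketch of the label set a local agent ends up with (hypothetical helper).

    Assumption: the default extra label is socket.gethostname(); setting
    no_hostname_label=True mirrors starting the agent with --no-hostname-label.
    """
    result = list(labels)
    if not no_hostname_label:
        hostname = socket.gethostname()
        if hostname not in result:
            result.append(hostname)
    return result


# Roughly what `prefect agent start -t "{API_TOKEN}" -l "PROD"` implies under the assumption above
print(local_agent_labels(["PROD"]))  # "PROD" plus this machine's hostname
print(local_agent_labels(["PROD"], no_hostname_label=True))  # ['PROD']
```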
    Richard Hughes

    11 months ago
    Was this a recent change?
    Our flows have the hostname labels.
    Anna Geller

    11 months ago
    AFAIK this has been the default behavior for local agents for a while, so it’s not a recent change.
    Richard Hughes

    11 months ago
    prefect agent start -t "{API_TOKEN}" -l "PROD"
    Anna Geller

    11 months ago
    Thanks! And what are the labels on your flow?
    Richard Hughes

    11 months ago
    the hostname
    Anna Geller

    11 months ago
    It should be both: the hostname and “PROD”.
    Your agent has both labels, so the flow should also have the same labels.
    Richard Hughes

    11 months ago
    we don't have "PROD" label on the flows
    Anna Geller

    11 months ago
    I think this might be the issue
    Richard Hughes

    11 months ago
    we have been running this way for a couple of years
    Anna Geller

    11 months ago
    Could you try adding the “PROD” label to your flow, alongside the hostname label, and see whether this solves the issue?
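from prefect import Flow
from prefect.run_configs import LocalRun

# FLOW_NAME and STORAGE are your existing settings; "your-hostname" is a placeholder
with Flow(
    FLOW_NAME,
    storage=STORAGE,
    run_config=LocalRun(labels=["PROD", "your-hostname"]),
) as flow:
    ...  # your existing tasks go here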
    Richard Hughes

    11 months ago
    We source-control our flows, and our pipelines deploy all flows from the agent machines.
    We haven't changed anything on our side.
    Anna Geller

    11 months ago
    Sorry to hear that you have an issue with this. As far as I can tell, in the current Prefect version, when I don’t disable the hostname label and my flow doesn’t have this hostname label attached, the flow won’t be picked up by the agent. Can you share which Prefect version you were using so that I can cross-check whether anything changed since that version?
    And I would encourage you to give it a try with the exact label configuration, just to see whether it helps.
    Richard Hughes

    11 months ago
    Something just happened: we are running flows all of a sudden.
    525 flows just kicked off.
    All we did was clear the late runs.
    Kevin Kho

    11 months ago
    Oh OK, so it was not a labelling issue. Yeah, we didn’t change anything on that front. Just in 0.14.21.
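Kevin's conclusion fits the label-matching rule as I understand it from the Prefect 1.x agent docs: an agent picks up a flow run only if every label on the flow run is also on the agent, meaning the flow's labels must be a subset of the agent's. Below is a minimal sketch of that check under that assumption. `agent_can_pick_up` is a hypothetical name, not a Prefect function, and `agent-host-01` is a made-up hostname.

```python
def agent_can_pick_up(flow_labels, agent_labels):
    """Assumed Prefect 1.x rule: every flow-run label must also be an agent label."""
    return set(flow_labels) <= set(agent_labels)


# Hypothetical hostname standing in for the agent machine's real one
agent = ["PROD", "agent-host-01"]

print(agent_can_pick_up(["agent-host-01"], agent))          # True: flow labels are a subset
print(agent_can_pick_up(["PROD", "agent-host-01"], agent))  # True: identical sets
print(agent_can_pick_up(["PROD", "other-host"], agent))     # False: "other-host" is missing
```

Under this rule, Richard's setup passes: the flow carries only the hostname label, and the agent has both "PROD" and the hostname. That is consistent with the runs starting again once the late backlog was cleared.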
    Richard Hughes

    11 months ago
    I think we are still on 13.9 or somewhere back. We need to upgrade, but it requires us to adjust a template.
    Kevin Kho

    11 months ago
    Oh, no worries about that. The 725 limit applies across all versions anyway, so upgrading wouldn’t have helped in this scenario.
    Richard Hughes

    11 months ago
    I've cleared all the late flow runs and stopped all the flows, but something weird is still going on.
    Kevin Kho

    11 months ago
    Is it the same issue where stuff is not being picked up?
    Richard Hughes

    11 months ago
    It seems like it was a bug in the concurrency limit: it said there were 45 flows running, but none were running. We removed the limit, re-added it, and now we are running.
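The symptom Richard describes, where a limit reports 45 running flows while none are running, can be illustrated with a toy slot counter. This is a hypothetical model, not how Prefect Cloud implements concurrency limits, and `ToyConcurrencyLimit` is an invented name. In this model, slots are taken when runs start but never released, so the counter fills until it blocks every new run. Deleting and re-creating the limit starts again from a count of zero.

```python
class ToyConcurrencyLimit:
    """Hypothetical slot counter; a toy model, not Prefect's implementation."""

    def __init__(self, limit):
        self.limit = limit
        self.running = 0  # slots the counter believes are in use

    def try_start(self):
        """Take a slot if one is free; return False once the limit is reached."""
        if self.running >= self.limit:
            return False
        self.running += 1
        return True

    def finish(self):
        """Release a slot; a run that ends without calling this leaks its slot."""
        self.running = max(0, self.running - 1)


limit = ToyConcurrencyLimit(45)
for _ in range(45):
    limit.try_start()  # 45 runs start, then end without releasing their slots
print(limit.running, limit.try_start())  # 45 False: blocked although nothing is running

# "Removing and re-adding" the limit: a fresh counter starts from zero
limit = ToyConcurrencyLimit(45)
print(limit.running, limit.try_start())  # 0 True
```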
    Anna Geller

    11 months ago
    Nice work! And thanks for letting us know about what was the issue