Hi Everyone I've just started trying out the Prefe...
# prefect-server
e
Hi everyone, I've just started trying out Prefect Cloud and unfortunately I can't seem to get any tasks to run. I have a simple task that looks like this:
from datetime import timedelta

import prefect
from prefect import Flow, task
from prefect.schedules import IntervalSchedule

prefect.config.debug = True
prefect.config.logging.level = "DEBUG"

@task(name="orchestrate", log_stdout=True)
def orchestrate():
    logger = prefect.context["logger"]
    logger.info("Orchestrating")
    for i in range(1, 60):
        logger.error(f"Counting {i}")
    return True


flow = Flow("Cloud Flow", schedule=IntervalSchedule(
    interval=timedelta(minutes=1),
))
with flow:
    orchestrate()

flow.register(project_name="dev")
flow.run_agent(token="XXXX")
And I start the agent with:
python example.py
Agent seems to come up fine but just hangs:
(venv) ibraflow (master|✚5…) python example.py
Result check: OK
Flow: <https://cloud.prefect.io/ysft/flow/b7f56496-54df-4eeb-a0b6-2e274c39ed80>

 ____            __           _        _                    _
|  _ \ _ __ ___ / _| ___  ___| |_     / \   __ _  ___ _ __ | |_
| |_) | '__/ _ \ |_ / _ \/ __| __|   / _ \ / _` |/ _ \ '_ \| __|
|  __/| | |  __/  _|  __/ (__| |_   / ___ \ (_| |  __/ | | | |_
|_|   |_|  \___|_|  \___|\___|\__| /_/   \_\__, |\___|_| |_|\__|
                                           |___/

[2020-08-16 10:34:38,874] INFO - agent | Starting LocalAgent with labels ['MacBook-Pro-2.local', 'azure-flow-storage', 'gcs-flow-storage', 's3-flow-storage', 'github-flow-storage', 'webhook-flow-storage']
[2020-08-16 10:34:38,874] INFO - agent | Agent documentation can be found at <https://docs.prefect.io/orchestration/>
[2020-08-16 10:34:38,875] INFO - agent | Agent connecting to the Prefect API at <https://api.prefect.io>
[2020-08-16 10:34:39,009] INFO - agent | Waiting for flow runs...
In the Cloud UI I see the correct version and task, so that rules out authentication and network issues? But the tasks never run and are always marked "late" (screenshot attached).
n
Hi @Elie Hamouche - do you see your agent registered on the Dashboard screen of the UI? It'll look something like this:
e
Hi @nicholas. Yes, I see only one agent registered, although I have 3 instances running. A further update: I left it for a while and it looks like some jobs have been executed, but they are executing very late.
Screenshot of agent logs indicating that jobs were submitted and ran (there's 3 running simultaneously)
j
Hi @Elie Hamouche, I’m sorry for the trouble - I’ve been looking into this on the Cloud side for you. I can’t yet explain why you experienced this behavior, but I can explain what happened - we use a heuristic to avoid releasing work to agents if the agents already have a high number of `SUBMITTED` (or `RUNNING`) runs. This is to solve a situation we encountered in the early days of Cloud where an Agent would run into trouble (for example being unable to launch K8s jobs) and would end up spamming the cluster with unrunnable jobs. Therefore, if agents have high submitted counts, we don’t deliver work until they free up. In your case, Cloud was consistently behaving as if you had multiple `SUBMITTED` runs, when of course you don’t. This value is cached, but only for a short while, after which time it should refresh from the database. Therefore I’m still unable to explain why it persisted for so long, but I’ve manually cleared it and if you see the problem again, please DM me so I can try to identify it directly.
e
So I deleted the flow from the UI, renamed it, and started 3 agents, and I get the delays again (screenshot attached): https://cloud.prefect.io/ysft/flow/7fa63495-1a61-4052-9f0e-0a273cb0bad1
j
Elie, thanks so much for posting this - I was able to observe the situation! It may be a bug in the queueing logic; I’m opening a ticket to resolve it ASAP.
e
Awesome, glad I could help. It seems to be working fine now; the late runs have all cleared up.
Also, given the above code, I don't see logs in my terminal from any of the agents, but I do see them in the Cloud UI. Is there a way to get STDOUT locally with my setup? I added `log_stdout=True` but that didn't help.
Snippet:
prefect.config.debug = True
prefect.config.logging.level = "DEBUG"


@task(name="orchestrate", log_stdout=True)
def orchestrate():
    logger = prefect.context["logger"]
    logger.info("Orchestrating")
    for i in range(1, 60):
        logger.error(f"Counting {i}")
    return True
c
For the agent logs: assuming you’re running a local agent, you can use the CLI flag `--show-flow-logs` or the kwarg `show_flow_logs=True` in `run_agent` to see the flow run logs in STDOUT. Note this only works for the local agent, because all other agents submit jobs that can’t pipe STDOUT back and forth.
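Applied to the script from the original question, the kwarg suggestion above might look like the sketch below. The helper name `start_local_agent` is hypothetical; the only Prefect call it relies on is `run_agent` with the `token` and `show_flow_logs` kwargs mentioned in this thread.

```python
def start_local_agent(flow, token: str) -> None:
    """Start a local agent for `flow` and echo flow run logs to this terminal.

    `flow` is assumed to be a prefect.Flow that has already been registered,
    as in the example script earlier in the thread.
    """
    # show_flow_logs=True is the kwarg suggested above; per the thread it only
    # has an effect with the local agent.
    flow.run_agent(token=token, show_flow_logs=True)
```

In the example script this would replace the final line, i.e. `start_local_agent(flow, token="XXXX")` instead of `flow.run_agent(token="XXXX")`.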
Hi @Elie Hamouche - I’m just catching up here. I do still think this might be a race condition in the backend, but in the meantime if you only run a single agent you shouldn’t encounter this anymore (This is ultimately a consequence of running more agents than there are concurrency slots)
e
That worked, thank you for the quick responses! @Chris White @Jeremiah
j
Anytime! We’re going to look into this and make sure the user experience improves, thanks for bringing it to our attention 🙂
e
If possible, please tag me when it's fixed. I'm putting a demo together for my team to replace some of our legacy Airflow jobs with Prefect, and the hybrid model is a big selling point.
@Chris White A question about the solution of running only one agent: I'm relying on the agent to execute the flow. I prefer this to dask/k8s jobs because I'd like the agent to be long lived (startups take long). Is the concurrency a limitation of Prefect Cloud? Or is it in general not recommended to run multiple agents concurrently when the agent is also the executor?
c
Hey @Elie Hamouche very sorry for missing this question - it’s perfectly valid to have horizontally scaled agents if you are worried about them falling over, but there’s no benefit other than redundancy. Also we think we have released some improvements to the backend that will prevent your Stuck / Late work situation - please let us know if you experience it again!
e
Awesome, thanks a lot Chris 👍
@Chris White I just tried it out, but I'm still getting stuck/late jobs with the same flow and code (screenshot attached). I'm a little confused: is it something I'm doing? I'm wondering why I'm experiencing this with such a simple example.
j
@Elie Hamouche do you mind sharing an id or your tenant slug (it’ll be in the URL)?
DM me if you prefer
e
Sure, just DMed you. Let me know if you need any further debugging or info from my end; I have everything open.
c
Hi Elie, a few questions:
- Can you confirm that the clock on the machine that your agent is running on is accurate? E.g., if you run `date` from the terminal or `import pendulum; pendulum.now("utc")` and compare that against the scheduled timestamps for your flow runs?
- Is the delay in your flow runs consistent or random?
e
(data-server) data-server (master|✚29…) date
Thu 20 Aug 2020 20:20:58 BST
(data-server) data-server (master|✚29…) python
>>> import pendulum; pendulum.now("utc")
DateTime(2020, 8, 20, 19, 21, 25, 382367, tzinfo=Timezone('UTC'))
Seems in line with UTC.
c
yea that looks about right to me
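The comparison suggested above (machine clock versus a run's scheduled timestamp) can also be done with the standard library alone. This is a small sketch; the function name `delay_behind_schedule` is hypothetical, and the scheduled timestamp has to be copied by hand from the UI.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def delay_behind_schedule(
    scheduled_start: datetime, now: Optional[datetime] = None
) -> timedelta:
    """Return how far `now` is past a run's scheduled start time.

    Both datetimes should be timezone-aware; they are normalised to UTC
    before subtracting, so mixing BST and UTC inputs is safe.
    """
    if now is None:
        now = datetime.now(timezone.utc)  # the machine's current clock, in UTC
    return now.astimezone(timezone.utc) - scheduled_start.astimezone(timezone.utc)
```

For the readings above, `date` printed 20:20:58 BST, which is 19:20:58 UTC, and `pendulum.now("utc")` returned 19:21:25 a moment later, so the two agree to within the time it took to type the second command.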
e
The delay is consistent, but there are occasional bursts where everything works
Screenshot from today's runs: there's a burst where everything was running, but I'm not sure if this is due to manual intervention on your end.
c
Hm what’s really odd is that you appear to have 2 submitted states in that image, but you should only ever be able to have 1 submitted state at a time
e
Could it be because I'm running multiple agents?
I have 3 agents running