Hi Everyone I've just started trying out the Prefect Cloud a... | Prefect Community #prefect-server

Elie Hamouche

08/16/2020, 10:38 AM

Hi everyone, I've just started trying out Prefect Cloud and unfortunately I can't seem to get any tasks to run. I have a simple task that looks like this:

from datetime import timedelta

import prefect
from prefect import Flow, task
from prefect.schedules import IntervalSchedule

prefect.config.debug = True
prefect.config.logging.level = "DEBUG"

@task(name="orchestrate", log_stdout=True)
def orchestrate():
    logger = prefect.context["logger"]
    logger.info("Orchestrating")
    for i in range(1, 60):
        logger.error(f"Counting {i}")
    return True


flow = Flow("Cloud Flow", schedule=IntervalSchedule(
    interval=timedelta(minutes=1),
))
with flow:
    orchestrate()

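# Register the flow with Prefect Cloud under the "dev" project
flow.register(project_name="dev")
# Start a LocalAgent in this process; it blocks while polling Cloud for scheduled runs
flow.run_agent(token="XXXX")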

And I start the agent with:

python example.py

Agent seems to come up fine but just hangs:

(venv) ibraflow (master|✚5…) python example.py
Result check: OK
Flow: https://cloud.prefect.io/ysft/flow/b7f56496-54df-4eeb-a0b6-2e274c39ed80

 ____            __           _        _                    _
|  _ \ _ __ ___ / _| ___  ___| |_     / \   __ _  ___ _ __ | |_
| |_) | '__/ _ \ |_ / _ \/ __| __|   / _ \ / _` |/ _ \ '_ \| __|
|  __/| | |  __/  _|  __/ (__| |_   / ___ \ (_| |  __/ | | | |_
|_|   |_|  \___|_|  \___|\___|\__| /_/   \_\__, |\___|_| |_|\__|
                                           |___/

[2020-08-16 10:34:38,874] INFO - agent | Starting LocalAgent with labels ['MacBook-Pro-2.local', 'azure-flow-storage', 'gcs-flow-storage', 's3-flow-storage', 'github-flow-storage', 'webhook-flow-storage']
[2020-08-16 10:34:38,874] INFO - agent | Agent documentation can be found at https://docs.prefect.io/orchestration/
[2020-08-16 10:34:38,875] INFO - agent | Agent connecting to the Prefect API at https://api.prefect.io
[2020-08-16 10:34:39,009] INFO - agent | Waiting for flow runs...

In the Cloud UI I see the correct version and task, so that should rule out authentication and network issues, right? But the tasks never run and are always marked "Late" (screenshot attached).

nicholas

08/16/2020, 11:56 AM

Hi @Elie Hamouche - do you see your agent registered on the Dashboard screen of the UI? It'll look something like this: [screenshot]

Elie Hamouche

08/16/2020, 1:32 PM

Hi @nicholas, yes, I see only one agent registered - however I have 3 instances running. A further update: I left it for a while and it looks like some jobs have been executed, but they're executing very late.

Elie Hamouche

08/16/2020, 1:33 PM

Screenshot of agent logs indicating that jobs were submitted and ran (there are 3 running simultaneously).

Jeremiah

08/16/2020, 6:54 PM

Hi @Elie Hamouche, I’m sorry for the trouble - I’ve been looking into this on the Cloud side for you. I can’t yet explain why you experienced this behavior, but I can explain what happened: we use a heuristic to avoid releasing work to agents if the agents already have a high number of SUBMITTED (or RUNNING) runs. This is to solve a situation we encountered in the early days of Cloud where an Agent would run into trouble (for example, being unable to launch K8s jobs) and would end up spamming the cluster with unrunnable jobs. Therefore, if agents have high submitted counts, we don’t deliver work until they free up. In your case, Cloud was consistently behaving as if you had multiple SUBMITTED runs, when of course you don’t. This value is cached, but only for a short while, after which it should refresh from the database. Therefore I’m still unable to explain why it persisted for so long, but I’ve manually cleared it, and if you see the problem again, please DM me so I can try to identify it directly.
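The throttling Jeremiah describes can be pictured with a toy model: work is withheld from an agent whose in-flight count is at or above some threshold, and that count is read through a short-lived cache. This is a hypothetical sketch, not Prefect Cloud's implementation; the class name, the threshold of 3 and the 30-second cache lifetime are all made up. It only shows why a stale cached count keeps holding work back until the cache entry refreshes.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class ReleaseThrottle:
    """Toy model of a "don't release work to busy agents" heuristic.

    All names and numbers are hypothetical. `count_in_flight` stands in for a
    database query returning an agent's SUBMITTED/RUNNING run count.
    """

    count_in_flight: Callable[[str], int]
    max_in_flight: int = 3        # hypothetical threshold
    ttl_s: float = 30.0           # hypothetical cache lifetime in seconds
    clock: Callable[[], float] = time.monotonic
    _cache: Dict[str, Tuple[float, int]] = field(default_factory=dict, init=False)

    def _cached_count(self, agent_id: str) -> int:
        now = self.clock()
        hit = self._cache.get(agent_id)
        if hit is not None and now - hit[0] < self.ttl_s:
            return hit[1]  # cache entry still fresh: reuse it, even if reality changed
        count = self.count_in_flight(agent_id)  # refresh from the "database"
        self._cache[agent_id] = (now, count)
        return count

    def may_release(self, agent_id: str) -> bool:
        """Only hand new work to an agent whose in-flight count is below the threshold."""
        return self._cached_count(agent_id) < self.max_in_flight


# Usage: a backend that wrongly believes agent-1 has 5 runs in flight.
db = {"agent-1": 5}
fake_now = [0.0]
throttle = ReleaseThrottle(lambda agent: db[agent], clock=lambda: fake_now[0])
print(throttle.may_release("agent-1"))  # False (5 >= 3)
db["agent-1"] = 0                       # the real count drops to zero...
print(throttle.may_release("agent-1"))  # False (cached value 5 is reused)
fake_now[0] = 31.0                      # ...and once the cache entry expires
print(throttle.may_release("agent-1"))  # True (fresh count 0 < 3)
```

Injecting the clock keeps the model deterministic; in this toy, a count that stayed wrong for far longer than the TTL would have to come from the underlying query, not the cache.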

Elie Hamouche

08/16/2020, 8:53 PM

So I deleted the flow from the UI, renamed it, started 3 agents, and I get the delays again (screenshot attached): https://cloud.prefect.io/ysft/flow/7fa63495-1a61-4052-9f0e-0a273cb0bad1

Jeremiah

08/16/2020, 9:00 PM

Elie, thanks so much for posting this - I was able to observe the situation! It may be a bug in the queueing logic; I’m opening a ticket to resolve it ASAP.

Elie Hamouche

08/16/2020, 9:01 PM

Awesome, glad I could help. It seems to be working fine now; the late runs have all cleared up.

Elie Hamouche

08/16/2020, 9:05 PM

Also, given the above code, I don't see logs in my terminal from any of the agents, but I do see them in the Cloud UI. Is there a way to get STDOUT locally with my setup? I added log_stdout=True but that didn't help.

Elie Hamouche

08/16/2020, 9:05 PM

Snippet:

import prefect
from prefect import task

prefect.config.debug = True
prefect.config.logging.level = "DEBUG"


@task(name="orchestrate", log_stdout=True)
def orchestrate():
    logger = prefect.context["logger"]
    logger.info("Orchestrating")
    for i in range(1, 60):
        logger.error(f"Counting {i}")
    return True

Chris White

08/16/2020, 9:08 PM

For the agent logs: assuming you’re running a local agent, you can use the CLI flag --show-flow-logs or the kwarg show_flow_logs=True in run_agent to see the flow run logs in STDOUT. Note this only works for the local agent, because all other agents submit jobs that can’t pipe STDOUT back and forth.
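One way to picture why only the local agent can do this, assuming a local agent starts each flow run as a child process on its own machine: the agent can then pipe that process's output into its own terminal, whereas a job submitted to, say, Kubernetes runs somewhere else. The stdlib-only sketch below illustrates that assumption; launch_flow_run and the fake flow-run command are hypothetical, and this is not Prefect's agent code. Logs shipped to the Cloud UI by Prefect's own logging are not modelled here.

```python
import subprocess
import sys
from typing import List, Optional


def launch_flow_run(cmd: List[str], show_flow_logs: bool) -> Optional[str]:
    """Run `cmd` as a child process, as a (hypothetical) local agent might.

    With show_flow_logs=True the child's stdout/stderr are piped back and echoed
    in the agent's terminal; otherwise they are discarded.
    """
    if show_flow_logs:
        result = subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, check=True
        )
        print(result.stdout, end="")  # echo the flow's output locally
        return result.stdout
    subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=True)
    return None


# Stand-in for a flow run: a tiny Python process that prints two lines.
fake_flow_run = [sys.executable, "-c", "print('Counting 1'); print('Counting 2')"]
launch_flow_run(fake_flow_run, show_flow_logs=True)   # prints both lines
launch_flow_run(fake_flow_run, show_flow_logs=False)  # prints nothing
```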

Chris White

08/16/2020, 9:18 PM

Hi @Elie Hamouche - I’m just catching up here. I do still think this might be a race condition in the backend, but in the meantime, if you only run a single agent you shouldn’t encounter this anymore (this is ultimately a consequence of running more agents than there are concurrency slots).

Elie Hamouche

08/16/2020, 9:19 PM

That worked, thank you for the quick responses! @Chris White @Jeremiah

Jeremiah

08/16/2020, 9:26 PM

Anytime! We’re going to look into this and make sure the user experience improves, thanks for bringing it to our attention 🙂

Elie Hamouche

08/16/2020, 10:29 PM

If possible, please tag me when fixed - I'm putting a demo together for my team to replace some of our legacy airflow jobs with Prefect and the hybrid model is a big selling point

Elie Hamouche

08/17/2020, 8:11 AM

@Chris White Question about the solution of running only one agent: I'm relying on the agent to execute the flow. I prefer this to Dask/K8s jobs because I'd like the agent to be long-lived (start-ups take a long time). Is the concurrency a limitation of Prefect Cloud, or is it in general not recommended to run agents concurrently if the agent is also the executor?

Chris White

08/19/2020, 8:52 PM

Hey @Elie Hamouche, very sorry for missing this question - it’s perfectly valid to have horizontally scaled agents if you are worried about them falling over, but there’s no benefit other than redundancy. Also, we think we have released some improvements to the backend that will prevent your stuck / late work situation - please let us know if you experience it again!

Elie Hamouche

08/20/2020, 6:41 PM

Awesome, thanks a lot Chris 👍

Elie Hamouche

08/20/2020, 6:49 PM

@Chris White I just tried it out, but I'm still getting stuck/late jobs. Same flow and code, screenshot attached. I'm a little confused: is it something I'm doing? I'm wondering why I'm experiencing this with a simple example.

Jeremiah

08/20/2020, 6:52 PM

@Elie Hamouche do you mind sharing an ID or your tenant slug (it’ll be in the URL)?

Jeremiah

08/20/2020, 6:52 PM

DM me if you prefer

Elie Hamouche

08/20/2020, 6:53 PM

Sure, just DMed you. Let me know if you need any further debugging or info from my end; I have everything open.

Chris White

08/20/2020, 7:18 PM

Hi Elie, a few questions:
- can you confirm that the clock on the machine that your agent is running on is accurate? e.g., run date from the terminal or import pendulum; pendulum.now("utc") in Python, and compare that against the scheduled timestamps for your flow runs.
- is the delay in your flow runs consistent or random?
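The second question comes down to simple arithmetic on the scheduled and actual start times shown in the UI. A minimal sketch, assuming both are available as timezone-aware UTC datetimes; the function names, the 5-second spread threshold and the example timestamps are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, pstdev
from typing import List, Tuple


def start_delays(scheduled: List[datetime], started: List[datetime]) -> List[float]:
    """Seconds between each run's scheduled and actual start (UTC-aware datetimes)."""
    if len(scheduled) != len(started):
        raise ValueError("need one actual start per scheduled start")
    return [(actual - planned).total_seconds() for planned, actual in zip(scheduled, started)]


def describe_delays(delays: List[float], spread_s: float = 5.0) -> Tuple[float, str]:
    """Mean delay plus a rough label: 'consistent' if the spread is under spread_s seconds."""
    label = "consistent" if pstdev(delays) < spread_s else "random"
    return mean(delays), label


# Hypothetical example: a one-minute schedule where every run starts 90 s late.
base = datetime(2020, 8, 20, 19, 0, tzinfo=timezone.utc)
scheduled = [base + timedelta(minutes=i) for i in range(3)]
started = [t + timedelta(seconds=90) for t in scheduled]
print(describe_delays(start_delays(scheduled, started)))  # (90.0, 'consistent')
```

A large, steady delay points at something systematic (a skewed clock or work being withheld), while a wide spread suggests intermittent behaviour.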

Elie Hamouche

08/20/2020, 7:22 PM

(data-server) data-server (master|✚29…) date
Thu 20 Aug 2020 20:20:58 BST
(data-server) data-server (master|✚29…) python
>>> import pendulum; pendulum.now("utc")
DateTime(2020, 8, 20, 19, 21, 25, 382367, tzinfo=Timezone('UTC'))

Elie Hamouche

08/20/2020, 7:23 PM

Seems in line with UTC.
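The two readings above agree once the timezone is accounted for: date printed British Summer Time, which is UTC+1. A quick stdlib check, assuming a fixed +1 hour offset for BST; the remaining gap of roughly 27 seconds is presumably just the time between running the two commands.

```python
from datetime import datetime, timedelta, timezone

BST = timezone(timedelta(hours=1), "BST")  # British Summer Time is UTC+1

# Timestamps copied from the two commands above.
shell_date = datetime(2020, 8, 20, 20, 20, 58, tzinfo=BST)
pendulum_utc = datetime(2020, 8, 20, 19, 21, 25, 382367, tzinfo=timezone.utc)

shell_as_utc = shell_date.astimezone(timezone.utc)
print(shell_as_utc.isoformat())  # 2020-08-20T19:20:58+00:00

gap = pendulum_utc - shell_date  # aware datetimes subtract correctly across offsets
print(gap.total_seconds())  # 27.382367
```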

Chris White

08/20/2020, 7:23 PM

yea that looks about right to me

Elie Hamouche

08/20/2020, 7:26 PM

The delay is consistent, but there are occasional bursts where everything works

Elie Hamouche

08/20/2020, 7:29 PM

Screenshot from today's runs: there's a burst where everything was running, but I'm not sure if this is due to manual intervention on your end.

Chris White

08/20/2020, 7:47 PM

Hm, what’s really odd is that you appear to have 2 submitted states in that image, but you should only ever be able to have 1 submitted state at a time.
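An invariant like "at most one Submitted state per run" is typically enforced with a compare-and-set transition: a run only moves to Submitted if it is still Scheduled, and the check and the write happen atomically. The sketch below is a hypothetical in-memory illustration with several agent threads racing for one run; the names are made up and it is not how Prefect Cloud stores state. If the check and write were not atomic, two agents could both see Scheduled and both write Submitted, which is one way duplicate Submitted states could arise with multiple agents.

```python
import threading
from typing import Dict, List


class RunStateStore:
    """Hypothetical in-memory run-state store enforcing one Submitted state per run."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._states: Dict[str, str] = {}
        self.history: Dict[str, List[str]] = {}

    def add_run(self, run_id: str) -> None:
        with self._lock:
            self._states[run_id] = "Scheduled"
            self.history[run_id] = ["Scheduled"]

    def transition(self, run_id: str, expected: str, new: str) -> bool:
        """Compare-and-set: move run_id to `new` only if it is still in `expected`."""
        with self._lock:  # the check and the write happen atomically
            if self._states.get(run_id) != expected:
                return False  # another agent already claimed this run
            self._states[run_id] = new
            self.history[run_id].append(new)
            return True


def agent_poll(store: RunStateStore, run_id: str, name: str, winners: List[str]) -> None:
    """Each (hypothetical) agent tries to claim the run by submitting it."""
    if store.transition(run_id, "Scheduled", "Submitted"):
        winners.append(name)


store = RunStateStore()
store.add_run("run-1")
winners: List[str] = []
agents = [
    threading.Thread(target=agent_poll, args=(store, "run-1", f"agent-{i}", winners))
    for i in range(3)
]
for t in agents:
    t.start()
for t in agents:
    t.join()
print(len(winners), store.history["run-1"])  # 1 ['Scheduled', 'Submitted']
```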

Elie Hamouche

08/20/2020, 8:02 PM

Could it be because I'm running multiple agents?

Elie Hamouche

08/20/2020, 8:02 PM

I have 3 agents running
