I am running a prefect agent with Orion right now ...
# prefect-community
n
I am running a prefect agent with Orion right now with a deployed flow. The agent runs flows just fine if I start it and go hit quick run in the UI. But if I leave the agent sitting for too long I start getting this 403 error:
Traceback (most recent call last):
  File "/home/nelson/miniconda3/envs/my_project/lib/python3.9/site-packages/prefect/cli/base.py", line 59, in wrapper
    return fn(*args, **kwargs)
  File "/home/nelson/miniconda3/envs/my_project/lib/python3.9/site-packages/prefect/utilities/asyncio.py", line 120, in wrapper
    return run_async_in_new_loop(async_fn, *args, **kwargs)
  File "/home/nelson/miniconda3/envs/my_project/lib/python3.9/site-packages/prefect/utilities/asyncio.py", line 67, in run_async_in_new_loop
    return anyio.run(partial(__fn, *args, **kwargs))
  File "/home/nelson/miniconda3/envs/my_project/lib/python3.9/site-packages/anyio/_core/_eventloop.py", line 56, in run
    return asynclib.run(func, *args, **backend_options)
  File "/home/nelson/miniconda3/envs/my_project/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 233, in run
    return native_run(wrapper(), debug=debug)
  File "/home/nelson/miniconda3/envs/my_project/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/nelson/miniconda3/envs/my_project/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/nelson/miniconda3/envs/my_project/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 228, in wrapper
    return await func(*args)
  File "/home/nelson/miniconda3/envs/my_project/lib/python3.9/site-packages/prefect/cli/agent.py", line 71, in start
    await agent.get_and_submit_flow_runs()
  File "/home/nelson/miniconda3/envs/my_project/lib/python3.9/site-packages/prefect/agent.py", line 88, in get_and_submit_flow_runs
    submittable_runs = await self.client.get_runs_in_work_queue(
  File "/home/nelson/miniconda3/envs/my_project/lib/python3.9/site-packages/prefect/client.py", line 747, in get_runs_in_work_queue
    response = await self._client.post(
  File "/home/nelson/miniconda3/envs/my_project/lib/python3.9/site-packages/prefect/utilities/httpx.py", line 137, in post
    return await self.request(
  File "/home/nelson/miniconda3/envs/my_project/lib/python3.9/site-packages/prefect/utilities/httpx.py", line 80, in request
    response.raise_for_status()
  File "/home/nelson/miniconda3/envs/my_project/lib/python3.9/site-packages/httpx/_models.py", line 1510, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '403 Forbidden' for url 'https://api-beta.prefect.io/api/accounts/df4b7089-cc2a-48ae-b4ce-baea44b163d6/workspaces/b22af91f-f810-4bc3-ac90-a1fa0e042c55/work_queues/c91e1439-be7e-4a98-8df0-da39515197b2/get_runs'
For more information check: https://httpstatuses.com/403
An exception occurred.
Any ideas what might be causing this?
k
Hey @Nelson Griffiths, will check with the team about this
a
can you share your deployment spec, @Nelson Griffiths?
if you are running your flow in a container using the Docker or Kubernetes flow runner, you may need to attach the API_KEY and API_URL env variables, e.g.: Docker:
DeploymentSpec(
    name="example",
    flow=docker_flow,
    tags=["local"],
    flow_runner=DockerFlowRunner(
        image="prefecthq/prefect:2.0ba2-python3.9",
        env={
            "EXTRA_PIP_PACKAGES": "pandas",
            "PREFECT_API_KEY": "xxx",
        },
        volumes=["/Users/anna/.aws:/root/.aws"],
    ),
)
Kubernetes:
DeploymentSpec(
    name="prod",
    flow=kubernetes_flow,
    tags=["local"],
    flow_runner=KubernetesFlowRunner(
        env=dict(
            PREFECT_API_URL="https://api-beta.prefect.io/api/accounts/yyy/workspaces/xxx",
            PREFECT_API_KEY="YOUR_API_KEY",
        ),
    ),
)
you can also attach the same on UniversalFlowRunner:
DeploymentSpec(
    name="cloud",
    flow=universal_flow,
    tags=["local"],
    flow_runner=UniversalFlowRunner(
        env=dict(
            PREFECT_API_URL="https://api-beta.prefect.io/api/accounts/yyy/workspaces/xxx",
            PREFECT_API_KEY="YOUR_API_KEY",
        ),
    ),
)
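To avoid hardcoding the key in a deployment file, the `env` dict shown above could be built from the local environment instead. This is only a minimal sketch: `cloud_env` is a hypothetical helper name (not part of Prefect), and it assumes `PREFECT_API_URL` and `PREFECT_API_KEY` are already exported in the shell that registers the deployment.

```python
import os
from typing import Dict, Mapping, Optional

# The two variable names used in the DeploymentSpec examples in this thread.
REQUIRED_VARS = ("PREFECT_API_URL", "PREFECT_API_KEY")


def cloud_env(source: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return only the Prefect Cloud settings, failing early if one is unset.

    `cloud_env` is a hypothetical helper, not a Prefect API. `source`
    defaults to the current process environment.
    """
    source = os.environ if source is None else source
    missing = [name for name in REQUIRED_VARS if not source.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: source[name] for name in REQUIRED_VARS}


# Intended use, following the examples above:
#   flow_runner=UniversalFlowRunner(env=cloud_env())
```

Failing at registration time makes a missing key visible immediately, rather than as a 403 from the flow run process later.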
k
How long does it take for this to happen and is it consistent?
n
It has happened 3/3 times now. This last one took about 35 min before it died. It ran a few scheduled flows in that time
@Anna Geller here is my DeploymentSpec. Just running locally:
DeploymentSpec(flow=ingest_tweets,
               name="udot-data-collection",
               parameters={"username": "UDOTTRAFFIC", "lookback_days": 1},
               tags=["db", "local"],
               schedule=IntervalSchedule(interval=timedelta(minutes=5)))
The strangest part is that it goes and gets and runs flows for 30 minutes successfully before throwing the error
a
so you're running everything locally - both Orion and your agent? and since you don't assign any FlowRunner, you use the default SubprocessFlowRunner
n
Sorry I am using prefect cloud and running my agent locally.
👍 1
a
Do you mind trying to attach the universal flow runner with the API key to your DeploymentSpec and let us know if this helps?
flow_runner=UniversalFlowRunner(
    env=dict(
        PREFECT_API_URL="https://api-beta.prefect.io/api/accounts/yyy/workspaces/xxx",
        PREFECT_API_KEY="YOUR_API_KEY",
    ),
),
I saw a similar error with the Docker flow runner and I assumed that this container (sub)process didn't get the API key... I'll ask the team about this but this is worth trying
the 403 Forbidden error indicates an API key issue
n
I will give this a shot in a little bit and let you know if it fixes the issue
👍 1
It has now been alive for about an hour. So this seems to have fixed the problem.
I am guessing that the default behavior with a SubprocessFlowRunner will be looked into and fixed at some point though?
k
If we can replicate it, yes, for sure. That is not intended. We will be trying to.
n
Let me know if there is anything you need from me!
👍 1
@Kevin Kho @Anna Geller This ran for a while but then turned into a 502 bad gateway error. Any further ideas?
k
Not at the moment, will check in with the team tomorrow and discuss this
n
As a further update my agent is bouncing between 403 and 502 errors now when I start it
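The two status codes point in different directions: earlier in this thread the 403 was taken to indicate an API key problem, while a 502 Bad Gateway is generally produced on the server side of the connection. A rough sketch of how that distinction could be encoded is below; `triage` and its return labels are hypothetical names chosen for illustration, not anything from Prefect or httpx.

```python
def triage(status_code: int) -> str:
    """Roughly classify an HTTP status code for debugging purposes.

    Hypothetical helper: 401/403 suggest checking the API key and URL,
    any 5xx suggests a server-side or gateway problem that may be transient.
    """
    if status_code in (401, 403):
        return "credentials"
    if 500 <= status_code < 600:
        return "server"
    return "other"


for code in (403, 502):
    print(code, triage(code))
```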
k
Oh man will look into this today
n
I appreciate it! Hopefully I just did something dumb in my setup. 🤷🏼‍♂️
k
So I have an agent running against Cloud, and am not finding any weirdness. You said it works for 30 mins. I have a set-up going on right now and I’ll try to leave it overnight. I have a 10 minute schedule
My agent seems to be fine. Do you have any advice on how I can replicate it? Do you have more than one work queue or agent?
n
Sorry I was traveling for a bit. Is there any way I can get more detailed logs as to what is happening? I'm not sure how to tell you to replicate it. I'm happy to share my whole repository if that is helpful. It's just a small side project I'm working on
a
Sharing your repo will be helpful, for sure! Also, can you perhaps recreate your workspace, work queue and agent from scratch? within 10 days some things could have changed 🙂 https://orion-docs.prefect.io/ui/cloud/
n
I recreated everything and am still running into the same errors. I will share my repo shortly, but I also have 2 other questions: 1. Is there some way to get better logs from the agent to understand why this is happening? 2. Is there a way I can ping the agent from another process to see if it is running and turn it back on programmatically as a workaround for now?
a
#1 Yup, you can set the log level to debug this way:
prefect config set PREFECT_LOGGING_LEVEL='DEBUG'
#2 To check if the agent process is running, you can inspect your running processes on the instance:
ps -ef | grep "prefect agent start"
But the easiest way is to inspect the work queue the agent polls for:
prefect work-queue inspect 'acffbcc8-ae65-4c83-a38a-96e2e5e5b441'
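For the "turn it back on programmatically" part of the question, one possible workaround is a small supervisor that relaunches the agent whenever its process dies. The sketch below is not a Prefect feature: `supervise` is a hypothetical helper, the agent command line is an assumption based on the `prefect agent start` process name above, and it assumes the agent exits with a non-zero code when it crashes as in the traceback.

```python
import subprocess
import sys
import time
from typing import Callable, List, Optional


def supervise(
    command: List[str],
    max_restarts: Optional[int] = None,
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `command` and relaunch it each time it exits with an error.

    Stops when the command exits cleanly (code 0) or once `max_restarts`
    relaunches have been used (None means no limit). Returns the number
    of times the command was launched.
    """
    launches = 0
    while True:
        launches += 1
        returncode = subprocess.run(command).returncode
        if returncode == 0:
            return launches
        if max_restarts is not None and launches > max_restarts:
            return launches
        print(f"process exited with code {returncode}, restarting", file=sys.stderr)
        sleep(delay_seconds)


# Assumed usage; replace the placeholder with your own work queue ID:
#   supervise(["prefect", "agent", "start", "<work-queue-id>"], delay_seconds=30)
```

A delay between restarts keeps the loop from hammering the API if the failure is persistent, such as an invalid key.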