#prefect-community

John-Craig Borman

03/23/2023, 3:38 PM
Hi all, I'm getting a `Server error '500 Internal Server Error' for url` error in an orchestration flow (run in Prefect Cloud) that is triggering many `run_deployment` calls. Is this likely an API limit being tripped?

Bianca Hoch

03/23/2023, 5:22 PM
Hi John, thanks for raising. Is this a transient issue? Were you able to run this successfully before without an issue? Do you have the full traceback for the error message?

John-Craig Borman

03/23/2023, 5:27 PM
Hi @Bianca Hoch, I've seen this flow run for 2.5-3.5 hours and succeed (as of last week). This week it can run for 10 min to 2 hours and fail with this exception:
Encountered exception during execution:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/prefect/engine.py", line 643, in orchestrate_flow_run
    result = await flow_call()
  File "/.../flows/orchestrator.py", line 54, in orchestrator
    await utils.run_deployments_throttled(deployment_params, timeout=0)
  File "/.../flows/utils.py", line 75, in run_deployments_throttled
    return await aiometer.run_all(jobs, max_per_second=max_calls_per_second)
  File "/usr/local/lib/python3.10/dist-packages/aiometer/_impl/run_all.py", line 18, in run_all
    async with amap(
  File "/usr/lib/python3.10/contextlib.py", line 217, in __aexit__
    await self.gen.athrow(typ, value, traceback)
  File "/usr/local/lib/python3.10/dist-packages/aiometer/_impl/amap.py", line 69, in _amap
    async with anyio.create_task_group() as task_group:
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__
    raise exceptions[0]
  File "/usr/local/lib/python3.10/dist-packages/aiometer/_impl/amap.py", line 74, in sender
    await run_on_each(
  File "/usr/local/lib/python3.10/dist-packages/aiometer/_impl/run_on_each.py", line 52, in run_on_each
    async with anyio.create_task_group() as task_group:
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__
    raise exceptions[0]
  File "/usr/local/lib/python3.10/dist-packages/aiometer/_impl/run_on_each.py", line 19, in _worker
    result = await async_fn(value)
  File "/usr/local/lib/python3.10/dist-packages/prefect/client/utilities.py", line 47, in with_injected_client
    return await fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/prefect/deployments.py", line 123, in run_deployment
    flow_run = await client.create_flow_run_from_deployment(
  File "/usr/local/lib/python3.10/dist-packages/prefect/client/orion.py", line 433, in create_flow_run_from_deployment
    response = await self._client.post(
  File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 1848, in post
    return await self.request(
  File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 1533, in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "/usr/local/lib/python3.10/dist-packages/prefect/client/base.py", line 253, in send
    response.raise_for_status()
  File "/usr/local/lib/python3.10/dist-packages/httpx/_models.py", line 749, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Server error '500 Internal Server Error' for url 'https://api.prefect.cloud/api/accounts/60b38cc6-c00d-4e61-8757-f9a7fa6b6f90/workspaces/444e91be-a3d5-416b-81a7-0361db5c01d1/deployments/b71ae396-6d0c-4a94-8c38-82f07803d108/create_flow_run'
For more information check: https://httpstatuses.com/500
I'm using `aiometer` to throttle requests to a max of 5 per second (as I thought this was an API limit issue), but now I'm not so sure.
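For readers following along, the throttling described here can be sketched without any third-party packages. The block below is a minimal stdlib-only approximation of the `aiometer.run_all(jobs, max_per_second=...)` call seen in the traceback, not the actual `utils.py` from the thread. `run_all_throttled` and `fake_run_deployment` are hypothetical names, and the fake coroutine stands in for a real `run_deployment(..., timeout=0)` call.

```python
import asyncio
import time


async def run_all_throttled(jobs, max_per_second):
    """Start each zero-argument async job at most max_per_second times a second."""
    interval = 1.0 / max_per_second
    tasks = []
    for job in jobs:
        tasks.append(asyncio.create_task(job()))
        await asyncio.sleep(interval)  # space out the start times
    # Results come back in the same order as the jobs were given.
    return await asyncio.gather(*tasks)


async def fake_run_deployment(name):
    # Hypothetical stand-in for prefect.deployments.run_deployment(name=..., timeout=0)
    await asyncio.sleep(0)
    return f"triggered {name}"


def demo():
    names = ["flow-a/deploy", "flow-b/deploy", "flow-c/deploy"]
    # Bind each name now; a bare lambda would capture only the last one.
    jobs = [lambda n=n: fake_run_deployment(n) for n in names]
    start = time.monotonic()
    results = asyncio.run(run_all_throttled(jobs, max_per_second=50))
    return results, time.monotonic() - start


if __name__ == "__main__":
    print(demo())
```

Note that a throttle like this only limits how fast calls start. If any single call raises, as with the 500 above, `asyncio.gather` propagates that exception to the caller, which matches the behaviour in the traceback where one failed request fails the whole orchestrator flow.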

Bianca Hoch

03/23/2023, 9:38 PM
Hey John, sorry for the delay here. Can you share which version of Prefect you're using? It'll be the output of `prefect version` in your terminal.
As far as rate limits go, flow and task creation limits are 2,000 per minute per account, but you should be getting a 429 in response to hitting the limit, not a 500.
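A throttle of 5 calls per second works out to 5 × 60 = 300 calls per minute, well under the 2,000 per minute limit quoted here, which fits the point that a rate limit would surface as a 429 rather than a 500. If the 500s turn out to be transient, one possible client-side workaround is to retry with exponential backoff. The block below is only a sketch under that assumption: `FakeHTTPStatusError` and `call_with_retries` are hypothetical names, the set of retryable codes is a guess, and real code would catch `httpx.HTTPStatusError` and read `exc.response.status_code` instead.

```python
import asyncio
import random

# Assumption: which status codes are worth retrying is a judgment call.
RETRYABLE = {429, 500, 502, 503, 504}


class FakeHTTPStatusError(Exception):
    """Hypothetical stand-in for httpx.HTTPStatusError, carrying only a status code."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


async def call_with_retries(fn, max_attempts=4, base_delay=0.01):
    """Await fn(), retrying retryable HTTP errors with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await fn()
        except FakeHTTPStatusError as exc:
            # Give up on non-retryable codes or once attempts are exhausted.
            if exc.status_code not in RETRYABLE or attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, base_delay))


def demo():
    calls = {"n": 0}

    async def flaky():
        # Fails twice with a 500, then succeeds on the third call.
        calls["n"] += 1
        if calls["n"] < 3:
            raise FakeHTTPStatusError(500)
        return "flow run created"

    result = asyncio.run(call_with_retries(flaky))
    return result, calls["n"]


if __name__ == "__main__":
    print(demo())
```

Retrying hides the symptom rather than explaining it, so it is worth logging each retried status code to see whether the failures are really transient.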

John-Craig Borman

03/24/2023, 2:49 PM
Our deployments are currently using 2.7.12

Bianca Hoch

03/24/2023, 3:22 PM
Thanks for sending that over. Could you also provide some information on the types of parameters you're sending when calling `run_deployment`?

John-Craig Borman

03/24/2023, 3:25 PM
Sure thing, I'm specifying:
run_deployment(name=..., parameters={'key': value}, flow_run_name=value, timeout=0)

Bianca Hoch

03/24/2023, 5:34 PM
After sharing this with the team, there's a chance this 500 could be the result of a timeout that isn't being caught. Would you be willing to post an issue so that we can track it?

John-Craig Borman

03/24/2023, 6:11 PM
No problem, issue can be found here: https://github.com/PrefectHQ/prefect/issues/8926 Thanks @Bianca Hoch!

Matt Kizaric

03/24/2023, 8:12 PM
Hey there, I actually ran into this issue too today! I'm using a local docker-compose so I could pull the logs from the orion service and local postgres server. Here are the key takeaways I found:
• Under the hood, Prefect runs a few services (`prefect.orion.services`) that have to query the database. These run on a loop that has a configurable timeout in most cases. I was able to bump `PREFECT_ORION_SERVICES_PAUSE_EXPIRATIONS_LOOP_SECONDS` to solve one exception. I don't use Cloud, but if you can configure that it might help a bit.
• There was one loop for notifications that has a non-configurable timeout (FlowRunNotifications). This has become my new bottleneck, though it might be more due to postgres being under-provisioned.
Not sure if this helps much, but just some details I was able to pull out of my local testing.
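As a rough illustration of the first takeaway, the setting can be supplied as an environment variable to the process that runs the server. This is only a sketch: the value 30 is an arbitrary placeholder (the thread does not say what value was used), `build_server_env` is a hypothetical helper, and in a docker-compose setup the same variable would more likely go under the service's `environment:` key.

```python
import os

# Setting name taken from the message above.
SETTING = "PREFECT_ORION_SERVICES_PAUSE_EXPIRATIONS_LOOP_SECONDS"


def build_server_env(loop_seconds, base=None):
    """Return a copy of the environment with the loop setting overridden."""
    env = dict(os.environ if base is None else base)
    env[SETTING] = str(loop_seconds)  # environment values must be strings
    return env


# 30 is a placeholder value, not a recommendation.
env = build_server_env(30, base={"PATH": "/usr/bin"})
print(env[SETTING])
```

The resulting mapping could then be passed as `env=` to `subprocess.run` when launching the server process locally.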

John-Craig Borman

03/24/2023, 8:15 PM
Hey, thanks for the context @Matt Kizaric. If that is the case, then in Prefect Cloud I think this must be a server-side bug, so the environment variable wouldn't change anything for the client (AFAIK).
If you could add those logs to the issue, that might help the Prefect devs identify the root of the issue faster than my client-side error.

Scott Walsh

04/25/2023, 5:25 PM
I am having a similar issue with loop interval errors (using an over-provisioned Postgres instance). Looks like the FlowRunNotifications loop is still hard-coded in 2.10.5.