Justin Trautmann

10/04/2022, 3:15 PM
Hello community, hello Prefect team. Is there any new rate limiting in place for the Cloud API? When using Prefect 2 Cloud, I have recently been facing a lot of flow crashes with the error message:
Crash detected! Request to <[...]/workspaces/[...]/task_runs/> failed. 
RuntimeError: The connection pool was closed while 65 HTTP requests/responses were still in-flight.
This is most likely not related to local network issues, as it is reproducible across different networks. When using a local Orion server instead, the flow succeeds without any issues, and a couple of days ago the flow ran successfully on Prefect Cloud. I am submitting ~100 parallel tasks using the RayTaskRunner with a local cluster. Any help is much appreciated.
python 3.8.10
prefect 2.4.5
prefect-ray 0.2.0.post2

Toby Rahloff

10/04/2022, 4:16 PM
Also reproducible with Prefect-Dask, though I get a slightly different error trace (HTTP 500 from the Prefect Cloud API):
prefect.exceptions.PrefectHTTPStatusError: Server error '500 Internal Server Error' for url '<>'
Response: {'exception_message': 'Internal Server Error'}
For more information check: <>
2022-10-04 18:13:47,715 - distributed.nanny - ERROR - Worker process died unexpectedly
2022-10-04 18:13:47,715 - distributed.nanny - ERROR - Worker process died unexpectedly
❯ prefect version
Version:             2.4.5
API version:         0.8.1
Python version:      3.8.10
Git commit:          dbe27317
Built:               Thu, Sep 29, 2022 3:05 PM
OS/Arch:             linux/x86_64
Profile:             dev
Server type:         cloud
@Jeff Hale could that be connected to the database upgrades that were rolled out w/ 2.4.2?
[Prefect 2.0 Cloud] Quick update on the "Internal Server Error when creating TaskRuns" (HTTP 500): It seems the error always happens when creating more than ~100 task runs in parallel. Introducing a back-off/delay of over 3 seconds between two task submissions seems to mitigate the issue; any delay under 3 seconds reproduces the error. This behavior was not encountered 12 days ago with Prefect 2.4.1 and API version 0.8.0. Did something change in the API 0.8.1 release? Is there a public changelog available? Unfortunately, this currently blocks us 100% because our production workflows rely on fan-out patterns that start around 500 parallel tasks simultaneously. Is there any workaround? Any hint or help is highly appreciated.
A different error message sometimes pops up with the same code: "httpx.RemoteProtocolError: Server disconnected without sending a response."
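For anyone wanting to try the delay mitigation described above, here is a minimal sketch of it, not an official fix. It assumes a generic `submit` callable (with Prefect 2 this would presumably be something like `my_task.submit`); the helper name `submit_with_delay` and the 3.5 second default are hypothetical, the default chosen only because delays over 3 seconds were reported to help.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def submit_with_delay(
    submit: Callable[[T], R],
    items: Iterable[T],
    delay_seconds: float = 3.5,  # assumption: just above the ~3 s threshold reported above
    sleep: Callable[[float], None] = time.sleep,  # injectable so the pause can be stubbed out
) -> List[R]:
    """Call `submit` once per item, pausing between consecutive submissions.

    `submit` is a placeholder for whatever starts a task run
    (hypothetically `my_task.submit` in a Prefect 2 flow).
    """
    results: List[R] = []
    for index, item in enumerate(items):
        if index > 0:
            # Pause before every submission except the first one.
            sleep(delay_seconds)
        results.append(submit(item))
    return results


if __name__ == "__main__":
    # Stand-in submit function; a real flow would pass a task's submit method.
    print(submit_with_delay(lambda x: x * 2, [1, 2, 3], delay_seconds=0.0))
```

Note that this serializes submissions, so a 500-task fan-out with a delay of more than 3 seconds would take over 25 minutes just to submit. It is a stopgap at best.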

Zach Angell

10/05/2022, 1:09 PM
Hey @Toby Rahloff, sorry to hear you’re running into issues. We’re looking into this on our side. Would you be able to share any of the following?
• account id
• workspace id
• example flow run id that failed
• example request url that failed (the full url you redacted above)
Feel free to DM if you’d like to keep it private

Toby Rahloff

10/05/2022, 1:55 PM
Hi Zach, thanks a lot for the immediate follow-up 🙌 I will send you the information via DM, and we can update this thread once we have found a solution.

Faheem Khan

10/05/2022, 11:33 PM
@Toby Rahloff I got the same issue with all Prefect versions >2.04. I am using the Dask task runner, and I am staying on Prefect 2.04 until the issue is fixed.