Hi Everyone, can anyone please tell why i am getti...
# ask-community
n
Hi everyone, can anyone please tell me why I am getting this error?
Crash detected! Execution was interrupted by an unexpected exception: httpx.HTTPStatusError: Client error '404 Not Found' for url '<http://orion-url/api/task_runs/24fd3ed1-78e5-49b2-84ab-ea42c385c942/set_state>'
For more information check: <https://httpstatuses.com/404>
I started inference on 500 inputs, and after some time it crashed. Some of the questions I have are:
1. Why does it crash? Is it because I gave 500 inputs and the Prefect server failed to handle them?
2. If it crashes, is there any mechanism to handle this?
3. I am using Kubernetes as infrastructure.
Can anyone help me, please?
c
Hi Nimesh, the crash is because the URL for set_state on that task id doesn’t exist
If this is OSS, which it appears to be based on your ingress URL, then given your comment about 500 inputs it is likely bottlenecking at the database, so it’s trying to set_state on a task ID that hasn’t actually been committed yet
n
So can you please suggest how I should handle this?
c
OSS isn’t really suited for scale - you would need to optimize the database
this isn’t really a “break / fix” issue - it’s an optimization one, that would require reviewing the database on your end and where the bottlenecks are occurring in your infrastructure
If this is production and scale, this is generally where and why we would defer to prefect cloud, as it’s designed on the backend to scale in this way
n
I am using the Postgres database that Prefect requires. I am not doing anything else with this database; it is used only for Prefect. For the current situation:
1. Do you suggest we limit the number of inputs?
2. What exactly did you mean by optimizing the database, given that I am using the DB just for Prefect?
Until we shift to Prefect Cloud: just to add, I am using Kubernetes (K8s) as the infrastructure for our flows. Do you think we should create a Postgres cluster using K8s StatefulSets? Will this solve our issue?
d
I don't think anyone can say with 100% accuracy what you should be doing in such cases, because it is your infrastructure. Try reducing parallelism, then increase the load and load test each component - iterate.
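As a rough illustration of what "reduce parallelism" can mean in practice, here is a minimal plain-Python sketch that processes the inputs in fixed-size batches with a capped number of workers. It does not use any Prefect API; `run_inference`, `batch_size`, and `max_workers` are hypothetical names and values, to be replaced with whatever fits your flow.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def run_inference(x):
    # Hypothetical placeholder for the real per-input inference step.
    return x * 2


def process_in_batches(inputs, batch_size=50, max_workers=5):
    """Process inputs batch by batch, with at most `max_workers` running at once."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for batch in chunked(inputs, batch_size):
            # Consuming pool.map waits for the whole batch and keeps input order.
            results.extend(pool.map(run_inference, batch))
    return results
```

Lowering `max_workers` or `batch_size` reduces how many state updates hit the server and database at the same moment, which is the knob to turn while load testing.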
n
Thanks to both of you for clearing things up. About optimizing the database, can you please comment on that? What measures can we try in order to optimize my Postgres database?
d
optimize the infrastructure parameters* not the database by itself
c
I think these are still valuable resources:
https://discourse.prefect.io/t/how-scalable-is-prefect-server-for-scheduling-concurrent-runs-of-tens-of-thousands-of-flows/532
https://stackoverflow.com/questions/63289719/how-does-prefect-scale-with-thousands-of-workflows-concurrently/63291586#63291586
I would start small and iterate - you mentioned 500 inputs, but it’s not clear whether that is where you started or where you ended up. If that’s where you started, start smaller and verify that it still functions appropriately. If that’s where you ended up, at what size was it still working successfully? As you are on the OSS, this is highly specific to you, your infrastructure, and your workload - different workloads have different usage and access patterns, so it’s ultimately up to you to test
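The "start small and iterate" approach can be sketched as a simple ramp-up loop. This is only an illustration under assumptions: `run_batch` is a hypothetical callable that launches a run with `n` inputs and raises if the run crashes, and the sizes are arbitrary example values.

```python
def find_working_load(run_batch, sizes=(10, 50, 100, 250, 500)):
    """Return the largest size in `sizes` that ran without raising, or None."""
    last_ok = None
    for n in sizes:
        try:
            run_batch(n)  # hypothetical: start a run with n inputs
        except Exception as exc:
            # Stop at the first failing size; smaller sizes are known to work.
            print(f"failed at {n} inputs: {exc!r}")
            break
        last_ok = n
    return last_ok
```

The size returned is the last known-good load, and the first failing size is where to start looking at the database and infrastructure metrics.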
🙌 1
n
Just to double check, by OSS did you mean "Operations support system"?
c
open source
n
Thanks. Thank you for your valuable suggestions. I will try to implement this and will post my observations here.
Hey @Christopher Boyd @Deceivious @Nate These are my Orion logs. Can you please take a look and see whether they give a more precise reason for why the database failed?
⬆️ 1