Hi everyone, can anyone please tell me why I am getting this error? (Prefect Community, #ask-community)

Nimesh Kumar

05/16/2023, 12:00 PM

Hi everyone, can anyone please tell me why I am getting this error?

Crash detected! Execution was interrupted by an unexpected exception: httpx.HTTPStatusError: Client error '404 Not Found' for url '<http://orion-url/api/task_runs/24fd3ed1-78e5-49b2-84ab-ea42c385c942/set_state>'
For more information check: <https://httpstatuses.com/404>

I started inference on 500 inputs, and after some time it crashed. Some questions I have:
1. Why does it crash? Is it because I gave it 500 inputs and the Prefect server failed to handle them?
2. If it crashes, is there any mechanism to handle this?
For context, I am using Kubernetes as infrastructure. Can anyone help me please?

Christopher Boyd

05/16/2023, 12:19 PM

Hi Nimesh, the crash happens because the set_state URL for that task run ID doesn’t exist, so the API returns a 404

Christopher Boyd

05/16/2023, 12:20 PM

If this is OSS, which it appears to be based on your ingress URL, then given your comment about 500 inputs it seems likely that it’s bottlenecking at the database, so it’s trying to set_state on a task run ID that hasn’t actually been committed yet
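The race described above (a client asking for a record the database has not committed yet) is a transient failure, and the usual generic way to tolerate one is retry with backoff. The sketch below is not Prefect's client code and does not claim Prefect exposes such a hook. `NotFoundError`, `call_with_backoff`, and `fake_set_state` are hypothetical names used only to illustrate the idea.

```python
import time


class NotFoundError(Exception):
    """Hypothetical stand-in for a 404 response from the API."""


def call_with_backoff(func, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call func(); on NotFoundError wait and retry, doubling the delay each time."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except NotFoundError:
            if attempt == attempts:
                raise  # still missing after the last attempt: give up
            sleep(delay)
            delay *= 2


# Simulated endpoint: the record only becomes visible on the third call.
calls = {"count": 0}


def fake_set_state():
    calls["count"] += 1
    if calls["count"] < 3:
        raise NotFoundError("task run not committed yet")
    return "state set"


# The sleep is replaced with a no-op so the demo finishes instantly.
result = call_with_backoff(fake_set_state, sleep=lambda _: None)
print(result, "after", calls["count"], "calls")
```

Retrying only hides the symptom. If the database is the bottleneck, reducing the load on it is still the real fix.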

Nimesh Kumar

05/16/2023, 12:21 PM

So can you please suggest how I should handle this?

Christopher Boyd

05/16/2023, 12:22 PM

OSS isn’t really suited for scale - you would need to optimize the database

Christopher Boyd

05/16/2023, 12:22 PM

this isn’t really a “break / fix” issue - it’s an optimization one, that would require reviewing the database on your end and where the bottlenecks are occurring in your infrastructure

Christopher Boyd

05/16/2023, 12:23 PM

If this is production at scale, this is generally where and why we would defer to Prefect Cloud, as its backend is designed to scale in this way

Nimesh Kumar

05/16/2023, 12:27 PM

I am using the Postgres database that Prefect requires. I am not doing anything else with this database; it is used only by Prefect. For the current situation:
1. Do you suggest we limit the number of inputs?
2. What exactly did you mean by optimizing the database, given that I am using the DB just for Prefect?

Nimesh Kumar

05/16/2023, 12:34 PM

Just to add: until we shift to Prefect Cloud, I am using Kubernetes (K8s) as the infrastructure for our flows. Do you think we should create a Postgres cluster using K8s StatefulSets? Will this solve our issue?

Deceivious

05/16/2023, 12:36 PM

I don't think anyone can say with 100% accuracy what you should be doing in such cases, because it is your infrastructure. Try reducing parallelism, then increase the load and load test each component. Iterate.
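One generic way to reduce parallelism is to cap how many units of work are in flight at once. The sketch below uses only the Python standard library. `run_bounded` and `fake_inference` are hypothetical names, and the limit of 20 is an arbitrary starting point, not a recommended value. It does not use any Prefect API.

```python
import asyncio


async def run_bounded(inputs, worker, limit):
    """Run worker(item) for every input, with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(guarded(item) for item in inputs))


# Demo with a fake worker that tracks how many calls are in flight.
in_flight = 0
peak = 0


async def fake_inference(item):
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0)  # yield control, as a real network call would
    in_flight -= 1
    return item * 2


results = asyncio.run(run_bounded(range(500), fake_inference, limit=20))
print(len(results), "results, peak concurrency", peak)
```

All 500 inputs are still processed, but the backend never sees more than `limit` of them at the same time.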

Nimesh Kumar

05/16/2023, 12:42 PM

Thanks to both of you for clearing things up. About optimizing the database, can you please comment on that? What measures can we try to optimize our Postgres database?

Deceivious

05/16/2023, 12:43 PM

optimize the infrastructure parameters*, not the database by itself

Christopher Boyd

05/16/2023, 12:45 PM

I think these are still valuable resources:
https://discourse.prefect.io/t/how-scalable-is-prefect-server-for-scheduling-concurrent-runs-of-tens-of-thousands-of-flows/532
https://stackoverflow.com/questions/63289719/how-does-prefect-scale-with-thousands-of-workflows-concurrently/63291586#63291586
I would start small and iterate - you mentioned 500 inputs, but it’s not clear. Is that where you started, or where you ended up? If that’s where you started, start smaller and verify that it still functions appropriately. If that’s where you ended up, where was it working successfully? As you are on the OSS, this is highly specific to you, your infrastructure, and your workload - different workloads have different usage and access patterns, so it’s ultimately up to you to test
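The "start small and iterate" advice can be sketched as a simple ramp-up loop that tries increasing batch sizes and reports the last size that worked and the first that failed. Everything here is an assumption for illustration: `find_working_load` is a hypothetical helper, `fake_run_batch` stands in for "submit this many inputs and wait for them to finish", and the failure threshold of 200 is made up.

```python
def find_working_load(sizes, run_batch):
    """Try each batch size in increasing order; return (last_ok, first_failed)."""
    last_ok = None
    for size in sorted(sizes):
        try:
            run_batch(size)
        except Exception as exc:
            print(f"size {size}: failed ({exc})")
            return last_ok, size
        print(f"size {size}: ok")
        last_ok = size
    return last_ok, None


# Hypothetical stand-in for a real submission; replace with your own runner.
def fake_run_batch(size):
    if size > 200:  # made-up threshold, only for demonstration
        raise RuntimeError("crash detected")


last_ok, first_failed = find_working_load([10, 50, 100, 200, 500], fake_run_batch)
```

The gap between the two returned sizes is the range to investigate, for example by watching database and API metrics while repeating runs inside it.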

Nimesh Kumar

05/16/2023, 12:52 PM

Just to double check, by OSS did you mean "Operations support system"?

Christopher Boyd

05/16/2023, 12:56 PM

open source

Nimesh Kumar

05/16/2023, 12:59 PM

Thank you for your valuable suggestions. I will try to implement this and will post my observations here.

Nimesh Kumar

05/17/2023, 9:19 AM

Hey @Christopher Boyd @Deceivious @Nate, these are my Orion logs. Can you please take a look and see whether they point to a precise reason why the database failed?

Discover_ DB_Logs - OpenSearch Dashboards.pdf
