# ask-community
t
Does anyone have any advice on configuring their postgres server for high write I/O workloads? I am beginning to really get going with some prefect deployments in my HPC telescope environment, but occasionally I see the database stumble when there is a sudden influx of newly registered or executed tasks. It is not clear to me what the best way to address this is. Likely some configuration settings, I am just unsure which knobs to turn. For reference, my postgres startup is below, which is where I am setting the relevant configurables:
apptainer run --cleanenv --bind "$POSTGRES_SCRATCH":/var postgres_latest.sif -c max_connections=2096 -c shared_buffers=8000MB \
        -c min_wal_size=8096 -c max_wal_size=32384 -c synchronous_commit=off -c wal_buffers=16MB
j
What sort of error are you getting? You should definitely check your postgres logs, but the WAL settings seem small. A checkpoint is triggered whenever more than max_wal_size of WAL changes is pending, so every transaction larger than 32384 (bytes, since you didn't specify a unit?) will cause one or more checkpoints. You should maybe join the postgres slack and ask database experts rather than prefect experts, though. I know just enough to be dangerous about this sort of thing.
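If you want to confirm that checkpoints are actually the problem, a rough sketch of one way to check is below (the connection details are placeholders, and this assumes you can reach the database with psql):

# Count checkpoints since the stats were last reset. On PostgreSQL 16 and
# earlier these counters live in pg_stat_bgwriter; PostgreSQL 17 moved them
# to pg_stat_checkpointer.
psql -h localhost -U postgres -c \
  "SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;"

# Setting -c log_checkpoints=on at startup also makes each checkpoint (and
# what triggered it) show up in the postgres logs.

If checkpoints_req is climbing fast relative to checkpoints_timed, the WAL settings are forcing checkpoints under load.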
t
Apparently the units are in MB :S
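For anyone finding this later, a sketch of the same startup with explicit units on the WAL settings, so there is no ambiguity about what postgres is being told (other flags unchanged from above):

# Same invocation, but with MB spelled out on min_wal_size / max_wal_size
apptainer run --cleanenv --bind "$POSTGRES_SCRATCH":/var postgres_latest.sif \
        -c max_connections=2096 -c shared_buffers=8000MB \
        -c min_wal_size=8096MB -c max_wal_size=32384MB \
        -c synchronous_commit=off -c wal_buffers=16MB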
Ha - ok - I think I found my real issue. Whatever set of workers I established to service the REST API through uvicorn were too long-lived. Looking at system memory usage over time showed a steady rise to 100 percent, before the service and/or postgres locked up. The solution was to set uvicorn options that more aggressively restart the python uvicorn workers, so wherever this memory leak lives (fastapi, pydantic, uvicorn, etc.) it never has enough time to grow large enough to cause issues.
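Something along these lines is what I mean (the app path and numbers are placeholders, not what I actually run). Running the uvicorn workers under gunicorn is one way to get aggressive recycling, since gunicorn replaces each worker after a capped number of requests; uvicorn's own --limit-max-requests flag is the equivalent knob if you serve with uvicorn directly:

# Sketch: recycle each worker after ~1000 requests (with jitter so they
# don't all restart at once), so a slow leak in a worker process is
# discarded before it can eat the host's memory.
gunicorn my_api.server:app \
    --worker-class uvicorn.workers.UvicornWorker \
    --workers 4 \
    --max-requests 1000 \
    --max-requests-jitter 100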