Thread: Background writer not very active
Hi Team,
Hope you are doing well!
We are doing performance testing of our applications. During testing, our PostgreSQL database crashed with the error "checkpointer was killed by signal 9". When we checked the system logs, we found it was an OOM (out-of-memory) issue.
PostgreSQL runs on a dedicated node with 32 GB RAM, hosting multiple databases, each with multiple schemas.
Our current configuration is as follows (a query to read these back from the server follows the list):
shared_buffers = 12GB (increased from 10GB)
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
work_mem = 6MB
maintenance_work_mem = 1GB
effective_cache_size = 20GB
Active connections: around 1500-2000.
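If useful, the effective values can be read back from the running server itself; a minimal sketch assuming the parameter list above (pg_settings reports shared_buffers in 8 kB blocks and work_mem in kB, as shown by the unit column):

  -- Sketch: read back the settings listed above, as the running server sees them.
  SELECT name, setting, unit, source
  FROM pg_settings
  WHERE name IN ('shared_buffers', 'checkpoint_timeout', 'checkpoint_completion_target',
                 'work_mem', 'maintenance_work_mem', 'effective_cache_size', 'max_connections');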
While analysing the issue we made the changes below (an ALTER SYSTEM sketch with the exact parameter names follows the list):
Increased shared_buffers from 10 GB to 12 GB, as buffers_alloc kept growing
Reduced bgwriter_delay from 200 ms to 100 ms
Increased bgwriter_lru_multiplier from 2 to 4, then reduced it to 3
Increased bgwriter_lru_maxpages from 100 to 1000
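For reference, a minimal sketch of those changes using the actual GUC names (values taken from the list above; shared_buffers needs a restart, the bgwriter settings only a reload):

  -- Sketch: the changes above, expressed with the real parameter names.
  ALTER SYSTEM SET shared_buffers = '12GB';        -- takes effect only after a restart
  ALTER SYSTEM SET bgwriter_delay = '100ms';
  ALTER SYSTEM SET bgwriter_lru_multiplier = 3;
  ALTER SYSTEM SET bgwriter_lru_maxpages = 1000;
  SELECT pg_reload_conf();                         -- applies the reloadable settings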
We changed the above bgwriter parameters to make the bgwriter more aggressive, but we still see most of the writes being done by the checkpointer (buffers_checkpoint), then by the backends (buffers_backend), and only then by the bgwriter (buffers_clean). On average, 79% of writes are done by the checkpointer, 16.5% by backends and 4.5% by the bgwriter. buffers_alloc also shows a huge number. All stats are taken from pg_stat_bgwriter. I watched the stats for one hour after reducing bgwriter_lru_multiplier from 4 to 3, but the buffers_clean value remained constant.
Below are the stats for the last 8 hours (taken 4 hours ago; a query for reproducing the percentages follows the list):
Buffers written during checkpoints (buffers_checkpoint): 3,243,044 (83%)
Buffers written by the bgwriter (buffers_clean): 55,430 (1%)
Buffers written by backends (buffers_backend): 616,659 (16%)
buffers_alloc difference: 2,980,619
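For completeness, a minimal sketch of how those percentages can be computed from pg_stat_bgwriter (column names as in PostgreSQL 15; for a fixed interval you would diff two snapshots, or call pg_stat_reset_shared('bgwriter') at the start of the window):

  -- Share of buffer writes done by checkpointer, bgwriter and backends since stats reset.
  SELECT buffers_checkpoint, buffers_clean, buffers_backend, buffers_alloc,
         round(100.0 * buffers_checkpoint / nullif(buffers_checkpoint + buffers_clean + buffers_backend, 0), 1) AS pct_checkpointer,
         round(100.0 * buffers_clean      / nullif(buffers_checkpoint + buffers_clean + buffers_backend, 0), 1) AS pct_bgwriter,
         round(100.0 * buffers_backend    / nullif(buffers_checkpoint + buffers_clean + buffers_backend, 0), 1) AS pct_backends
  FROM pg_stat_bgwriter;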
Can you please advise on how to make the bgwriter more active, or am I missing something I should validate?
Also, how do we keep a balance between the bgwriter parameters: bgwriter_delay, bgwriter_lru_multiplier and bgwriter_lru_maxpages?
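As a rough sanity check only (assuming the default 8 kB block size and the values quoted above): the bgwriter's hard ceiling is bgwriter_lru_maxpages pages per round, with 1000 / bgwriter_delay rounds per second, and within each round it also stops at roughly (recent buffer allocations) * bgwriter_lru_multiplier:

  -- Upper bound on bgwriter throughput with bgwriter_lru_maxpages = 1000, bgwriter_delay = 100ms.
  SELECT 1000 * (1000.0 / 100)                  AS max_pages_per_second,   -- 10,000
         1000 * (1000.0 / 100) * 8192 / 1048576 AS max_mb_per_second;      -- ~78 MB/s

Over 8 hours that ceiling works out to roughly 288 million pages, far above the 55,430 buffers_clean reported above, so the rate limits themselves do not appear to be what is holding buffers_clean down.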
Also, please advise whether we can tune any parameters to fix the OOM error that I mentioned at the start, apart from looking at the queries.
Also, if I want to check which queries might have caused the memory issue, would those be the queries logged just above the "checkpointer killed" error message in the PostgreSQL logs?
Thanks in advance!
Regards,
Ramzy
On Tue, 2025-03-18 at 13:32 +0530, Motog Plus wrote:
> We are doing performance testing of our applications. During testing our postgres db
> got crashed with the error checkpointer was killed by signal 9. When checked system
> logs we found it was OOM issue.
> We have postgreSQL on independent node with 32GB RAM, multiple DBs are there witl
> multiple schemas.
>
> Our current configuration is as follows:
> Shared buffers 12 GB(increased from 10)
> Work_mem 6 MB
> Maintenance work mem 1 gb
> Active connections around 1500-2000.
>
> While analysing issue we made below changes:
> Increased shared buffers from. 10 to 12 gb as buffers_alloc was getting increased
> Bgwriter_delay was 200 ms, reduced to 100ms
> Bgwriter multiplier increased from 2 to 4 and then reduced to 3
> Bgwriter max pages increased from 100 to 1000
> We changed above bgwriter parameters to make bgwriter more aggressive but still we see
> most of the writes are being done by checkpointer as indicated by buffers_checkpoint
> and then by backends indicated by buffers_backend and then by bgwriter indicated by
> buffers_clean.

That's exactly the way it should be.

> Can you please advise on how to make bgwriter more active or am I missing to validate anything.

I don't think that is indicated.

> Also please advise can we tune any parameters to fix the OOM error that I mentioned in
> the starting, apart from looking at the queries.

Reduce the memory parameters: shared_buffers, work_mem, maintenance_work_mem

But most of all, get a working connection pool and reduce the number of
connections to something like 50. That way, you can be more generous with
the memory parameters and still not run OOM.

> Also if I want to check what queries might have caused the memory issue, that would
> be queries just above the checkpointer killed error message in the postgres logs?

Yes.

Yours,
Laurenz Albe
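To put rough numbers on the memory advice above (a back-of-the-envelope sketch only: a single query can use several work_mem allocations, so this is if anything an underestimate, and the 2,000-connection figure is taken from the original post):

  -- Illustrative memory budget: shared_buffers = 12 GB, work_mem = 6 MB, ~2000 connections.
  SELECT 12                               AS shared_buffers_gb,
         round(2000 * 6 / 1024.0, 1)      AS work_mem_gb_if_each_backend_sorts_once,
         round(12 + 2000 * 6 / 1024.0, 1) AS combined_gb;   -- ~23.7 GB before per-backend
                                                            -- overhead, maintenance_work_mem
                                                            -- and the OS page cache

On a 32 GB box that leaves very little headroom, which is why fewer, pooled connections let you keep (or raise) the per-backend memory settings safely.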
Thanks Laurenz for your response.
Regarding the connection pool, we are using HikariCP.
I also wanted to add that we have a replication setup with one primary and one standby, and we are using Pgpool-II. The PostgreSQL version is 15.12.
We expect that many connections to the database (1500-2000), so we have set the max connection limit to 3000.
Also, I saw a few articles on the internet mentioning a memory leak issue with version 15. We were watching the Grafana dashboard and saw that whenever a batch completed, memory was released much more gradually than it used to be in our older version, 12. When monitoring the node with htop, I see the checkpointer and bgwriter always at the top of memory consumption, at about 33% each, even after a batch completes.
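One caveat on the htop reading (an observation, not a confirmed diagnosis): htop counts touched shared memory in each process's RSS, so long-lived processes that sweep the whole buffer pool, the checkpointer and bgwriter in particular, tend to report roughly shared_buffers / RAM, and 12 GB out of 32 GB is in the same ballpark as the ~33% seen here. A minimal sketch for looking at the shared segment from inside the database (pg_shmem_allocations exists in PostgreSQL 13+):

  -- Largest shared-memory allocations; "Buffer Blocks" should be close to shared_buffers.
  SELECT name, pg_size_pretty(allocated_size) AS size
  FROM pg_shmem_allocations
  ORDER BY allocated_size DESC
  LIMIT 10;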
Thanks & Regards,
Ramzy
On Tue, Mar 18, 2025, 14:39 Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Tue, 2025-03-18 at 13:32 +0530, Motog Plus wrote:
> We are doing performance testing of our applications. During testing our postgres db
> got crashed with the error checkpointer was killed by signal 9. When checked system
> logs we found it was OOM issue.
> We have postgreSQL on independent node with 32GB RAM, multiple DBs are there witl
> multiple schemas.
>
> Our current configuration is as follows:
> Shared buffers 12 GB(increased from 10)
> Work_mem 6 MB
> Maintenance work mem 1 gb
> Active connections around 1500-2000.
>
> While analysing issue we made below changes:
> Increased shared buffers from. 10 to 12 gb as buffers_alloc was getting increased
> Bgwriter_delay was 200 ms, reduced to 100ms
> Bgwriter multiplier increased from 2 to 4 and then reduced to 3
> Bgwriter max pages increased from 100 to 1000
> We changed above bgwriter parameters to make bgwriter more aggressive but still we see
> most of the writes are being done by checkpointer as indicated by buffers_checkpoint
> and then by backends indicated by buffers_backend and then by bgwriter indicated by
> buffers_clean.
That's exactly the way it should be.
> Can you please advise on how to make bgwriter more active or am I missing to validate anything.
I don't think that is indicated.
> Also please advise can we tune any parameters to fix the OOM error that I mentioned in
> the starting, apart from looking at the queries.
Reduce the memory parameters: shared_buffers, work_mem, maintenance_work_mem
But most of all, get a working connection pool and reduce the number of
connections to something like 50. That way, you can be more generous with
the memory parameters and still not run OOM.
> Also if I want to check what queries might have caused the memory issue, that would
> be queries just above the checkpointer killed error message in the postgres logs?
Yes.
Yours,
Laurenz Albe
On Thu, 2025-03-20 at 18:21 +0530, Motog Plus wrote:
> We expect that many connections to the db, 1500-2000 so we have kept max connection limit to 3000.

Use a connection pooler. These are too many connections.

Yours,
Laurenz Albe
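When sizing that pool, it can help to see how many of the 1500-2000 sessions are actually busy at a given instant; a minimal sketch:

  -- Active vs. idle client sessions right now.
  SELECT state, count(*)
  FROM pg_stat_activity
  WHERE backend_type = 'client backend'
  GROUP BY state
  ORDER BY count(*) DESC;

If only a few dozen show up as 'active' at any time, a server-side pool on the order of the 50 connections suggested above should carry the same workload with far less memory pressure.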