Re: Checkpointer write combining - Mailing list pgsql-hackers
| From | Soumya S Murali |
|---|---|
| Subject | Re: Checkpointer write combining |
| Date | |
| Msg-id | CAMtXxw9cqxgNH6=8NDAA2o11GoF=4P4JO=7-FCkhr=vJCmQiJA@mail.gmail.com |
| In response to | Re: Checkpointer write combining (Soumya S Murali <soumyamurali.work@gmail.com>) |
| List | pgsql-hackers |
Hi all,

> Thank you all for the patches.
>
> I am keeping this as a single patch because the refactoring, the
> batching behavior, and the instrumentation are tightly coupled and all
> serve one purpose: reducing checkpoint writeback overhead while making
> the effect observable. Due to version and context differences, the
> patches did not apply cleanly in my development environment. Instead,
> I studied the patches, went through the logic in detail, and then
> implemented the same ideas directly in my current tree, adapting them
> wherever needed. The implementation was then validated with
> instrumentation and measurements.
>
> Before batching:
>
> 2026-01-22 17:27:26.969 IST [148738] LOG: checkpoint complete: wrote
> 15419 buffers (94.1%), wrote 1 SLRU buffers; 0 WAL file(s) added, 0
> removed, 25 recycled; write=0.325 s, sync=0.284 s, total=0.754 s; sync
> files=30, longest=0.227 s, average=0.010 s; distance=407573 kB,
> estimate=407573 kB; lsn=0/1A5B8E30, redo lsn=0/1A5B8DD8
>
> After batching:
>
> 2026-01-22 17:31:36.165 IST [148738] LOG: checkpoint complete: wrote
> 13537 buffers (82.6%), wrote 1 SLRU buffers; 0 WAL file(s) added, 0
> removed, 25 recycled; write=0.260 s, sync=0.211 s, total=0.625 s; sync
> files=3, longest=0.205 s, average=0.070 s; distance=404310 kB,
> estimate=407247 kB; lsn=0/3308E738, redo lsn=0/3308E6E0
>
> Debug instrumentation (batch size = 16) confirms the batching behavior
> itself:
>
> buffers_written = 6196
> writeback_calls = 389
>
> On average I am getting 15.9, i.e. approximately 16 buffers per
> writeback. This shows that writebacks are issued per batch rather than
> per buffer, while WAL ordering and durability semantics remain
> unchanged. The change remains localized to BufferSync() and is
> intended to be a conservative and measurable improvement to checkpoint
> I/O behavior. I am attaching the patches herewith for review.
>
> I am happy to adjust the approach if there are concerns or
> suggestions. Looking forward to more feedback.

With reference to my previous patch on the batching behavior, I
evaluated batch sizes 8, 16, and 32 under identical workloads. I am
attaching the logs for 8, 16, and 32. All conclusions are based on
actual checkpoint logs and DEBUG BufferSync statistics.

Batch size = 8

LOG: checkpoint complete: wrote 12622 buffers (77.0%); write=0.113 s,
sync=0.195 s, total=0.485 s; sync files=37
DEBUG: checkpoint BufferSync stats: buffers_written=9923,
writeback_calls=1242
Avg: 7.989, i.e. approximately 8 buffers per writeback.

Batch size = 16

LOG: checkpoint complete: wrote 13537 buffers (82.6%); write=0.260 s,
sync=0.211 s, total=0.625 s; sync files=3
DEBUG: checkpoint BufferSync stats: buffers_written=6196,
writeback_calls=389
Avg: 15.9, i.e. approximately 16 buffers per writeback.

Batch size = 32

LOG: checkpoint complete: wrote 12914 buffers (78.8%); write=0.116 s,
sync=0.136 s, total=0.442 s; sync files=5
DEBUG: checkpoint BufferSync stats: buffers_written=12914,
writeback_calls=1616
Avg: 7.99, i.e. approximately 8 buffers per writeback.

Batch size 16 significantly reduces sync fan-out (as few as 3 files per
checkpoint), but this comes at the cost of longer individual sync
operations, resulting in the highest total checkpoint time of the three
(≈0.625 s). Batch size 32 provides a better balance, maintaining low
sync fragmentation while avoiding long sync stalls, and yields the
lowest overall checkpoint time (≈0.442 s).

I am attaching the patch with the batch size fixed at 32 for now, for
further review. Please let me know if further workloads or
instrumentation would be useful. A minimal standalone sketch of the
per-batch writeback pattern is included below the signature, for
illustration.

Regards,
Soumya
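To make the per-batch writeback pattern concrete, here is a minimal,
self-contained C sketch. It is not the patch itself and does not touch
PostgreSQL internals: it simply writes fixed-size blocks to a scratch
file and issues one writeback request per batch of 32 blocks instead of
one per block, counting buffers_written and writeback_calls the same
way the DEBUG stats above do. It is Linux-only (it relies on
sync_file_range(2)), and the names in it (WB_BATCH, flush_batch,
wb_demo.dat) are illustrative assumptions, not PostgreSQL symbols.

```c
/*
 * Standalone sketch of per-batch writeback (illustration only, not
 * the patch): write NBUFFERS blocks of BLCKSZ bytes and ask the
 * kernel to start writeback once per WB_BATCH blocks rather than
 * once per block. Linux-only: uses sync_file_range(2).
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BLCKSZ   8192
#define NBUFFERS 1024
#define WB_BATCH 32             /* buffers per writeback call */

static long buffers_written = 0;    /* instrumentation counters, */
static long writeback_calls = 0;    /* mirroring the DEBUG stats */

/* Kick off kernel writeback for the accumulated file range. */
static void
flush_batch(int fd, off_t start, off_t len)
{
    if (len == 0)
        return;
    if (sync_file_range(fd, start, len, SYNC_FILE_RANGE_WRITE) != 0)
        perror("sync_file_range");
    writeback_calls++;
}

int
main(void)
{
    char    block[BLCKSZ];
    off_t   batch_start = 0;
    off_t   batch_len = 0;
    int     fd;

    memset(block, 'x', sizeof(block));
    fd = open("wb_demo.dat", O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    for (int i = 0; i < NBUFFERS; i++)
    {
        if (write(fd, block, BLCKSZ) != BLCKSZ)
        {
            perror("write");
            return 1;
        }
        buffers_written++;
        batch_len += BLCKSZ;

        /* Issue writeback per batch, not per buffer. */
        if (buffers_written % WB_BATCH == 0)
        {
            flush_batch(fd, batch_start, batch_len);
            batch_start += batch_len;
            batch_len = 0;
        }
    }
    flush_batch(fd, batch_start, batch_len);    /* trailing partial batch */

    printf("buffers_written=%ld writeback_calls=%ld avg=%.2f\n",
           buffers_written, writeback_calls,
           (double) buffers_written / writeback_calls);
    close(fd);
    return 0;
}
```

With NBUFFERS = 1024 and WB_BATCH = 32 this prints buffers_written=1024
writeback_calls=32 avg=32.00, i.e. exactly one writeback call per 32
buffers, which is the per-batch (rather than per-buffer) behavior the
DEBUG counters above are meant to confirm.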