Re: Checkpointer write combining - Mailing list pgsql-hackers
| From | Soumya S Murali |
|---|---|
| Subject | Re: Checkpointer write combining |
| Date | |
| Msg-id | CAMtXxw9cqxgNH6=8NDAA2o11GoF=4P4JO=7-FCkhr=vJCmQiJA@mail.gmail.com |
| In response to | Re: Checkpointer write combining (Soumya S Murali <soumyamurali.work@gmail.com>) |
| List | pgsql-hackers |
Hi all,

> Thank you all for the patches.
>
> I am keeping this as a single patch because the refactoring, the
> batching behavior, and the instrumentation are tightly coupled and all
> serve one purpose: reducing checkpoint writeback overhead while making
> the effect observable. Due to version and context differences, the
> patches did not apply cleanly in my development environment. Instead,
> I studied the patches, went through the logic in detail, and then
> implemented the same ideas directly in my current tree, adapting them
> wherever needed. The implementation was then validated with
> instrumentation and measurements.
>
> Before batching:
>
> 2026-01-22 17:27:26.969 IST [148738] LOG: checkpoint complete: wrote
> 15419 buffers (94.1%), wrote 1 SLRU buffers; 0 WAL file(s) added, 0
> removed, 25 recycled; write=0.325 s, sync=0.284 s, total=0.754 s; sync
> files=30, longest=0.227 s, average=0.010 s; distance=407573 kB,
> estimate=407573 kB; lsn=0/1A5B8E30, redo lsn=0/1A5B8DD8
>
> After batching:
>
> 2026-01-22 17:31:36.165 IST [148738] LOG: checkpoint complete: wrote
> 13537 buffers (82.6%), wrote 1 SLRU buffers; 0 WAL file(s) added, 0
> removed, 25 recycled; write=0.260 s, sync=0.211 s, total=0.625 s; sync
> files=3, longest=0.205 s, average=0.070 s; distance=404310 kB,
> estimate=407247 kB; lsn=0/3308E738, redo lsn=0/3308E6E0
>
> Debug instrumentation (batch size = 16) confirms the batching behavior
> itself:
>
> buffers_written = 6196
> writeback_calls = 389
>
> On average I am getting 15.9, i.e. approximately 16 buffers per
> writeback. This shows that writebacks are issued per batch rather than
> per buffer, while WAL ordering and durability semantics remain
> unchanged. The change remains localized to BufferSync() and is
> intended to be a conservative and measurable improvement to checkpoint
> I/O behavior. I am attaching the patches herewith for review.
>
> I am happy to adjust the approach if there are concerns or
> suggestions. Looking forward to more feedback.

With reference to my previous patch on the batching behavior, I
evaluated batch sizes 8, 16, and 32 under identical workloads. I am
attaching the logs for 8, 16, and 32. All conclusions are based on
actual checkpoint logs and DEBUG BufferSync statistics.

Batch size = 8

LOG: checkpoint complete: wrote 12622 buffers (77.0%); write=0.113 s,
sync=0.195 s, total=0.485 s; sync files=37
DEBUG: checkpoint BufferSync stats: buffers_written=9923,
writeback_calls=1242
Avg: 7.989, i.e. approximately 8 buffers per writeback.

Batch size = 16

LOG: checkpoint complete: wrote 13537 buffers (82.6%); write=0.260 s,
sync=0.211 s, total=0.625 s; sync files=3
DEBUG: checkpoint BufferSync stats: buffers_written=6196,
writeback_calls=389
Avg: 15.9, i.e. approximately 16 buffers per writeback.

Batch size = 32

LOG: checkpoint complete: wrote 12914 buffers (78.8%); write=0.116 s,
sync=0.136 s, total=0.442 s; sync files=5
DEBUG: checkpoint BufferSync stats: buffers_written=12914,
writeback_calls=1616
Avg: 7.99, i.e. approximately 8 buffers per writeback.

Batch size 16 significantly reduces sync fan-out (as few as 3 files per
checkpoint), but this comes at the cost of longer individual sync
operations, resulting in the highest total checkpoint time of the three
(≈0.625 s). Batch size 32 provides a better balance, maintaining low
sync fragmentation while avoiding long sync stalls, and yields the
lowest overall checkpoint time (≈0.442 s).

I am attaching the patch with the batch size fixed at 32 for now, for
further review. Please let me know if further workloads or
instrumentation would be useful. A minimal standalone sketch of the
per-batch writeback pattern is included below the signature, for
illustration.

Regards,
Soumya
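To make the per-batch writeback pattern concrete, here is a minimal,
self-contained C sketch. It is not the patch itself and does not touch
PostgreSQL internals: it simply writes fixed-size blocks to a scratch
file and issues one writeback request per batch of 32 blocks instead of
one per block, counting buffers_written and writeback_calls the same
way the DEBUG stats above do. It is Linux-only (it relies on
sync_file_range(2)), and the names in it (WB_BATCH, flush_batch,
wb_demo.dat) are illustrative assumptions, not PostgreSQL symbols.

```c
/*
 * Standalone sketch of per-batch writeback (illustration only, not
 * the patch): write NBUFFERS blocks of BLCKSZ bytes and ask the
 * kernel to start writeback once per WB_BATCH blocks rather than
 * once per block. Linux-only: uses sync_file_range(2).
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BLCKSZ   8192
#define NBUFFERS 1024
#define WB_BATCH 32             /* buffers per writeback call */

static long buffers_written = 0;    /* instrumentation counters, */
static long writeback_calls = 0;    /* mirroring the DEBUG stats */

/* Kick off kernel writeback for the accumulated file range. */
static void
flush_batch(int fd, off_t start, off_t len)
{
    if (len == 0)
        return;
    if (sync_file_range(fd, start, len, SYNC_FILE_RANGE_WRITE) != 0)
        perror("sync_file_range");
    writeback_calls++;
}

int
main(void)
{
    char    block[BLCKSZ];
    off_t   batch_start = 0;
    off_t   batch_len = 0;
    int     fd;

    memset(block, 'x', sizeof(block));
    fd = open("wb_demo.dat", O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    for (int i = 0; i < NBUFFERS; i++)
    {
        if (write(fd, block, BLCKSZ) != BLCKSZ)
        {
            perror("write");
            return 1;
        }
        buffers_written++;
        batch_len += BLCKSZ;

        /* Issue writeback per batch, not per buffer. */
        if (buffers_written % WB_BATCH == 0)
        {
            flush_batch(fd, batch_start, batch_len);
            batch_start += batch_len;
            batch_len = 0;
        }
    }
    flush_batch(fd, batch_start, batch_len);    /* trailing partial batch */

    printf("buffers_written=%ld writeback_calls=%ld avg=%.2f\n",
           buffers_written, writeback_calls,
           (double) buffers_written / writeback_calls);
    close(fd);
    return 0;
}
```

With NBUFFERS = 1024 and WB_BATCH = 32 this prints buffers_written=1024
writeback_calls=32 avg=32.00, i.e. exactly one writeback call per 32
buffers, which is the per-batch (rather than per-buffer) behavior the
DEBUG counters above are meant to confirm.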