log_checkpoint's "WAL file(s) added" is misleading to the point of uselessness - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | log_checkpoint's "WAL file(s) added" is misleading to the point of uselessness |
Date | |
Msg-id | 20210724225036.fm4etupdtggurehf@alap3.anarazel.de Whole thread Raw |
Responses |
Re: log_checkpoint's "WAL file(s) added" is misleading to the point of uselessness
Re: log_checkpoint's "WAL file(s) added" is misleading to the point of uselessness |
List | pgsql-hackers |
Hi, I've been repeatedly confused by the the number of WAL files supposedly added. Even when 100s of new WAL files are created the relevant portion of log_checkpoints will only ever list zero or one added WAL file. The reason for that is that CheckpointStats.ckpt_segs_added is only incremented in PreallocXlogFiles(). Which has the following comment: * XXX this is currently extremely conservative, since it forces only one * future log segment to exist, and even that only if we are 75% done with * the current one. This is only appropriate for very low-WAL-volume systems. Whereas in real workloads WAL files are almost exclusively created via XLogWrite()->XLogFileInit(). I think we should consider just removing that field. Or, even better, show something accurate instead. As an example, here's the log output of a workload that has a replication slot preventing WAL files from being recycled (and too small max_wal_size): 2021-07-24 15:47:42.524 PDT [2251649][checkpointer][:0] LOG: checkpoint complete: wrote 55767 buffers (42.5%); 0 WAL file(s)added, 0 removed, 0 recycled; write=3.914 s, sync=0.041 s, total=3.972 s; sync files=10, longest=0.010 s, average=0.005s; distance=540578 kB, estimate=540905 kB 2021-07-24 15:47:46.721 PDT [2251649][checkpointer][:0] LOG: checkpoint complete: wrote 55806 buffers (42.6%); 1 WAL file(s)added, 0 removed, 0 recycled; write=3.855 s, sync=0.028 s, total=3.928 s; sync files=8, longest=0.008 s, average=0.004s; distance=540708 kB, estimate=540886 kB 2021-07-24 15:47:51.004 PDT [2251649][checkpointer][:0] LOG: checkpoint complete: wrote 55850 buffers (42.6%); 1 WAL file(s)added, 0 removed, 0 recycled; write=3.895 s, sync=0.034 s, total=3.974 s; sync files=9, longest=0.009 s, average=0.004s; distance=540657 kB, estimate=540863 kB 2021-07-24 15:47:55.231 PDT [2251649][checkpointer][:0] LOG: checkpoint complete: wrote 55879 buffers (42.6%); 0 WAL file(s)added, 0 removed, 0 recycled; write=3.878 s, sync=0.026 s, total=3.944 s; sync files=9, longest=0.007 s, average=0.003s; distance=540733 kB, estimate=540850 kB 2021-07-24 15:47:59.462 PDT [2251649][checkpointer][:0] LOG: checkpoint complete: wrote 55847 buffers (42.6%); 1 WAL file(s)added, 0 removed, 0 recycled; write=3.882 s, sync=0.027 s, total=3.952 s; sync files=9, longest=0.008 s, average=0.003s; distance=540640 kB, estimate=540829 kB So it's 3 new WAL files in that timeframe, one might think? A probe instrumenting xlog file creation shows something very different: perf probe -x /home/andres/build/postgres/dev-assert/vpath/src/backend/postgres -a XLogFileInitInternal:39 (39 is the O_CREAT BasicOpenFile(), not the recycle path). perf stat -a -e probe_postgres:XLogFileInitInternal_L39 -I 1000 1.001030943 9 probe_postgres:XLogFileInitInternal_L39 2.002998896 8 probe_postgres:XLogFileInitInternal_L39 3.005098857 8 probe_postgres:XLogFileInitInternal_L39 4.007000426 6 probe_postgres:XLogFileInitInternal_L39 5.008423119 9 probe_postgres:XLogFileInitInternal_L39 6.013427568 8 probe_postgres:XLogFileInitInternal_L39 7.015087613 8 probe_postgres:XLogFileInitInternal_L39 8.017393339 8 probe_postgres:XLogFileInitInternal_L39 9.018425526 7 probe_postgres:XLogFileInitInternal_L39 10.020398520 5 probe_postgres:XLogFileInitInternal_L39 Greetings, Andres Freund
pgsql-hackers by date: