log_checkpoint's "WAL file(s) added" is misleading to the point of uselessness - Mailing list pgsql-hackers

From Andres Freund
Subject log_checkpoint's "WAL file(s) added" is misleading to the point of uselessness
Date
Msg-id 20210724225036.fm4etupdtggurehf@alap3.anarazel.de
Whole thread Raw
Responses Re: log_checkpoint's "WAL file(s) added" is misleading to the point of uselessness  (Andres Freund <andres@anarazel.de>)
Re: log_checkpoint's "WAL file(s) added" is misleading to the point of uselessness  (Fujii Masao <masao.fujii@oss.nttdata.com>)
List pgsql-hackers
Hi,

I've been repeatedly confused by the the number of WAL files supposedly
added. Even when 100s of new WAL files are created the relevant portion
of log_checkpoints will only ever list zero or one added WAL file.

The reason for that is that CheckpointStats.ckpt_segs_added is only
incremented in PreallocXlogFiles(). Which has the following comment:
 * XXX this is currently extremely conservative, since it forces only one
 * future log segment to exist, and even that only if we are 75% done with
 * the current one.  This is only appropriate for very low-WAL-volume systems.

Whereas in real workloads WAL files are almost exclusively created via
XLogWrite()->XLogFileInit().

I think we should consider just removing that field. Or, even better, show
something accurate instead.

As an example, here's the log output of a workload that has a replication slot
preventing WAL files from being recycled (and too small max_wal_size):

2021-07-24 15:47:42.524 PDT [2251649][checkpointer][:0] LOG:  checkpoint complete: wrote 55767 buffers (42.5%); 0 WAL
file(s)added, 0 removed, 0 recycled; write=3.914 s, sync=0.041 s, total=3.972 s; sync files=10, longest=0.010 s,
average=0.005s; distance=540578 kB, estimate=540905 kB
 
2021-07-24 15:47:46.721 PDT [2251649][checkpointer][:0] LOG:  checkpoint complete: wrote 55806 buffers (42.6%); 1 WAL
file(s)added, 0 removed, 0 recycled; write=3.855 s, sync=0.028 s, total=3.928 s; sync files=8, longest=0.008 s,
average=0.004s; distance=540708 kB, estimate=540886 kB
 
2021-07-24 15:47:51.004 PDT [2251649][checkpointer][:0] LOG:  checkpoint complete: wrote 55850 buffers (42.6%); 1 WAL
file(s)added, 0 removed, 0 recycled; write=3.895 s, sync=0.034 s, total=3.974 s; sync files=9, longest=0.009 s,
average=0.004s; distance=540657 kB, estimate=540863 kB
 
2021-07-24 15:47:55.231 PDT [2251649][checkpointer][:0] LOG:  checkpoint complete: wrote 55879 buffers (42.6%); 0 WAL
file(s)added, 0 removed, 0 recycled; write=3.878 s, sync=0.026 s, total=3.944 s; sync files=9, longest=0.007 s,
average=0.003s; distance=540733 kB, estimate=540850 kB
 
2021-07-24 15:47:59.462 PDT [2251649][checkpointer][:0] LOG:  checkpoint complete: wrote 55847 buffers (42.6%); 1 WAL
file(s)added, 0 removed, 0 recycled; write=3.882 s, sync=0.027 s, total=3.952 s; sync files=9, longest=0.008 s,
average=0.003s; distance=540640 kB, estimate=540829 kB
 

So it's 3 new WAL files in that timeframe, one might think?

A probe instrumenting xlog file creation shows something very different:
perf probe -x /home/andres/build/postgres/dev-assert/vpath/src/backend/postgres -a XLogFileInitInternal:39
(39 is the O_CREAT BasicOpenFile(), not the recycle path).

perf stat -a -e probe_postgres:XLogFileInitInternal_L39 -I 1000
     1.001030943                  9      probe_postgres:XLogFileInitInternal_L39
     2.002998896                  8      probe_postgres:XLogFileInitInternal_L39
     3.005098857                  8      probe_postgres:XLogFileInitInternal_L39
     4.007000426                  6      probe_postgres:XLogFileInitInternal_L39
     5.008423119                  9      probe_postgres:XLogFileInitInternal_L39
     6.013427568                  8      probe_postgres:XLogFileInitInternal_L39
     7.015087613                  8      probe_postgres:XLogFileInitInternal_L39
     8.017393339                  8      probe_postgres:XLogFileInitInternal_L39
     9.018425526                  7      probe_postgres:XLogFileInitInternal_L39
    10.020398520                  5      probe_postgres:XLogFileInitInternal_L39

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: CREATE SEQUENCE with RESTART option
Next
From: Andres Freund
Date:
Subject: Re: log_checkpoint's "WAL file(s) added" is misleading to the point of uselessness