log_checkpoint's "WAL file(s) added" is misleading to the point of uselessness
From
Andres Freund
Date:
Hi,

I've been repeatedly confused by the number of WAL files supposedly
added. Even when 100s of new WAL files are created, the relevant portion
of log_checkpoints will only ever list zero or one added WAL file.

The reason for that is that CheckpointStats.ckpt_segs_added is only
incremented in PreallocXlogFiles(), which has the following comment:

 * XXX this is currently extremely conservative, since it forces only one
 * future log segment to exist, and even that only if we are 75% done with
 * the current one. This is only appropriate for very low-WAL-volume systems.

Whereas in real workloads WAL files are almost exclusively created via
XLogWrite()->XLogFileInit().

I think we should consider just removing that field. Or, even better, show
something accurate instead.

As an example, here's the log output of a workload that has a replication slot
preventing WAL files from being recycled (and too small max_wal_size):

2021-07-24 15:47:42.524 PDT [2251649][checkpointer][:0] LOG: checkpoint complete: wrote 55767 buffers (42.5%); 0 WAL file(s) added, 0 removed, 0 recycled; write=3.914 s, sync=0.041 s, total=3.972 s; sync files=10, longest=0.010 s, average=0.005 s; distance=540578 kB, estimate=540905 kB
2021-07-24 15:47:46.721 PDT [2251649][checkpointer][:0] LOG: checkpoint complete: wrote 55806 buffers (42.6%); 1 WAL file(s) added, 0 removed, 0 recycled; write=3.855 s, sync=0.028 s, total=3.928 s; sync files=8, longest=0.008 s, average=0.004 s; distance=540708 kB, estimate=540886 kB
2021-07-24 15:47:51.004 PDT [2251649][checkpointer][:0] LOG: checkpoint complete: wrote 55850 buffers (42.6%); 1 WAL file(s) added, 0 removed, 0 recycled; write=3.895 s, sync=0.034 s, total=3.974 s; sync files=9, longest=0.009 s, average=0.004 s; distance=540657 kB, estimate=540863 kB
2021-07-24 15:47:55.231 PDT [2251649][checkpointer][:0] LOG: checkpoint complete: wrote 55879 buffers (42.6%); 0 WAL file(s) added, 0 removed, 0 recycled; write=3.878 s, sync=0.026 s, total=3.944 s; sync files=9, longest=0.007 s, average=0.003 s; distance=540733 kB, estimate=540850 kB
2021-07-24 15:47:59.462 PDT [2251649][checkpointer][:0] LOG: checkpoint complete: wrote 55847 buffers (42.6%); 1 WAL file(s) added, 0 removed, 0 recycled; write=3.882 s, sync=0.027 s, total=3.952 s; sync files=9, longest=0.008 s, average=0.003 s; distance=540640 kB, estimate=540829 kB

So it's 3 new WAL files in that timeframe, one might think?

A probe instrumenting xlog file creation shows something very different:

perf probe -x /home/andres/build/postgres/dev-assert/vpath/src/backend/postgres -a XLogFileInitInternal:39

(39 is the O_CREAT BasicOpenFile(), not the recycle path).

perf stat -a -e probe_postgres:XLogFileInitInternal_L39 -I 1000
     1.001030943          9   probe_postgres:XLogFileInitInternal_L39
     2.002998896          8   probe_postgres:XLogFileInitInternal_L39
     3.005098857          8   probe_postgres:XLogFileInitInternal_L39
     4.007000426          6   probe_postgres:XLogFileInitInternal_L39
     5.008423119          9   probe_postgres:XLogFileInitInternal_L39
     6.013427568          8   probe_postgres:XLogFileInitInternal_L39
     7.015087613          8   probe_postgres:XLogFileInitInternal_L39
     8.017393339          8   probe_postgres:XLogFileInitInternal_L39
     9.018425526          7   probe_postgres:XLogFileInitInternal_L39
    10.020398520          5   probe_postgres:XLogFileInitInternal_L39

Greetings,

Andres Freund
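For anyone wanting to tally these counters across many checkpoints, the "WAL file(s) added, removed, recycled" portion of the log lines above can be extracted mechanically. A minimal sketch (the regex simply matches the message format shown above, nothing more):

```python
import re

# Matches the "N WAL file(s) added, N removed, N recycled" portion of a
# "checkpoint complete" line emitted with log_checkpoints = on.
WAL_RE = re.compile(r"(\d+) WAL file\(s\) added, (\d+) removed, (\d+) recycled")

def wal_counts(line):
    """Return (added, removed, recycled) from a checkpoint log line, or None."""
    m = WAL_RE.search(line)
    return tuple(int(g) for g in m.groups()) if m else None

line = ("LOG: checkpoint complete: wrote 55806 buffers (42.6%); "
        "1 WAL file(s) added, 0 removed, 0 recycled; write=3.855 s, ...")
print(wal_counts(line))  # (1, 0, 0)
```

Summing "added" over the five checkpoints above is exactly how one arrives at the misleading figure of 3 new WAL files.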
Re: log_checkpoint's "WAL file(s) added" is misleading to the point of uselessness
From
Andres Freund
Date:
Hi,

On 2021-07-24 15:50:36 -0700, Andres Freund wrote:
> As an example, here's the log output of a workload that has a replication slot
> preventing WAL files from being recycled (and too small max_wal_size):
>
> 2021-07-24 15:47:42.524 PDT [2251649][checkpointer][:0] LOG: checkpoint complete: wrote 55767 buffers (42.5%); 0 WAL file(s) added, 0 removed, 0 recycled; write=3.914 s, sync=0.041 s, total=3.972 s; sync files=10, longest=0.010 s, average=0.005 s; distance=540578 kB, estimate=540905 kB
> 2021-07-24 15:47:46.721 PDT [2251649][checkpointer][:0] LOG: checkpoint complete: wrote 55806 buffers (42.6%); 1 WAL file(s) added, 0 removed, 0 recycled; write=3.855 s, sync=0.028 s, total=3.928 s; sync files=8, longest=0.008 s, average=0.004 s; distance=540708 kB, estimate=540886 kB
> 2021-07-24 15:47:51.004 PDT [2251649][checkpointer][:0] LOG: checkpoint complete: wrote 55850 buffers (42.6%); 1 WAL file(s) added, 0 removed, 0 recycled; write=3.895 s, sync=0.034 s, total=3.974 s; sync files=9, longest=0.009 s, average=0.004 s; distance=540657 kB, estimate=540863 kB
> 2021-07-24 15:47:55.231 PDT [2251649][checkpointer][:0] LOG: checkpoint complete: wrote 55879 buffers (42.6%); 0 WAL file(s) added, 0 removed, 0 recycled; write=3.878 s, sync=0.026 s, total=3.944 s; sync files=9, longest=0.007 s, average=0.003 s; distance=540733 kB, estimate=540850 kB
> 2021-07-24 15:47:59.462 PDT [2251649][checkpointer][:0] LOG: checkpoint complete: wrote 55847 buffers (42.6%); 1 WAL file(s) added, 0 removed, 0 recycled; write=3.882 s, sync=0.027 s, total=3.952 s; sync files=9, longest=0.008 s, average=0.003 s; distance=540640 kB, estimate=540829 kB
>
> So it's 3 new WAL files in that timeframe, one might think?
>
> A probe instrumenting xlog file creation shows something very different:
> perf probe -x /home/andres/build/postgres/dev-assert/vpath/src/backend/postgres -a XLogFileInitInternal:39
> (39 is the O_CREAT BasicOpenFile(), not the recycle path).
>
> perf stat -a -e probe_postgres:XLogFileInitInternal_L39 -I 1000
>      1.001030943          9   probe_postgres:XLogFileInitInternal_L39
>      2.002998896          8   probe_postgres:XLogFileInitInternal_L39
>      3.005098857          8   probe_postgres:XLogFileInitInternal_L39
>      4.007000426          6   probe_postgres:XLogFileInitInternal_L39
>      5.008423119          9   probe_postgres:XLogFileInitInternal_L39
>      6.013427568          8   probe_postgres:XLogFileInitInternal_L39
>      7.015087613          8   probe_postgres:XLogFileInitInternal_L39
>      8.017393339          8   probe_postgres:XLogFileInitInternal_L39
>      9.018425526          7   probe_postgres:XLogFileInitInternal_L39
>     10.020398520          5   probe_postgres:XLogFileInitInternal_L39

And even more extreme, the logs can end up suggesting pg_wal is shrinking,
when the opposite is the case. Here's the checkpoint output from a parallel
copy data load (without a replication slot holding things back):

2021-07-24 15:59:03.215 PDT [2253324][checkpointer][:0] LOG: checkpoint complete: wrote 22291 buffers (17.0%); 0 WAL file(s) added, 27 removed, 141 recycled; write=9.737 s, sync=4.153 s, total=14.884 s; sync files=108, longest=0.116 s, average=0.039 s; distance=2756904 kB, estimate=2756904 kB
2021-07-24 15:59:12.978 PDT [2253324][checkpointer][:0] LOG: checkpoint complete: wrote 21840 buffers (16.7%); 0 WAL file(s) added, 53 removed, 149 recycled; write=5.531 s, sync=3.008 s, total=9.763 s; sync files=81, longest=0.201 s, average=0.037 s; distance=3313885 kB, estimate=3313885 kB
2021-07-24 15:59:23.421 PDT [2253324][checkpointer][:0] LOG: checkpoint complete: wrote 22326 buffers (17.0%); 0 WAL file(s) added, 56 removed, 149 recycled; write=5.787 s, sync=3.230 s, total=10.436 s; sync files=81, longest=0.099 s, average=0.040 s; distance=3346125 kB, estimate=3346125 kB
2021-07-24 15:59:34.424 PDT [2253324][checkpointer][:0] LOG: checkpoint complete: wrote 22155 buffers (16.9%); 0 WAL file(s) added, 60 removed, 148 recycled; write=6.096 s, sync=3.432 s, total=10.995 s; sync files=81, longest=0.101 s, average=0.043 s; distance=3409281 kB, estimate=3409281 kB

It does look like WAL space usage is shrinking, but the opposite is true -
we're creating so much WAL that the checkpointer can't checkpoint fast
enough to keep WAL usage below max_wal_size. So WAL files are constantly
created that then need to be removed (hence the non-zero removed counts).

#           time  counts unit events
   277.087990275      15      probe_postgres:XLogFileInitInternal_L39
   278.098549960      15      probe_postgres:XLogFileInitInternal_L39
   279.104787575       6      probe_postgres:XLogFileInitInternal_L39
   280.108980690       5      probe_postgres:XLogFileInitInternal_L39
   281.111781617       6      probe_postgres:XLogFileInitInternal_L39
   282.113601958       2      probe_postgres:XLogFileInitInternal_L39
   283.115711683       0      probe_postgres:XLogFileInitInternal_L39
   284.121508636       0      probe_postgres:XLogFileInitInternal_L39
   285.124865325       0      probe_postgres:XLogFileInitInternal_L39
   286.126932016       0      probe_postgres:XLogFileInitInternal_L39
   287.129874993      11      probe_postgres:XLogFileInitInternal_L39
   288.131838429      15      probe_postgres:XLogFileInitInternal_L39
   289.133609021      13      probe_postgres:XLogFileInitInternal_L39
   290.136254341       6      probe_postgres:XLogFileInitInternal_L39
   291.139368485       5      probe_postgres:XLogFileInitInternal_L39
   292.142728293       6      probe_postgres:XLogFileInitInternal_L39
   293.148078766       2      probe_postgres:XLogFileInitInternal_L39
   294.150258476       0      probe_postgres:XLogFileInitInternal_L39
   295.172398897       0      probe_postgres:XLogFileInitInternal_L39
   296.174658196       0      probe_postgres:XLogFileInitInternal_L39
   297.176818943       0      probe_postgres:XLogFileInitInternal_L39
   298.179003473      14      probe_postgres:XLogFileInitInternal_L39
   299.181597777      14      probe_postgres:XLogFileInitInternal_L39
   300.184711566      14      probe_postgres:XLogFileInitInternal_L39
   301.188919194       6      probe_postgres:XLogFileInitInternal_L39

Greetings,

Andres Freund
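A back-of-the-envelope check makes the discrepancy concrete: each checkpoint above covers a "distance" of over 3 GB of WAL, which at the default 16 MB segment size corresponds to roughly 200 segments per checkpoint cycle, not the zero "added" files the log reports. A quick sketch (the 16 MB wal_segment_size is an assumption; it is configurable at initdb time):

```python
# Segments of WAL implied by the "distance=... kB" field of each
# checkpoint above, assuming the default 16 MB wal_segment_size.
SEG_KB = 16 * 1024

distances_kb = [2756904, 3313885, 3346125, 3409281]
segments = [d // SEG_KB for d in distances_kb]
print(segments)  # [168, 202, 204, 208]
```

Most of those segments are satisfied by recycling, but as the perf counts show, a steady stream still has to be created from scratch.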
Re: log_checkpoint's "WAL file(s) added" is misleading to the point of uselessness
From
Fujii Masao
Date:
On 2021/07/25 7:50, Andres Freund wrote:
> Hi,
>
> I've been repeatedly confused by the number of WAL files supposedly
> added. Even when 100s of new WAL files are created the relevant portion
> of log_checkpoints will only ever list zero or one added WAL file.
>
> The reason for that is that CheckpointStats.ckpt_segs_added is only
> incremented in PreallocXlogFiles(). Which has the following comment:
> * XXX this is currently extremely conservative, since it forces only one
> * future log segment to exist, and even that only if we are 75% done with
> * the current one. This is only appropriate for very low-WAL-volume systems.
>
> Whereas in real workloads WAL files are almost exclusively created via
> XLogWrite()->XLogFileInit().
>
> I think we should consider just removing that field. Or, even better, show
> something accurate instead.

+1 to show something accurate instead.

It's also worth showing them in a monitoring stats view like pg_stat_wal?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Re: log_checkpoint's "WAL file(s) added" is misleading to the point of uselessness
From
"Bossart, Nathan"
Date:
On 7/24/21, 8:10 PM, "Fujii Masao" <masao.fujii@oss.nttdata.com> wrote:
> On 2021/07/25 7:50, Andres Freund wrote:
>> Hi,
>>
>> I've been repeatedly confused by the number of WAL files supposedly
>> added. Even when 100s of new WAL files are created the relevant portion
>> of log_checkpoints will only ever list zero or one added WAL file.
>>
>> The reason for that is that CheckpointStats.ckpt_segs_added is only
>> incremented in PreallocXlogFiles(). Which has the following comment:
>> * XXX this is currently extremely conservative, since it forces only one
>> * future log segment to exist, and even that only if we are 75% done with
>> * the current one. This is only appropriate for very low-WAL-volume systems.
>>
>> Whereas in real workloads WAL files are almost exclusively created via
>> XLogWrite()->XLogFileInit().
>>
>> I think we should consider just removing that field. Or, even better, show
>> something accurate instead.
>
> +1 to show something accurate instead.
>
> It's also worth showing them in monitoring stats view like pg_stat_wal?

+1. I was confused by this when working on a WAL pre-allocation
patch [0]. Perhaps it could be replaced by a new parameter and a new
field in pg_stat_wal. How about something like log_wal_init_interval,
where the value is the minimum amount of time between reporting the
number of WAL segments created since the last report?

Nathan

[0] https://www.postgresql.org/message-id/flat/20201225200953.jjkrytlrzojbndh5@alap3.anarazel.de
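The proposed log_wal_init_interval behaviour, counting segment creations and reporting at most once per interval, could be sketched roughly as follows. This is only an illustrative stand-in for the proposal (log_wal_init_interval does not exist as a GUC, and the real implementation would live in the backend's C code, not Python):

```python
import time

class IntervalReporter:
    """Sketch of interval-limited reporting: accumulate WAL segment
    creation events and emit a log line at most once per interval."""

    def __init__(self, interval_s):
        self.interval_s = interval_s
        self.pending = 0                       # creations since last report
        self.last_report = time.monotonic()

    def on_segment_created(self):
        self.pending += 1
        now = time.monotonic()
        if now - self.last_report >= self.interval_s:
            print(f"LOG: {self.pending} WAL segment(s) created since last report")
            self.pending = 0
            self.last_report = now
```

With a long interval, bursts of segment creation are coalesced into a single periodic log line instead of one line per checkpoint.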
Re: log_checkpoint's "WAL file(s) added" is misleading to the point of uselessness
From
Fujii Masao
Date:
On 2021/07/27 5:27, Bossart, Nathan wrote:
> +1. I was confused by this when working on a WAL pre-allocation
> patch [0]. Perhaps it could be replaced by a new parameter and a new
> field in pg_stat_wal. How about something like log_wal_init_interval,
> where the value is the minimum amount of time between reporting the
> number of WAL segments created since the last report?

You mean to introduce a new GUC like log_wal_init_interval, so that
the number of WAL files created since the last report would be logged
at that interval? But isn't it better and simpler to just expose the
accumulated number of WAL files created, in the pg_stat_wal view or
elsewhere? If so, we can easily get the number of WAL files created in
any interval by checking the view and calculating the diff.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
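The diff-the-cumulative-counter approach suggested here is the standard pattern for all pg_stat_* views: sample the counter periodically and subtract adjacent samples. A minimal sketch of just the diffing logic (the counter column itself is hypothetical; at the time of this thread pg_stat_wal does not expose a WAL-files-created count):

```python
def files_created_per_interval(samples):
    """Turn cumulative counter samples (e.g. periodic SELECTs of a
    hypothetical WAL-files-created column) into per-interval increments."""
    return [b - a for a, b in zip(samples, samples[1:])]

# Four samples of a cumulative counter, taken at regular intervals:
print(files_created_per_interval([100, 103, 103, 111]))  # [3, 0, 8]
```

This also explains why a cumulative view column is simpler than a logging interval GUC: the sampling frequency is the monitoring system's choice rather than the server's.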
Re: log_checkpoint's "WAL file(s) added" is misleading to the point of uselessness
From
"Bossart, Nathan"
Date:
On 7/26/21, 5:23 PM, "Fujii Masao" <masao.fujii@oss.nttdata.com> wrote:
> On 2021/07/27 5:27, Bossart, Nathan wrote:
>> +1. I was confused by this when working on a WAL pre-allocation
>> patch [0]. Perhaps it could be replaced by a new parameter and a new
>> field in pg_stat_wal. How about something like log_wal_init_interval,
>> where the value is the minimum amount of time between reporting the
>> number of WAL segments created since the last report?
>
> You mean to introduce new GUC like log_wal_init_interval and that
> the number of WAL files created since the last report will be logged
> every that interval? But isn't it better and simpler to just expose
> the accumulated number of WAL files created, in pg_stat_wal view
> or elsewhere? If so, we can easily get to know the number of WAL files
> created in every interval by checking the view and calculating the diff.

I agree with you about adding a new field to pg_stat_wal. The parameter
would just be a convenient way of logging this information for future
reference. I don't feel strongly about the parameter if you think the
pg_stat_wal addition is enough.

Nathan
Re: log_checkpoint's "WAL file(s) added" is misleading to the point of uselessness
From
Andres Freund
Date:
Hi,

On 2021-07-26 20:27:21 +0000, Bossart, Nathan wrote:
> +1. I was confused by this when working on a WAL pre-allocation
> patch [0]. Perhaps it could be replaced by a new parameter and a new
> field in pg_stat_wal. How about something like log_wal_init_interval,
> where the value is the minimum amount of time between reporting the
> number of WAL segments created since the last report?

Why not just make the number in log_checkpoints accurate? There's no
point in the current number, so we don't need to preserve it...

Greetings,

Andres Freund
Re: log_checkpoint's "WAL file(s) added" is misleading to the point of uselessness
From
Andres Freund
Date:
Hi,

On 2021-07-25 12:10:07 +0900, Fujii Masao wrote:
> It's also worth showing them in monitoring stats view like pg_stat_wal?

I'm not convinced that's all that meaningful. It makes sense to include
it as part of the checkpoint output, because checkpoints determine when
WAL can be recycled etc. It's not that clear to me how to represent that
as part of pg_stat_wal?

I guess we could add columns for the amount of WAL that has been
a) newly created, b) recycled, c) removed. In combination that *does*
seem useful. But that's also a mostly independent change...

Greetings,

Andres Freund
Re: log_checkpoint's "WAL file(s) added" is misleading to the point of uselessness
From
"Bossart, Nathan"
Date:
On 7/26/21, 5:48 PM, "Andres Freund" <andres@anarazel.de> wrote:
> On 2021-07-26 20:27:21 +0000, Bossart, Nathan wrote:
>> +1. I was confused by this when working on a WAL pre-allocation
>> patch [0]. Perhaps it could be replaced by a new parameter and a new
>> field in pg_stat_wal. How about something like log_wal_init_interval,
>> where the value is the minimum amount of time between reporting the
>> number of WAL segments created since the last report?
>
> Why not just make the number in log_checkpoints accurate? There's no
> point in the current number, so we don't need to preserve it...

My understanding is that the statistics logged for log_checkpoints are
just for that specific checkpoint. From that angle, the value for the
number of WAL files added is technically correct: checkpoints will only
ever create zero or one new file via PreallocXlogFiles(). If we also
counted all the segments created outside of the checkpoint, the value
for "added" would go from meaning "WAL files created by this checkpoint"
to "WAL files created since the last checkpoint." That's probably less
confusing than what's there today, but it's still slightly different
from all the other log_checkpoints stats.

Nathan