Re: New statistics for tuning WAL buffer size - Mailing list pgsql-hackers
From | Masahiro Ikeda |
---|---|
Subject | Re: New statistics for tuning WAL buffer size |
Date | |
Msg-id | 6a49277f4eced3e9c67a1a14710f6426@oss.nttdata.com Whole thread Raw |
In response to | Re: New statistics for tuning WAL buffer size (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: New statistics for tuning WAL buffer size
|
List | pgsql-hackers |
On 2020-09-26 19:18, Amit Kapila wrote: > On Fri, Sep 25, 2020 at 11:06 PM Fujii Masao > <masao.fujii@oss.nttdata.com> wrote: >> >> On 2020/09/25 12:06, Masahiro Ikeda wrote: >> > On 2020-09-18 11:11, Kyotaro Horiguchi wrote: >> >> At Fri, 18 Sep 2020 09:40:11 +0900, Masahiro Ikeda >> >> <ikedamsh@oss.nttdata.com> wrote in >> >>> Thanks. I confirmed that it causes HOT pruning or killing of >> >>> dead index tuple if DecodeCommit() is called. >> >>> >> >>> As you said, DecodeCommit() may access the system table. >> >> ... >> >>> The wals are generated only when logical replication is performed. >> >>> So, I added pgstat_send_wal() in XLogSendLogical(). >> >>> >> >>> But, I concerned that it causes poor performance >> >>> since pgstat_send_wal() is called per wal record, >> >> >> >> I think that's too frequent. If we want to send any stats to the >> >> collector, it is usually done at commit time using >> >> pgstat_report_stat(), and the function avoids sending stats too >> >> frequently. For logrep-worker, apply_handle_commit() is calling it. It >> >> seems to be the place if we want to send the wal stats. Or it may be >> >> better to call pgstat_send_wal() via pgstat_report_stat(), like >> >> pg_stat_slru(). >> > >> > Thanks for your comments. >> > Since I changed to use pgstat_report_stat() and DecodeCommit() is calling it, >> > the frequency to send statistics is not so high. >> >> On second thought, it's strange to include this change in pg_stat_wal >> patch. >> Because pgstat_report_stat() sends various stats and that change would >> affect not only pg_stat_wal but also other stats views. That is, if we >> really >> want to make some processes call pgstat_report_stat() newly, which >> should be implemented as a separate patch. But I'm not sure how useful >> this change is because probably the stats are almost negligibly small >> in those processes. >> >> This thought seems valid for pgstat_send_wal(). I changed the thought >> and am inclined to be ok not to call pgstat_send_wal() in some >> background >> processes that are very unlikely to generate WAL. >> OK, I removed to pgstat_report_stat() for autovaccum launcher, logrep-worker and logrep-launcher. > This makes sense to me. I think even if such background processes have > to write WAL due to wal_buffers, it will be accounted next time the > backend sends the stats. Thanks for your comments. IIUC, since each process counts WalStats.m_wal_buffers_full, backend can't send the counter which other background processes have to write WAL due to wal_buffers. Although we can't track all WAL activity, the impact on the statistics is minimal so we can ignore it. > One minor point, don't we need to reset the counter > WalStats.m_wal_buffers_full once we sent the stats, otherwise the same > stats will be accounted multiple times. Now, the counter is reset in pgstat_send_wal. Isn't it enough? >> The checkpointer doesn't seem to call pgstat_report_stat() currently, >> but since there is a possibility to send wal statistics, I added >> pgstat_report_stat(). > > IMO it's better to call pgstat_send_wal() in the checkpointer, instead, > because of the above reason. Ok, I changed. > Thanks for updating the patch! I'd like to share my review comments. > > + <xref linkend="monitoring-pg-stat-wal-view"/> for details. > > Like the description for pg_stat_bgwriter, <link> tag should be used > instead of <xref>. Thanks, fixed. > + <para> > + Number of WAL writes when the <xref linkend="guc-wal-buffers"/> > are full > + </para></entry> > > I prefer the following description. Thought? > > "Number of times WAL data was written to the disk because wal_buffers > got full" Ok, I changed. > + the <structname>pg_stat_archiver</structname> view ,or > <literal>wal</literal> > > A comma should be just after "view" (not just before "or"). Sorry, anyway I think a comma is not necessary. I removed it. > +/* > + * WAL global statistics counter. > + * This counter is incremented by both each backend and background. > + * And then, sent to the stat collector process. > + */ > +PgStat_MsgWal WalStats; > > What about merging the comments for BgWriterStats and WalStats into > one because they are almost the same? For example, > > ------------------------------- > /* > * BgWriter and WAL global statistics counters. > * Stored directly in a stats message structure so they can be sent > * without needing to copy things around. We assume these init to > zeroes. > */ > PgStat_MsgBgWriter BgWriterStats; > PgStat_MsgWal WalStats; > ------------------------------- > > BTW, originally there was the comment "(unused in other processes)" > for BgWriterStats. But it seems not true, so I removed it from > the above example. Thanks, I changed. > + rc = fwrite(&walStats, sizeof(walStats), 1, fpout); > + (void) rc; /* we'll check for error with ferror */ > > Since the patch changes the pgstat file format, > PGSTAT_FILE_FORMAT_ID should also be changed? Sorry about that. I incremented PGSTAT_FILE_FORMAT_ID by +1. > - * Clear out global and archiver statistics so they start from zero > in > + * Clear out global, archiver and wal statistics so they start from > zero in > > This is not the issue of this patch, but isn't it better to mention > also SLRU stats here? That is, what about "Clear out global, archiver, > WAL and SLRU statistics so they start from zero in"? Thanks, I changed. > I found "wal statistics" and "wal stats" in some comments in the patch, > but isn't it better to use "WAL statistics" and "WAL stats", instead, > if there is no special reason to use lowercase? OK. I fixed it. > + /* > + * Read wal stats struct > + */ > + if (fread(&walStats, 1, sizeof(walStats), fpin) != sizeof(walStats)) > > In pgstat_read_db_statsfile_timestamp(), the local variable myWalStats > should be declared and be used to store the WAL stats read via fread(), > instead. Thanks, I changed it to declare myWalStats. > +{ oid => '1136', descr => 'statistics: number of WAL writes when the > wal buffers are full', > > If we change the description of wal_buffers_full column in the document > as I proposed, we should also use the proposed description here. OK, I fixed it. > +{ oid => '1137', descr => 'statistics: last reset for the walwriter', > > "the walwriter" should be "WAL" or "WAL activity", etc? Thanks, I fixed it. > + * PgStat_MsgWal Sent by each backend and background workers to > update WAL statistics. > > If your intention here is to mention background processes like > checkpointer, > "each backend and background workers" should be "backends and > background > processes"? Thanks, I fixed it. > + PgStat_Counter m_wal_buffers_full; /* number of WAL write caused by > full of WAL buffers */ > > I don't think this comment is necessary. OK, I removed. > + PgStat_Counter wal_buffers_full; /* number of WAL write caused by > full of WAL buffers */ > + TimestampTz stat_reset_timestamp; /* last time when the stats reset > */ > > I don't think these comments are necessary. OK, I removed > +/* > + * WAL writes statistics counter is updated by backend and background > workers > > Same as above. I fixed it. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
pgsql-hackers by date: