Re: New statistics for tuning WAL buffer size - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: New statistics for tuning WAL buffer size |
Date | |
Msg-id | 5f7bcbee-c072-c476-32b7-f5bea1a5e201@oss.nttdata.com Whole thread Raw |
In response to | Re: New statistics for tuning WAL buffer size (Masahiro Ikeda <ikedamsh@oss.nttdata.com>) |
Responses |
Re: New statistics for tuning WAL buffer size
|
List | pgsql-hackers |
On 2020/09/25 12:06, Masahiro Ikeda wrote: > On 2020-09-18 11:11, Kyotaro Horiguchi wrote: >> At Fri, 18 Sep 2020 09:40:11 +0900, Masahiro Ikeda >> <ikedamsh@oss.nttdata.com> wrote in >>> Thanks. I confirmed that it causes HOT pruning or killing of >>> dead index tuple if DecodeCommit() is called. >>> >>> As you said, DecodeCommit() may access the system table. >> ... >>> The wals are generated only when logical replication is performed. >>> So, I added pgstat_send_wal() in XLogSendLogical(). >>> >>> But, I concerned that it causes poor performance >>> since pgstat_send_wal() is called per wal record, >> >> I think that's too frequent. If we want to send any stats to the >> collector, it is usually done at commit time using >> pgstat_report_stat(), and the function avoids sending stats too >> frequently. For logrep-worker, apply_handle_commit() is calling it. It >> seems to be the place if we want to send the wal stats. Or it may be >> better to call pgstat_send_wal() via pgstat_report_stat(), like >> pg_stat_slru(). > > Thanks for your comments. > Since I changed to use pgstat_report_stat() and DecodeCommit() is calling it, > the frequency to send statistics is not so high. On second thought, it's strange to include this change in pg_stat_wal patch. Because pgstat_report_stat() sends various stats and that change would affect not only pg_stat_wal but also other stats views. That is, if we really want to make some processes call pgstat_report_stat() newly, which should be implemented as a separate patch. But I'm not sure how useful this change is because probably the stats are almost negligibly small in those processes. This thought seems valid for pgstat_send_wal(). I changed the thought and am inclined to be ok not to call pgstat_send_wal() in some background processes that are very unlikely to generate WAL. For example, logical-rep launcher, logical-rep walsender, and autovacuum launcher. Thought? > >> Currently logrep-laucher, logrep-worker and autovac-launcher (and some >> other processes?) don't seem (AFAICS) sending scan stats at all but >> according to the discussion here, we should let such processes send >> stats. > > I added pgstat_report_stat() to logrep-laucher and autovac-launcher. > As you said, logrep-worker already calls apply_handle_commit() and pgstat_report_stat(). Right. > The checkpointer doesn't seem to call pgstat_report_stat() currently, > but since there is a possibility to send wal statistics, I added pgstat_report_stat(). IMO it's better to call pgstat_send_wal() in the checkpointer, instead, because of the above reason. Thanks for updating the patch! I'd like to share my review comments. + <xref linkend="monitoring-pg-stat-wal-view"/> for details. Like the description for pg_stat_bgwriter, <link> tag should be used instead of <xref>. + <para> + Number of WAL writes when the <xref linkend="guc-wal-buffers"/> are full + </para></entry> I prefer the following description. Thought? "Number of times WAL data was written to the disk because wal_buffers got full" + the <structname>pg_stat_archiver</structname> view ,or <literal>wal</literal> A comma should be just after "view" (not just before "or"). +/* + * WAL global statistics counter. + * This counter is incremented by both each backend and background. + * And then, sent to the stat collector process. + */ +PgStat_MsgWal WalStats; What about merging the comments for BgWriterStats and WalStats into one because they are almost the same? For example, ------------------------------- /* * BgWriter and WAL global statistics counters. * Stored directly in a stats message structure so they can be sent * without needing to copy things around. We assume these init to zeroes. */ PgStat_MsgBgWriter BgWriterStats; PgStat_MsgWal WalStats; ------------------------------- BTW, originally there was the comment "(unused in other processes)" for BgWriterStats. But it seems not true, so I removed it from the above example. + rc = fwrite(&walStats, sizeof(walStats), 1, fpout); + (void) rc; /* we'll check for error with ferror */ Since the patch changes the pgstat file format, PGSTAT_FILE_FORMAT_ID should also be changed? - * Clear out global and archiver statistics so they start from zero in + * Clear out global, archiver and wal statistics so they start from zero in This is not the issue of this patch, but isn't it better to mention also SLRU stats here? That is, what about "Clear out global, archiver, WAL and SLRU statistics so they start from zero in"? I found "wal statistics" and "wal stats" in some comments in the patch, but isn't it better to use "WAL statistics" and "WAL stats", instead, if there is no special reason to use lowercase? + /* + * Read wal stats struct + */ + if (fread(&walStats, 1, sizeof(walStats), fpin) != sizeof(walStats)) In pgstat_read_db_statsfile_timestamp(), the local variable myWalStats should be declared and be used to store the WAL stats read via fread(), instead. +{ oid => '1136', descr => 'statistics: number of WAL writes when the wal buffers are full', If we change the description of wal_buffers_full column in the document as I proposed, we should also use the proposed description here. +{ oid => '1137', descr => 'statistics: last reset for the walwriter', "the walwriter" should be "WAL" or "WAL activity", etc? + * PgStat_MsgWal Sent by each backend and background workers to update WAL statistics. If your intention here is to mention background processes like checkpointer, "each backend and background workers" should be "backends and background processes"? + PgStat_Counter m_wal_buffers_full; /* number of WAL write caused by full of WAL buffers */ I don't think this comment is necessary. + PgStat_Counter wal_buffers_full; /* number of WAL write caused by full of WAL buffers */ + TimestampTz stat_reset_timestamp; /* last time when the stats reset */ I don't think these comments are necessary. +/* + * WAL writes statistics counter is updated by backend and background workers Same as above. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
pgsql-hackers by date: