Re: New statistics for tuning WAL buffer size - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: New statistics for tuning WAL buffer size
Date
Msg-id 5f7bcbee-c072-c476-32b7-f5bea1a5e201@oss.nttdata.com
Whole thread Raw
In response to Re: New statistics for tuning WAL buffer size  (Masahiro Ikeda <ikedamsh@oss.nttdata.com>)
Responses Re: New statistics for tuning WAL buffer size  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers

On 2020/09/25 12:06, Masahiro Ikeda wrote:
> On 2020-09-18 11:11, Kyotaro Horiguchi wrote:
>> At Fri, 18 Sep 2020 09:40:11 +0900, Masahiro Ikeda
>> <ikedamsh@oss.nttdata.com> wrote in
>>> Thanks. I confirmed that it causes HOT pruning or killing of
>>> dead index tuple if DecodeCommit() is called.
>>>
>>> As you said, DecodeCommit() may access the system table.
>> ...
>>> The wals are generated only when logical replication is performed.
>>> So, I added pgstat_send_wal() in XLogSendLogical().
>>>
>>> But, I concerned that it causes poor performance
>>> since pgstat_send_wal() is called per wal record,
>>
>> I think that's too frequent.  If we want to send any stats to the
>> collector, it is usually done at commit time using
>> pgstat_report_stat(), and the function avoids sending stats too
>> frequently. For logrep-worker, apply_handle_commit() is calling it. It
>> seems to be the place if we want to send the wal stats.  Or it may be
>> better to call pgstat_send_wal() via pgstat_report_stat(), like
>> pg_stat_slru().
> 
> Thanks for your comments.
> Since I changed to use pgstat_report_stat() and DecodeCommit() is calling it,
> the frequency to send statistics is not so high.

On second thought, it's strange to include this change in pg_stat_wal patch.
Because pgstat_report_stat() sends various stats and that change would
affect not only pg_stat_wal but also other stats views. That is, if we really
want to make some processes call pgstat_report_stat() newly, which
should be implemented as a separate patch. But I'm not sure how useful
this change is because probably the stats are almost negligibly small
in those processes.

This thought seems valid for pgstat_send_wal(). I changed the thought
and am inclined to be ok not to call pgstat_send_wal() in some background
processes that are very unlikely to generate WAL. For example, logical-rep
launcher, logical-rep walsender, and autovacuum launcher. Thought?


> 
>> Currently logrep-laucher, logrep-worker and autovac-launcher (and some
>> other processes?) don't seem (AFAICS) sending scan stats at all but
>> according to the discussion here, we should let such processes send
>> stats.
> 
> I added pgstat_report_stat() to logrep-laucher and autovac-launcher.
> As you said, logrep-worker already calls apply_handle_commit() and pgstat_report_stat().

Right.


> The checkpointer doesn't seem to call pgstat_report_stat() currently,
> but since there is a possibility to send wal statistics, I added pgstat_report_stat().

IMO it's better to call pgstat_send_wal() in the checkpointer, instead,
because of the above reason.

Thanks for updating the patch! I'd like to share my review comments.

+       <xref linkend="monitoring-pg-stat-wal-view"/> for details.

Like the description for pg_stat_bgwriter, <link> tag should be used
instead of <xref>.


+      <para>
+       Number of WAL writes when the <xref linkend="guc-wal-buffers"/> are full
+      </para></entry>

I prefer the following description. Thought?

"Number of times WAL data was written to the disk because wal_buffers got full"


+        the <structname>pg_stat_archiver</structname> view ,or <literal>wal</literal>

A comma should be just after "view" (not just before "or").


+/*
+ * WAL global statistics counter.
+ * This counter is incremented by both each backend and background.
+ * And then, sent to the stat collector process.
+ */
+PgStat_MsgWal WalStats;

What about merging the comments for BgWriterStats and WalStats into one because they are almost the same? For example,

-------------------------------
/*
  * BgWriter and WAL global statistics counters.
  * Stored directly in a stats message structure so they can be sent
  * without needing to copy things around.  We assume these init to zeroes.
  */
PgStat_MsgBgWriter BgWriterStats;
PgStat_MsgWal WalStats;
-------------------------------

BTW, originally there was the comment "(unused in other processes)"
for BgWriterStats. But it seems not true, so I removed it from
the above example.


+    rc = fwrite(&walStats, sizeof(walStats), 1, fpout);
+    (void) rc;                    /* we'll check for error with ferror */

Since the patch changes the pgstat file format,
PGSTAT_FILE_FORMAT_ID should also be changed?


-     * Clear out global and archiver statistics so they start from zero in
+     * Clear out global, archiver and wal statistics so they start from zero in

This is not the issue of this patch, but isn't it better to mention
also SLRU stats here? That is, what about "Clear out global, archiver,
WAL and SLRU statistics so they start from zero in"?


I found "wal statistics" and "wal stats" in some comments in the patch,
but isn't it better to use "WAL statistics" and "WAL stats", instead,
if there is no special reason to use lowercase?


+    /*
+     * Read wal stats struct
+     */
+    if (fread(&walStats, 1, sizeof(walStats), fpin) != sizeof(walStats))

In pgstat_read_db_statsfile_timestamp(), the local variable myWalStats
should be declared and be used to store the WAL stats read via fread(),
instead.


+{ oid => '1136', descr => 'statistics: number of WAL writes when the wal buffers are full',

If we change the description of wal_buffers_full column in the document
as I proposed, we should also use the proposed description here.


+{ oid => '1137', descr => 'statistics: last reset for the walwriter',

"the walwriter" should be "WAL" or "WAL activity", etc?


+ * PgStat_MsgWal            Sent by each backend and background workers to update WAL statistics.

If your intention here is to mention background processes like checkpointer,
"each backend and background workers" should be "backends and background
processes"?


+    PgStat_Counter m_wal_buffers_full;    /* number of WAL write caused by full of WAL buffers */

I don't think this comment is necessary.


+    PgStat_Counter wal_buffers_full;    /* number of WAL write caused by full of WAL buffers */
+    TimestampTz stat_reset_timestamp;    /* last time when the stats reset */

I don't think these comments are necessary.


+/*
+ * WAL writes statistics counter is updated by backend and background workers

Same as above.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



pgsql-hackers by date:

Previous
From: Christoph Berg
Date:
Subject: Re: gs_group_1 crashing on 13beta2/s390x
Next
From: Ranier Vilela
Date:
Subject: Re: gs_group_1 crashing on 13beta2/s390x