Re: About to add WAL write/fsync statistics to pg_stat_wal view - Mailing list pgsql-hackers

From Masahiro Ikeda
Subject Re: About to add WAL write/fsync statistics to pg_stat_wal view
Date
Msg-id 395f109540835fabe76dbd9d45733a14@oss.nttdata.com
Whole thread Raw
In response to Re: About to add WAL write/fsync statistics to pg_stat_wal view  ("David G. Johnston" <david.g.johnston@gmail.com>)
Responses Re: About to add WAL write/fsync statistics to pg_stat_wal view  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-hackers
Hi, David.

Thanks for your comments.

On 2021-01-26 08:48, David G. Johnston wrote:
> On Mon, Jan 25, 2021 at 8:03 AM Masahiko Sawada
> <sawada.mshk@gmail.com> wrote:
> 
>> On Mon, Jan 25, 2021 at 4:51 PM Masahiro Ikeda
>> <ikedamsh@oss.nttdata.com> wrote:
>>> 
>>> Hi, thanks for the reviews.
>>> 
>>> I updated the attached patch.
>> 
>> Thank you for updating the patch!
> 
> Your original email with "total number of times" is more correct,
> removing the "of times" and just writing "total number of WAL" is not
> good wording.
> 
> Specifically, this change is strictly worse than the original.
> 
> -       Number of times WAL data was written to disk because WAL
> buffers became full
> +       Total number of WAL data written to disk because WAL buffers
> became full
> 
> Both have the flaw that they leave implied exactly what it means to
> "write WAL to disk".  It is also unclear whether a counter, bytes, or
> both, would be more useful here. I've incorporated this into my
> documentation suggestions below:
> (wal_buffers_full)
> 
> -- Revert - the original was better, though maybe add more detail
> similar to the below.  I didn't research exactly how this works.

OK, I understood.
I reverted since this is a counter statistics.


> (wal_write)
> The number of times WAL buffers were written out to disk via XLogWrite
> 

Thanks.

I thought it's better to omit "The" and "XLogWrite" because other views' 
description
omits "The" and there is no description of "XlogWrite" in the documents. 
What do you think?

> -- Seems like this should have a bytes version too

Do you mean that we need to separate statistics for wal write?


> (wal_write_time)
> The amount of time spent writing WAL buffers to disk, excluding sync
> time unless the wal_sync_method is either open_datasync or open_sync.
> Units are in milliseconds with microsecond resolution.  This is zero
> when track_wal_io_timing is disabled.

Thanks, I'll fix it.


> (wal_sync)
> The number of times WAL files were synced to disk while
> wal_sync_method was set to one of the "sync at commit" options (i.e.,
> fdatasync, fsync, or fsync_writethrough).

Thanks, I'll fix it.


> -- it is not going to be zero just because those settings are
> presently disabled as they could have been enabled at some point since
> the last time these statistics were reset.

Right, your description is correct.
The "track_wal_io_timing" has the same limitation, doesn't it?


> (wal_sync_time)
> The amount of time spent syncing WAL files to disk, in milliseconds
> with microsecond resolution.  This requires setting wal_sync_method to
> one of the "sync at commit" options (i.e., fdatasync, fsync, or
> fsync_writethrough).

Thanks, I'll fix it.
I will add the comments related to "track_wal_io_timing".


> Also,
> 
> I would suggest extracting the changes to postmaster/pgstat.c and
> replication/walreceiver.c to a separate patch as you've fundamentally
> changed how it behaves with regards to that function and how it
> interacts with the WAL receiver.  That seems an entirely separate
> topic warranting its own patch and discussion.

OK, I will separate two patches.


On 2021-01-26 08:52, David G. Johnston wrote:
> On Mon, Jan 25, 2021 at 4:37 PM Masahiro Ikeda
> <ikedamsh@oss.nttdata.com> wrote:
> 
>> I agree with your comments. I think it should report when
>> reaching the end of WAL too. I add the code to report the stats
>> when finishing the current WAL segment file when timeout in the
>> main loop and when reaching the end of WAL.
> 
> The following is not an improvement:
> 
> - /* Send WAL statistics to the stats collector. */+ /* Send WAL
> statistics to stats collector */
> 
> The word "the" there makes it proper English.  Your copy-pasting
> should have kept the existing good wording in the other locations
> rather than replace the existing location with the newly added
> incorrect wording.

Thanks, I'll fix it.


> This doesn't make sense:
> 
> * current WAL segment file to avoid loading stats collector.
> 
> Maybe "overloading" or "overwhelming"?
> 
> I see you removed the pgstat_send_wal(force) change.  The rest of my
> comments on the v6 patch still stand I believe.

Yes, "overloading" is right. Thanks.


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Tid scan improvements
Next
From: Tom Lane
Date:
Subject: Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit