Re: wal stats questions - Mailing list pgsql-hackers

From Masahiro Ikeda
Subject Re: wal stats questions
Date
Msg-id 2edded96-5e8b-987c-c01a-16e1542dfec9@oss.nttdata.com
Whole thread Raw
In response to Re: wal stats questions  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: wal stats questions  (Fujii Masao <masao.fujii@oss.nttdata.com>)
List pgsql-hackers
On 2021/03/30 17:28, Kyotaro Horiguchi wrote:
> At Tue, 30 Mar 2021 09:41:24 +0900, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote in 
>> I update the patch since there were my misunderstanding points.
>>
>> On 2021/03/26 16:20, Masahiro Ikeda wrote:
>>> Thanks for many your suggestions!
>>> I made the patch to handle the issues.
>>>
>>>> 1) What is the motivation to have both prevWalUsage and pgWalUsage,
>>>>    instead of just accumulating the stats since the last submission?
>>>>    There doesn't seem to be any comment explaining it? Computing
>>>>    differences to previous values, and copying to prev*, isn't free. I
>>>>    assume this is out of a fear that the stats could get reset before
>>>>    they're used for instrument.c purposes - but who knows?
>>>
>>> I removed the unnecessary code copying pgWalUsage and just reset the
>>> pgWalUsage after reporting the stats in pgstat_report_wal().
>>
>> I didn't change this.
>>
>>>> 2) Why is there both pgstat_send_wal() and pgstat_report_wal()? With the
>>>>    former being used by wal writer, the latter by most other processes?
>>>>    There again don't seem to be comments explaining this.
>>>
>>> I added the comments why two functions are separated.
>>> (But is it better to merge them?)
>>
>> I updated the comments for following reasons.
>>
>>
>>>> 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
>>>>    just to figure out if there's been any changes isn't all that
>>>>    cheap. This is regularly exercised in read-only workloads too. Seems
>>>>    adding a boolean WalStats.have_pending = true or such would be
>>>>    better.
>>>> 4) For plain backends pgstat_report_wal() is called by
>>>>    pgstat_report_stat() - but it is not checked as part of the "Don't
>>>>    expend a clock check if nothing to do" check at the top.  It's
>>>>    probably rare to have pending wal stats without also passing one of
>>>>    the other conditions, but ...
>>>
>>> I added the logic to check if the stats counters are updated or not in
>>> pgstat_report_stat() using not only generated wal record but also write/sync
>>> counters, and it can skip to call reporting function.
>>
>> I removed the checking code whether the wal stats counters were updated or not
>> in pgstat_report_stat() since I couldn't understand the valid case yet.
>> pgstat_report_stat() is called by backends when the transaction is finished.
>> This means that the condition seems always pass.
> 
> Doesn't the same holds for all other counters?  If you are saying that
> "wal counters should be zero if all other stats counters are zero", we
> need an assertion to check that and a comment to explain that.
> 
> I personally find it safer to add the WAL-stats condition to the
> fast-return check, rather than addin such assertion.
Thanks for your comments.

OK, I added the condition to the fast-return check. I noticed that I
misunderstood that the purpose is to avoid expanding a clock check using WAL
stats counters. But, the purpose is to make the conditions stricter, right?


> pgstat_send_wal() is called mainly from pgstat_report_wal() which
> consumes pgWalStats counters and WalWriterMain() which
> doesn't. Checking on pgWalStats counters isn't so complex that we need
> to avoid that in wal writer, thus *I* think pgstat_send_wal() and
> pgstat_report_wal() can be consolidated.  Even if you have a strong
> opinion that wal writer should call a separate function, the name
> should be something other than pgstat_send_wal() since it ignores
> pgWalUsage counters, which are supposed to be included in "WAL stats".

OK, I consolidated them.



>> I thought to implement if the WAL stats counters were not updated, skip to
>> send all statistics including the counters for databases and so on. But I
>> think it's not good because it may take more time to be reflected the
>> generated stats by read-only transaction.
> 
> Ur, yep.
> 
>>> Although I added the condition which the write/sync counters are updated or
>>> not, I haven't understood the following case yet...Sorry. I checked related
>>> code and tested to insert large object, but I couldn't. I'll investigate more
>>> deeply, but if you already know the function which leads the following case,
>>> please let me know.
>>
>> I understood the above case (Fujii-san, thanks for your advice in person).
>> When to flush buffers, for example, to select buffer replacement victim,
>> it requires a WAL flush if the buffer is dirty. So, to check the WAL stats
>> counters are updated or not, I check the number of generated wal record and
>> the counter of syncing in pgstat_report_wal().
>>
>> The reason why not to check the counter of writing is that if to write is
>> happened, to sync is happened too in the above case. I added the comments in
>> the patch.
> 
> Mmm..  Although I couldn't read you clearly, The fact that a flush may
> happen without a write means the reverse at the same time, a write may
> happen without a flush.  For asynchronous commits, WAL-write happens
> alone unaccompanied by a flush.  And the corresponding flush would
> happen later without writes.

Sorry, I didn't explain it enough.

For processes which may generate WAL records like backends, I thought it's
enough to check (1)the number of generated WAL records and (2)the counters of
syncing(flushing) the WAL. This is checked in pgstat_report_wal(). Sorry for
that I didn't mention (1) in the above thread.

If a backend execute a write transaction, some WAL records must be generated.
So, it's ok to check (1) only regardless of whether asynchronous commit is
enabled or not.

OHOT, if a backend execute a read-only transaction, WAL records won't be
generated (although HOT makes a wal records exactly...). But, WAL-write and
flush may happen when to flush buffers via XLogFlush(). In this case, if
WAL-write happened, flush must be happen later. But, if my understanding is
correct, there is no a case to flush doesn't happen, but to write happen.
So, I thought (2) is needed and it's enough to check the counter of
syncing(flushing).


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION

Attachment

pgsql-hackers by date:

Previous
From: Markus Wanner
Date:
Subject: Re: [PATCH] add concurrent_abort callback for output plugin
Next
From: Etsuro Fujita
Date:
Subject: Re: Asynchronous Append on postgres_fdw nodes.