Re: [Proposal] Add accumulated statistics for wait event - Mailing list pgsql-hackers

From Atsushi Torikoshi
Subject Re: [Proposal] Add accumulated statistics for wait event
Date
Msg-id CACZ0uYHi0bjg6+EFhA=BM+c14rbu4tj5v+EyVJ=GY1kRjqqASw@mail.gmail.com
Whole thread Raw
In response to RE: [Proposal] Add accumulated statistics for wait event  ("imai.yoshikazu@fujitsu.com" <imai.yoshikazu@fujitsu.com>)
List pgsql-hackers
Hi Imai-san,

I feel your 'pg_stat_waitaccum' will help us investigate the bottleneck.
So I'd like to do some benchmarks but unfortunately, the latest v6 patch
couldn't be applied to HEAD anymore.

Is it possible to share the latest patches?
If not, I'll make v6 applicable to the HEAD.

Regards,

--
Atsushi Torikoshi

On Fri, Feb 28, 2020 at 5:17 PM imai.yoshikazu@fujitsu.com <imai.yoshikazu@fujitsu.com> wrote:
On Wed, Feb 26, 2020 at 1:39 AM, Kyotaro Horiguchi wrote:
> Hello.  I had a brief look on this and have some comments on this.

Hi, Horiguchi-san. Thank you for looking at this!

> It uses its own hash implement. Aside from the appropriateness of
> having another implement of existing tool, in the first place hash
> works well for wide, sparse and uncertain set of keys. But they are in
> rather a dense and narrow set with certain and fixed members. It
> registers more than 200 entries but bucket size is 461. It would be
> needed to avoid colliisions, but seems a bit too wasting.

Yes, wait events are grouped and wait events ID are defined as a sequential
number, starting from specified number for each group(like 0x01000000U), thus
keys are in a dense and narrow set.

=====
#define PG_WAIT_LWLOCK              0x01000000U
#define PG_WAIT_LOCK                0x03000000U
#define PG_WAIT_BUFFER_PIN          0x04000000U
#define PG_WAIT_ACTIVITY            0x05000000U
...

typedef enum
{
    WAIT_EVENT_ARCHIVER_MAIN = PG_WAIT_ACTIVITY,
    WAIT_EVENT_AUTOVACUUM_MAIN,
    ...
    WAIT_EVENT_WAL_WRITER_MAIN
} WaitEventActivity;
=====

The number 461 is the lowest number which avoids collisions in the hash for
current wait events set. As you pointed out, there are a bit too much unused
entries.


If we prepare arrays for each wait classes with appropriate size which just
fits the number of each wait event class, we can store wait event statistics
into those arrays with no more wastes.
Also we calculate hash number by "(wait event ID) % (bucket size)" in hash
case, while we calculate index of arrays by "(wait event ID) - (wait event class id)"
in array case. The latter one's cost is lower than the former one.

Currently I implement wait event statistics store by hash because of its
easiness of implementation, but I think it is good to implement it by array
for above reasons. One concern is that we put restrictions on wait events
that it must be defined as it is sequenced in its wait class so that there
are no unused entries in the array.


> It seems trying to avoid doing needless work when the feature is not
> activated by checking "if (wa_hash==NULL)", but the hash is created
> being filled with possible entries regardless of whether
> track_wait_timing is on or off.

This might be bad implementation but there are the cases "wa_hash = NULL"
where pgstat_init() is not called like when executing "initdb". I insert
that check for avoiding unexpected crash in those cases.

I also noticed debug codes exist around that code...I will modify it.

> As the result
> pgstat_report_waitaccum_end calls pgstat_get_wa_entry tremendously
> frequently. This should be the reason for the recent benchmark result.
> I'm not sure such frequency of hash-searching is acceptable even if
> the feature is turned on.

track_wait_timing parameter determines whether we collect wait time.
Regardless of that parameter, we always collect wait count every wait event
happens. I think calling pgstat_get_wa_entry frequently is not critical to
performance. From pavel's benchmark results, if track_wait_timing parameter
is off, there are +/-1.0% performance difference between patched and unpatched
and it is just considered as a measurement error.


Thanks
--
Yoshikazu Imai


pgsql-hackers by date:

Previous
From: Masahiro Ikeda
Date:
Subject: Why are wait events not reported even though it reads/writes atimeline history file?
Next
From: Andy Fan
Date:
Subject: Re: Trying to pull up EXPR SubLinks