RE: [Proposal] Add accumulated statistics - Mailing list pgsql-hackers

From imai.yoshikazu@fujitsu.com
Subject RE: [Proposal] Add accumulated statistics
Date
Msg-id OSAPR01MB46097457DD12DF85A3C42F4794600@OSAPR01MB4609.jpnprd01.prod.outlook.com
In response to RE: [Proposal] Add accumulated statistics  ("Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com>)
Responses RE: [Proposal] Add accumulated statistics  ("imai.yoshikazu@fujitsu.com" <imai.yoshikazu@fujitsu.com>)
List pgsql-hackers
Hi,

On Tue, Jan 15, 2019 at 1:14 AM, Tsunakawa, Takayuki wrote:

[ ... absent for a long time ]

I read the discussions of this thread.

If we want wait event info, we can currently sample pg_stat_activity and get a pseudo total duration for each wait event.
(I understand that wait event sampling does *not* represent wait event counts but rather wait event durations.)
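
Just as an illustration (a rough sketch I wrote for this mail, not part of any patch), a client-side sampler like the one
below polls pg_stat_activity via libpq; each observation only approximates one sampling interval of wait time, which is
why sampling yields a pseudo duration rather than a call count. The interval, sample count, and lack of aggregation here
are arbitrary choices.

/*
 * Minimal sketch of a client-side wait event sampler (illustration only).
 * Polls pg_stat_activity and prints the wait events observed at each sample.
 */
#include <stdio.h>
#include <unistd.h>
#include <libpq-fe.h>

int
main(void)
{
    const int   interval_ms = 100;      /* arbitrary sampling interval */
    const int   nsamples = 600;         /* ~60 seconds of sampling */
    PGconn     *conn = PQconnectdb(""); /* connection info from environment */

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    for (int i = 0; i < nsamples; i++)
    {
        PGresult   *res = PQexec(conn,
                                 "SELECT wait_event_type, wait_event"
                                 "  FROM pg_stat_activity"
                                 " WHERE wait_event IS NOT NULL");

        if (PQresultStatus(res) == PGRES_TUPLES_OK)
        {
            /* a real tool would aggregate per event; print for brevity */
            for (int row = 0; row < PQntuples(res); row++)
                printf("%s\t%s\n",
                       PQgetvalue(res, row, 0),
                       PQgetvalue(res, row, 1));
        }
        PQclear(res);
        usleep(interval_ms * 1000L);
    }

    PQfinish(conn);
    return 0;
}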

It is said that, in most cases, sampling is enough to find the cause of an issue. On the other hand, it is also said that
sampling misses wait events that don't spend much time in the process. Can we say we don't need such seemingly
unimportant wait events?

Also, although we can get the pseudo total duration from sampling, we can't know how many times each wait event occurred
or how long it takes on average each time. I think it would be better to also know the average wait event time when
investigating the cause of an issue.

For example, the counts, total durations, and average times of each wait event after executing pgbench, obtained from a
server with my patch applied, are shown below.


transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100 
query mode: simple
number of clients: 8
number of threads: 8
duration: 60 s
number of transactions actually processed: 132259
latency average = 3.630 ms
tps = 2203.909727 (including connections establishing)
tps = 2204.248691 (excluding connections establishing)

 wait_event_type |      wait_event      | calls  |   times   | times/calls
-----------------+----------------------+--------+-----------+------------
 Client          | ClientRead           | 925732 | 115270950 |  124.52
 IO              | DataFileRead         | 220357 |   3219893 |   14.61
 LWLock          | WALWriteLock         | 195560 | 192684987 |  985.30
 IO              | DataFileWrite        | 156208 |   4440091 |   28.42
 IO              | WALWrite             |  51035 |   2925808 |   57.33
 IO              | WALSync              |  50812 |  49626240 |  976.66
 Lock            | transactionid        |   6179 |   9905727 | 1603.13
 LWLock          | buffer_content       |   5453 |    295123 |   54.12
 IO              | DataFileExtend       |   2367 |     90430 |   38.20
 LWLock          | wal_insert           |   2053 |     33620 |   16.38
 LWLock          | WALBufMappingLock    |   1301 |     23761 |   18.26
 LWLock          | CLogControlLock      |    514 |      3959 |    7.70
 LWLock          | buffer_mapping       |    162 |      1313 |    8.10
 LWLock          | XidGenLock           |    153 |      2399 |   15.68
 LWLock          | lock_manager         |    122 |      5051 |   41.40
 Lock            | tuple                |    105 |    114632 | 1091.73
 LWLock          | ProcArrayLock        |     99 |      1828 |   18.46
 LWLock          | CheckpointerCommLock |     33 |       854 |   25.88
 Lock            | extend               |     24 |     20493 |  853.88
 IO              | RelationMapRead      |     22 |        71 |    3.23
 LWLock          | buffer_io            |     11 |      7398 |  672.55
 IO              | SLRUWrite            |      4 |       249 |   62.25
 IO              | SLRUSync             |      4 |         0 |    0.00
 IPC             | ClogGroupUpdate      |      3 |       223 |   74.33
 LWLock          | SyncRepLock          |      1 |         1 |    1.00


We can see that WALWriteLock has the longest total duration (times), and ClientRead is the next one. We might get this
information from sampling as well. We can also see that WALWriteLock's long total duration comes from its high average
time, while ClientRead's total duration comes from its many calls. We can also see that transactionid, tuple,
WALWriteLock, WALSync, and extend have longer average times than the others. If we want to improve performance even
just a bit, I think such information is useful.

The patch proposed in this thread was not accepted because it puts overhead on the server/backend. The functions that
measure the wait event time, gettimeofday() and clock_gettime(), take time to execute, and since they would sit on a
frequently called path (pgstat_report_wait_start/end), it is supposed that a lot of overhead would be introduced.
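
To illustrate the point (this is only a hand-written sketch, not the actual patch; wait_start_timed(), wait_end_timed(),
and accumulate_wait_time() are hypothetical stand-ins), per-event timing adds two clock reads around what is today
roughly a single store of the wait event code:

/*
 * Illustrative stand-ins only, not the proposed patch: per-event timing adds
 * two clock_gettime() calls around what is currently just a store of a
 * 32-bit wait event code in pgstat_report_wait_start()/end().
 */
#include <stdint.h>
#include <time.h>

static uint32_t        current_wait_event;  /* stand-in for the existing store */
static struct timespec wait_start_ts;       /* per-backend in the server */

/* hypothetical accumulator; in the server this would be a per-backend counter array */
static void
accumulate_wait_time(uint32_t wait_event_info, int64_t elapsed_ns)
{
    (void) wait_event_info;
    (void) elapsed_ns;
}

static inline void
wait_start_timed(uint32_t wait_event_info)
{
    clock_gettime(CLOCK_MONOTONIC, &wait_start_ts);   /* extra clock read */
    current_wait_event = wait_event_info;             /* existing cheap store */
}

static inline void
wait_end_timed(void)
{
    struct timespec end;

    clock_gettime(CLOCK_MONOTONIC, &end);             /* second clock read */
    accumulate_wait_time(current_wait_event,
                         (int64_t) (end.tv_sec - wait_start_ts.tv_sec) * 1000000000
                         + (end.tv_nsec - wait_start_ts.tv_nsec));
    current_wait_event = 0;
}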
 

Even if we only count the wait events, there would be overhead, as pointed out in another thread. [1]

The overhead induced by collecting wait event info has been discussed for a long time, but I couldn't find actual
measurement results, so I want to measure the overhead myself.
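
As a first step, a trivial microbenchmark like the one below (just a sketch; the iteration count is arbitrary) gives the
raw per-call cost of clock_gettime(CLOCK_MONOTONIC) on a given platform, which bounds the extra cost per wait event when
two such calls are added per wait.

/*
 * Minimal sketch: measure the raw per-call cost of clock_gettime(CLOCK_MONOTONIC),
 * the kind of call that would run twice per wait event if durations were accumulated.
 */
#include <stdio.h>
#include <time.h>

int
main(void)
{
    const long      iterations = 10 * 1000 * 1000;  /* arbitrary */
    struct timespec begin, end, tmp;
    double          total_ns;

    clock_gettime(CLOCK_MONOTONIC, &begin);
    for (long i = 0; i < iterations; i++)
        clock_gettime(CLOCK_MONOTONIC, &tmp);
    clock_gettime(CLOCK_MONOTONIC, &end);

    total_ns = (double) (end.tv_sec - begin.tv_sec) * 1e9
             + (end.tv_nsec - begin.tv_nsec);
    printf("clock_gettime: %.1f ns per call\n", total_ns / iterations);
    return 0;
}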
 

[1] https://www.postgresql.org/message-id/CA%2BTgmobf1NJD%2B_DfQG5qccG5YFSnxk3CgC2mh0-UHabznCQtYA%40mail.gmail.com

--
Yoshikazu Imai 

