RE: [Proposal] Add accumulated statistics for wait event - Mailing list pgsql-hackers

From Phil Florent
Subject RE: [Proposal] Add accumulated statistics for wait event
Date
Msg-id DB6PR0301MB22786C585EA07FC09FEF7553BAF30@DB6PR0301MB2278.eurprd03.prod.outlook.com
Whole thread Raw
In response to RE: [Proposal] Add accumulated statistics for wait event  ("Yotsunaga, Naoki" <yotsunaga.naoki@jp.fujitsu.com>)
Responses RE: [Proposal] Add accumulated statistics for wait event  ("Yotsunaga, Naoki" <yotsunaga.naoki@jp.fujitsu.com>)
List pgsql-hackers
Hi,

Is DBA really able to solve bottlenecks with sampling?

What I would like to say is that if we have information on the number of wait events and the wait time(like other DB), we can investigate more easily.


Yes you will be able to solve bottlenecks with sampling. In interactive mode, a 1s interval is probably too large. I use 0s1 - 0s01 with my tool and it is normally OK. In batch mode I use 1s=>10s. If you want to visualize the results it's easy to use a dedicated tool and bottlenecks will clearly appear .
Since grafana is now able to connect directly to a postgresql source, I use it to display the information collected from pg_stat_activity and psutil ( e.g https://pgphil.ovh/traqueur_dashboard_02.php page is written in french but panels are in english)

Other DB have accumulated statistics but you can notice that sampling is also their most modern method.
E.g Oracle DB : 20 years ago you already had tools like "utlbstat/utlestat" . Then you had "statspack". Those tools were based on accumulated statistics and the reports were based on differences between 2 points. It was useful to solve major problems but it was limited and not precise enough in many cases.

The preferred feature to identify bottlenecks in the Oracle world is now ASH (active session history). It can help with major problems, specific problems AND it can identify short blockages.
Too bad it is licensed as an option of their Enterprise Edition but similar tools exist and they are also based on sampling of the activity.

With the "official" ASH, sampling and archiving are done internally and you have a circular memory zone dedicated to the feature. Hence the overhead is lower but that's all.

The most advanced interactive tool is called "snapper" and it is also based on sampling.

Best regards
Phil



De : Yotsunaga, Naoki <yotsunaga.naoki@jp.fujitsu.com>
Envoyé : lundi 29 octobre 2018 02:20
À : 'Phil Florent'; 'Michael Paquier'
Cc : 'Tomas Vondra'; 'pgsql-hackers@lists.postgresql.org'
Objet : RE: [Proposal] Add accumulated statistics for wait event
 

On Thu, Oct 4, 2018 at 8:22 PM, Yotsunaga Naoki wrote:

 

Hi, I understood and thought of your statistic comment once again. In the case of sampling, is there enough statistic to investigate?

In the case of a long SQL, I think that it is possible to obtain a sufficient sampling number.

 

However, in the case of about 1 minute of SQL, only 60 samples can be obtained at most.

#Because legards comment.

https://www.postgresql.org/message-id/1539158356795-0.post%40n3.nabble.com

 

Does this sampling number of 60 give information that I really want?

Perhaps it is not to miss the real problem part?

---------------------------------------

Naoki, Yotsunaga.

pgsql-hackers by date:

Previous
From: Andreas 'ads' Scherbaum
Date:
Subject: Re: INSTALL file
Next
From: MichaelDBA
Date:
Subject: pgadmin4 and scram