Re: pg_stat_statements oddity with track = all - Mailing list pgsql-hackers

From Masahiro Ikeda
Subject Re: pg_stat_statements oddity with track = all
Date
Msg-id 4dce029c72fc5eb084c5353f31daa13f@oss.nttdata.com
Whole thread Raw
In response to Re: pg_stat_statements oddity with track = all  (Julien Rouhaud <rjuju123@gmail.com>)
Responses Re: pg_stat_statements oddity with track = all  (Julien Rouhaud <rjuju123@gmail.com>)
List pgsql-hackers
Hi,

Thanks for making the patch to add "toplevel" column in 
pg_stat_statements.
This is a review comment.

There hasn't been any discussion, at least that I've been able to find.
So, +1 to change the status to "Ready for Committer".


1. submission/feature review
=============================

This patch can be applied cleanly to the current master branch(ed4367).
I tested with `make check-world` and I checked there is no fail.

This patch has reasonable documents and tests.
A "toplevel" column of pg_stat_statements view is documented and
following tests are added.
- pg_stat_statements.track option : 'top' and 'all'
- query type: normal query and nested query(pl/pgsql)

I tested the "update" command can work.
postgres=# ALTER EXTENSION pg_stat_statements UPDATE TO '1.10';

Although the "toplevel" column of all queries which already stored is 
'false',
we have to decide the default. I think 'false' is ok.


2. usability review
====================

This patch solves the problem we can't get to know
which query is top-level if pg_stat_statements.track = 'all'.

This leads that we can analyze with aggregated queries.

There is some use-case.
For example, we can know the sum of total_exec_time of queries
even if nested queries are executed.

We can know how efficiently a database can use CPU resource for queries
using the sum of the total_exec_time, and the exec_user_time and
exec_system_time in pg_stat_kcache which is the extension gathering
os resources.

Although one concern is whether only top-level is enough or not,
I couldn't come up with any use-case to use nested level, so I think 
it's ok.


3. coding review
=================

Although I had two concerns, I think they are no problem.
So, this patch looks good to me.

Following were my concerns.

A. the risk of too many same queries is duplicate.

Since this patch adds a "top" member in the hash key,
it leads to store duplicated same query which "top" is false and true.

This concern is already discussed and I agreed to the consensus
it seems unlikely to have the same queries being executed both
at the top level and as nested statements.

B. add a argument of the pg_stat_statements_reset().

Now, pg_stat_statements supports a flexible reset feature.
> pg_stat_statements_reset(userid Oid, dbid Oid, queryid bigint)

Although I wondered whether we need to add a top-level flag to the 
arguments,
I couldn't come up with any use-case to reset only top-level queries or 
not top-level queries.


4. others
==========

These comments are not related to this patch.

A. about the topic of commitfests

Since I think this feature is for monitoring,
it's better to change the topic from "System Administration"
to "Monitoring & Control".


B. check logic whether a query is top-level or not in pg_stat_kcache

This patch uses the only exec_nested_level to check whether a query is 
top-level or not.
The reason is nested_level is less useful and I agree.

But, pg_stat_kcache uses plan_nested_level too.
Although the check result is the same, it's better to change it
corresponding to this patch after it's committed.


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION



pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: Is it worth accepting multiple CRLs?
Next
From: Peter Smith
Date:
Subject: Re: Single transaction in the tablesync worker?