Thread: IDEA: pg_stat_statements tracking utility statements by tag?

IDEA: pg_stat_statements tracking utility statements by tag?

From
Martijn van Oosterhout
Date:
Hoi hackers,

We've been using the pg_stat_statements extension to get an idea of the queries used in the database, but the table is being filled with entries like:

SAVEPOINT sa_savepoint_NNN;
RELEASE SAVEPOINT sa_savepoint_NNN;
DECLARE "c_7f9451c4dcc0_5" CURSOR WITHOUT HOLD ...
FETCH FORWARD 250 FROM "c_7f9451b03908_5"

Since the unique id is different for each query, the aggregation does nothing and there are quite a lot of these drowning out the normal queries (yes, I'm aware this is an issue in itself). The only way to deal with this is "pg_stat_statements.track_utility=off". However, it occurs to me that if you just tracked the tags rather than the full query text, you could at least track the number of such queries and how much time they take. So the above queries would be tracked under SAVEPOINT, RELEASE, DECLARE CURSOR and (I guess) FETCH respectively. But it would also catch DDL.
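To make the "track by tag" idea concrete, here is a minimal Python sketch of the aggregation. It assumes a hypothetical `command_tag()` helper that guesses the tag from the leading keywords of the statement text; PostgreSQL itself knows the command tag from the parse tree, so an actual patch would use that rather than string inspection. `TagStats` is likewise an illustrative name, not part of pg_stat_statements.

```python
from collections import defaultdict


def command_tag(query):
    """Guess a command tag from the leading keywords (hypothetical helper).

    Only covers the statements quoted above plus simple CREATE/DROP/ALTER
    forms; the server derives real tags from the parse tree instead.
    """
    words = query.strip().rstrip(";").upper().split()
    if not words:
        return "???"
    if words[0] == "DECLARE" and "CURSOR" in words:
        return "DECLARE CURSOR"
    if words[0] in ("CREATE", "DROP", "ALTER") and len(words) > 1:
        return f"{words[0]} {words[1]}"
    return words[0]


class TagStats:
    """Aggregate call counts and total execution time per command tag."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.total_ms = defaultdict(float)

    def record(self, query, elapsed_ms):
        tag = command_tag(query)
        self.calls[tag] += 1
        self.total_ms[tag] += elapsed_ms
```

Under these assumptions the four statements quoted above end up in four rows (SAVEPOINT, RELEASE, DECLARE CURSOR, FETCH), no matter how many distinct savepoint or cursor names the application generates.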

Does this sound like something for which a patch would be accepted?

Have a nice day,
--

Re: IDEA: pg_stat_statements tracking utility statements by tag?

From
Fujii Masao
Date:

On 2020/07/29 18:24, Martijn van Oosterhout wrote:
> Hoi hackers,
> 
> We've been using the pg_stat_statements extension to get an idea of the queries used in the database, but the table
> is being filled with entries like:
> 
> SAVEPOINT sa_savepoint_NNN;
> RELEASE SAVEPOINT sa_savepoint_NNN;
> DECLARE "c_7f9451c4dcc0_5" CURSOR WITHOUT HOLD ...
> FETCH FORWARD 250 FROM "c_7f9451b03908_5"
> 
> Since the unique id is different for each query, the aggregation does nothing and there are quite a lot of these
> drowning out the normal queries (yes, I'm aware this is an issue in itself). The only way to deal with this is
> "pg_stat_statements.track_utility=off". However, it occurs to me that if you just tracked the tags rather than the full
> query text, you could at least track the number of such queries and how much time they take. So the above queries would
> be tracked under SAVEPOINT, RELEASE, DECLARE CURSOR and (I guess) FETCH respectively. But it would also catch DDL.
> 
> Does this sound like something for which a patch would be accepted?

Or should we extend the existing query normalization to also handle DDL?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: IDEA: pg_stat_statements tracking utility statements by tag?

From
Julien Rouhaud
Date:
On Wed, Jul 29, 2020 at 2:42 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> On 2020/07/29 18:24, Martijn van Oosterhout wrote:
> > Hoi hackers,
> >
> > We've been using the pg_stat_statements extension to get an idea of the queries used in the database, but the table
> > is being filled with entries like:
> >
> > SAVEPOINT sa_savepoint_NNN;
> > RELEASE SAVEPOINT sa_savepoint_NNN;
> > DECLARE "c_7f9451c4dcc0_5" CURSOR WITHOUT HOLD ...
> > FETCH FORWARD 250 FROM "c_7f9451b03908_5"
> >
> > Since the unique id is different for each query, the aggregation does nothing and there are quite a lot of these
> > drowning out the normal queries (yes, I'm aware this is an issue in itself). The only way to deal with this is
> > "pg_stat_statements.track_utility=off". However, it occurs to me that if you just tracked the tags rather than the full
> > query text, you could at least track the number of such queries and how much time they take. So the above queries would
> > be tracked under SAVEPOINT, RELEASE, DECLARE CURSOR and (I guess) FETCH respectively. But it would also catch DDL.
> >
> > Does this sound like something for which a patch would be accepted?
>
> Or should we extend the existing query normalization to also handle DDL?

+1, introducing DDL normalization seems like a better way to go in the
long run.  Defining what should and shouldn't be normalized can be
tricky though.



Re: IDEA: pg_stat_statements tracking utility statements by tag?

From
Martijn van Oosterhout
Date:

On Wed, 29 Jul 2020 at 15:40, Julien Rouhaud <rjuju123@gmail.com> wrote:
> On Wed, Jul 29, 2020 at 2:42 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >
> >
> > Or should we extend the existing query normalization to also handle DDL?
>
> +1, introducing DDL normalization seems like a better way to go in the
> long run.  Defining what should and shouldn't be normalized can be
> tricky though.

In principle, the only thing that really needs to be normalised is SAVEPOINT/CURSOR names which are essentially random strings which have no effect on the result. Most other stuff is material to the query.
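If only savepoint and cursor names need normalizing, a text-level approximation is to replace just the name with a placeholder before the entry is keyed. The following Python sketch assumes that approach; the regular expressions, the `normalize_names()` function and the `$1` placeholder are illustrative assumptions, and a real implementation would more likely work on the parse tree than on the query text.

```python
import re

# A bare or double-quoted identifier.
_IDENT = r'(?:"[^"]+"|[A-Za-z_][A-Za-z0-9_$]*)'

# Each pattern captures everything before the name in group 1.
_NAME_PATTERNS = [
    re.compile(
        r"^(\s*(?:RELEASE\s+(?:SAVEPOINT\s+)?"
        r"|ROLLBACK\s+(?:WORK\s+|TRANSACTION\s+)?TO\s+(?:SAVEPOINT\s+)?"
        r"|SAVEPOINT\s+))" + _IDENT,
        re.IGNORECASE,
    ),
    re.compile(r"^(\s*DECLARE\s+)" + _IDENT, re.IGNORECASE),
    re.compile(r"^(\s*(?:FETCH|MOVE)\b.*?\b(?:FROM|IN)\s+)" + _IDENT, re.IGNORECASE),
    # CLOSE ALL is a keyword form, not a cursor name, so leave it alone.
    re.compile(r"^(\s*CLOSE\s+)(?!ALL\b)" + _IDENT, re.IGNORECASE),
]


def normalize_names(query):
    """Replace a savepoint or cursor name with $1; leave everything else alone."""
    for pattern in _NAME_PATTERNS:
        normalized, count = pattern.subn(r"\1$1", query, count=1)
        if count:
            return normalized
    return query
```

The repeat count in `FETCH FORWARD 250` is kept on purpose, since it is arguably material to the query. The sketch does not handle FETCH/MOVE without FROM or IN, and it leaves constants inside the cursor's query untouched.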

That said, I think "aggregate by tag" has value all by itself. Being able to collapse all CREATE TABLES into a single line can be useful in some situations.

Ideally the results of FETCH "cursor" should be combined with the DECLARE, but I really don't know how to go about that.
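Attributing FETCH results to the DECLARE would need per-session state that maps each open cursor to the entry of the statement that declared it. Below is a minimal Python sketch of that bookkeeping, with hypothetical names throughout (`CursorAttributor`, the `calls`/`fetches`/`total_ms` fields). It assumes cursor names are compared as written (no identifier case folding), does not recognize FETCH without FROM/IN, and does not clean up cursors closed implicitly at transaction end.

```python
import re
from collections import defaultdict

_NAME = r'("[^"]+"|\w+)'
_DECLARE = re.compile(r"^\s*DECLARE\s+" + _NAME, re.IGNORECASE)
_FETCH = re.compile(r"^\s*FETCH\b.*?\b(?:FROM|IN)\s+" + _NAME, re.IGNORECASE)
_CLOSE = re.compile(r"^\s*CLOSE\s+" + _NAME, re.IGNORECASE)


class CursorAttributor:
    """Charge FETCH time to the DECLARE that opened the cursor (one session)."""

    def __init__(self):
        self.open_cursors = {}  # cursor name as written -> stats key of its DECLARE
        self.stats = defaultdict(lambda: {"calls": 0, "fetches": 0, "total_ms": 0.0})

    def record(self, query, elapsed_ms):
        if m := _DECLARE.match(query):
            # Key the DECLARE on its text with the cursor name replaced by $1.
            key = query[: m.start(1)] + "$1" + query[m.end(1):]
            self.open_cursors[m.group(1)] = key
            self.stats[key]["calls"] += 1
        elif (m := _FETCH.match(query)) and m.group(1) in self.open_cursors:
            # A FETCH on a known cursor adds to the DECLARE's entry.
            key = self.open_cursors[m.group(1)]
            self.stats[key]["fetches"] += 1
        else:
            m = _CLOSE.match(query)
            if m and m.group(1).upper() == "ALL":
                self.open_cursors.clear()
            elif m:
                self.open_cursors.pop(m.group(1), None)
            key = query
            self.stats[key]["calls"] += 1
        self.stats[key]["total_ms"] += elapsed_ms
```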

Have a nice day,
--

Re: IDEA: pg_stat_statements tracking utility statements by tag?

From
Julien Rouhaud
Date:
On Wed, Jul 29, 2020 at 5:29 PM Martijn van Oosterhout
<kleptog@gmail.com> wrote:
>
>
> On Wed, 29 Jul 2020 at 15:40, Julien Rouhaud <rjuju123@gmail.com> wrote:
>>
>> On Wed, Jul 29, 2020 at 2:42 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> >
>> >
>> > Or should we extend the existing query normalization to also handle DDL?
>>
>> +1, introducing DDL normalization seems like a better way to go in the
>> long run.  Defining what should and shouldn't be normalized can be
>> tricky though.
>
>
> In principle, the only thing that really needs to be normalised is SAVEPOINT/CURSOR names which are essentially
> random strings which have no effect on the result. Most other stuff is material to the query.
>
> That said, I think "aggregate by tag" has value all by itself. Being able to collapse all CREATE TABLES into a single
> line can be useful in some situations.

There's at least PREPARE TRANSACTION / COMMIT PREPARED / ROLLBACK
PREPARED that should be normalized too.  I also don't think that we
really want to have different entries for begin / Begin / BEGIN /
bEgin and similar for many other commands, as the hash is computed
based on the query text.
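A minimal Python sketch of normalizing such commands before they are keyed, so that the case variants above, and two-phase commit commands with different transaction identifiers, each collapse to one entry. `normalize_utility()` and the `$1` placeholder are hypothetical. Upper-casing the whole text is a crude stand-in for keyword-only case folding (it would also fold quoted identifiers and string literals), and collapsing whitespace would likewise touch string literals.

```python
import re

# Hypothetical pattern for the quoted transaction identifier used by the
# two-phase commit commands; the literal may contain doubled quotes ('').
_GID = re.compile(
    r"^((?:PREPARE\s+TRANSACTION|COMMIT\s+PREPARED|ROLLBACK\s+PREPARED)\s+)'(?:[^']|'')*'",
    re.IGNORECASE,
)


def normalize_utility(query):
    # Collapse runs of whitespace and drop a trailing semicolon.
    q = " ".join(query.strip().rstrip(";").split())
    # Replace the transaction identifier with a placeholder.
    q = _GID.sub(r"\1$1", q, count=1)
    # Crude case folding so that begin / Begin / BEGIN / bEgin share one key.
    return q.upper()
```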



Re: IDEA: pg_stat_statements tracking utility statements by tag?

From
Michael Paquier
Date:
On Wed, Jul 29, 2020 at 06:35:41PM +0200, Julien Rouhaud wrote:
> There's at least PREPARE TRANSACTION / COMMIT PREPARED / ROLLBACK
> PREPARED that should be normalized too.  I also don't think that we
> really want to have different entries for begin / Begin / BEGIN /
> bEgin and similar for many other commands, as the hash is computed
> based on the query text.

Hmm.  Do we really want those commands fully normalized all the
time?  There may be applications that care about the stats of some
commands that are, for example, prefixed the same way and would prefer
to group those things together.  By fully normalizing those commands all
the time, we would lose this option.

An example.  The ODBC driver uses its own grammar for internal
savepoint names, aka _EXEC_SVP_%p.  If you mix that with a second
application that has its own naming policy for savepoints, it would no
longer be possible to tell the difference in the stats between what
one or the other does.
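The trade-off can be shown with two made-up sets of names, one in the style of the ODBC driver's `_EXEC_SVP_%p` and one from another application. The Python sketch below compares full name normalization with a hypothetical middle ground that keeps the name's fixed prefix and only replaces a trailing run of digits or hex digits. Neither function is something pg_stat_statements provides, and what `%p` actually prints depends on the platform.

```python
import re

# SAVEPOINT or RELEASE SAVEPOINT followed by a bare name.
_SAVEPOINT = re.compile(r"^(\s*(?:RELEASE\s+)?SAVEPOINT\s+)(\w+)", re.IGNORECASE)

# Hypothetical "variable tail": a trailing run of hex digits containing at
# least one decimal digit, optionally prefixed by 0x.
_TAIL = re.compile(r"(?:0x)?[0-9A-Fa-f]*[0-9][0-9A-Fa-f]*$")


def normalize_full(query):
    """Replace the whole savepoint name with $1."""
    return _SAVEPOINT.sub(r"\1$1", query, count=1)


def normalize_tail(query):
    """Keep the name's fixed prefix and replace only its variable tail with N."""
    m = _SAVEPOINT.match(query)
    if not m:
        return query
    stem = _TAIL.sub("N", m.group(2), count=1)
    return query[: m.start(2)] + stem + query[m.end(2):]
```

Full normalization merges both applications into a single `SAVEPOINT $1` entry, while the tail-only variant keeps one entry per naming scheme, at the price of a heuristic that can guess wrong for names whose variable part is not at the end.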
--
Michael


Re: IDEA: pg_stat_statements tracking utility statements by tag?

From
Julien Rouhaud
Date:
On Thu, Jul 30, 2020 at 3:54 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Jul 29, 2020 at 06:35:41PM +0200, Julien Rouhaud wrote:
> > There's at least PREPARE TRANSACTION / COMMIT PREPARED / ROLLBACK
> > PREPARED that should be normalized too.  I also don't think that we
> > really want to have different entries for begin / Begin / BEGIN /
> > bEgin and similar for many other commands, as the hash is computed
> > based on the query text.
>
> Hmm.  Do we really want those commands fully normalized all the
> time?  There may be applications that care about the stats of some
> commands that are, for example, prefixed the same way and would prefer
> to group those things together.  By fully normalizing those commands all
> the time, we would lose this option.
>
> An example.  The ODBC driver uses its own grammar for internal
> savepoint names, aka _EXEC_SVP_%p.  If you mix that with a second
> application that has its own naming policy for savepoints, it would no
> longer be possible to tell the difference in the stats between what
> one or the other does.

But if you have an OLTP application that uses ODBC, won't you already
have 80+% of pgss entries being savepoint commands, which is really not
helpful at all?  We'd technically lose the ability to group such
commands together, but in most cases the current behavior is quite
harmful.