Thread: IDEA: pg_stat_statements tracking utility statements by tag?

IDEA: pg_stat_statements tracking utility statements by tag?

From

Martijn van Oosterhout

Date:

29 July 2020, 09:24:57

Hoi hackers,

We've been using the pg_stat_statements extension to get an idea of the queries used in the database, but the table is being filled with entries like:

SAVEPOINT sa_savepoint_NNN;

RELEASE SAVEPOINT sa_savepoint_NNN;

DECLARE "c_7f9451c4dcc0_5" CURSOR WITHOUT HOLD ...

FETCH FORWARD 250 FROM "c_7f9451b03908_5"

Since the unique id is different for each query, the aggregation does nothing and there are quite a lot of these drowning out the normal queries (yes, I'm aware this is an issue of itself). The only way to deal with this is "pg_stat_statements.track_utility=off". However, it occurs to me that if you just tracked the tags rather than the full query text you could at least track the number of such queries and how much time they take. So the above queries would be tracked under SAVEPOINT, RELEASE, DECLARE CURSOR and (I guess) FETCH respectively. But it would also catch DDL.

Does this sound like something for which a patch would be accepted?

Have a nice day,

Martijn van Oosterhout <kleptog@gmail.com> http://svana.org/kleptog/

Re: IDEA: pg_stat_statements tracking utility statements by tag?

From

Fujii Masao

Date:

29 July 2020, 12:42:05


On 2020/07/29 18:24, Martijn van Oosterhout wrote:
> Hoi hackers,
> 
> We've been using the pg_stat_statements extension to get an idea of the queries used in the database, but the table
isbeing filled with entries like:
 
> 
> SAVEPOINT sa_savepoint_NNN;
> RELEASE SAVEPOINT sa_savepoint_NNN;
> DECLARE "c_7f9451c4dcc0_5" CURSOR WITHOUT HOLD ...
> FETCH FORWARD 250 FROM "c_7f9451b03908_5"
> 
> Since the unique id is different for each query, the aggregation does nothing and there are quite a lot of these
drowningout the normal queries (yes, I'm aware this is an issue of itself). The only way to deal with this is
"pg_stat_statements.track_utility=off".However, it occurs to me that if you just tracked the tags rather than the full
querytext you could at least track the number of such queries and how much time they take. So the above queries would
betracked under SAVEPOINT, RELEASE, DECLARE CURSOR and (I guess) FETCH respectively. But it would also catch DDL.
 
> 
> Does this sound like something for which a patch would be accepted?

Or, we should extend the existing query normalization to handle also DDL?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: IDEA: pg_stat_statements tracking utility statements by tag?

From

Julien Rouhaud

Date:

29 July 2020, 13:40:16

On Wed, Jul 29, 2020 at 2:42 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> On 2020/07/29 18:24, Martijn van Oosterhout wrote:
> > Hoi hackers,
> >
> > We've been using the pg_stat_statements extension to get an idea of the queries used in the database, but the table
isbeing filled with entries like: 
> >
> > SAVEPOINT sa_savepoint_NNN;
> > RELEASE SAVEPOINT sa_savepoint_NNN;
> > DECLARE "c_7f9451c4dcc0_5" CURSOR WITHOUT HOLD ...
> > FETCH FORWARD 250 FROM "c_7f9451b03908_5"
> >
> > Since the unique id is different for each query, the aggregation does nothing and there are quite a lot of these
drowningout the normal queries (yes, I'm aware this is an issue of itself). The only way to deal with this is
"pg_stat_statements.track_utility=off".However, it occurs to me that if you just tracked the tags rather than the full
querytext you could at least track the number of such queries and how much time they take. So the above queries would
betracked under SAVEPOINT, RELEASE, DECLARE CURSOR and (I guess) FETCH respectively. But it would also catch DDL. 
> >
> > Does this sound like something for which a patch would be accepted?
>
> Or, we should extend the existing query normalization to handle also DDL?

+1, introducing DDL normalization seems like a better way to go in the
long run.  Defining what should and shouldn't be normalized can be
tricky though.

Re: IDEA: pg_stat_statements tracking utility statements by tag?

From

Martijn van Oosterhout

Date:

29 July 2020, 15:29:42

On Wed, 29 Jul 2020 at 15:40, Julien Rouhaud <rjuju123@gmail.com> wrote:

On Wed, Jul 29, 2020 at 2:42 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
> Or, we should extend the existing query normalization to handle also DDL?

+1, introducing DDL normalization seems like a better way to go in the
long run. Defining what should and shouldn't be normalized can be
tricky though.

In principle, the only thing that really needs to be normalised is SAVEPOINT/CURSOR names which are essentially random strings which have no effect on the result. Most other stuff is material to the query.

That said, I think "aggregate by tag" has value all by itself. Being able to collapse all CREATE TABLES into a single line can be useful in some situations.

Ideally the results of FETCH "cursor" should be combined with the DECLARE, but I really don't know how to go about that.

Have a nice day,

Martijn van Oosterhout <kleptog@gmail.com> http://svana.org/kleptog/

Re: IDEA: pg_stat_statements tracking utility statements by tag?

From

Julien Rouhaud

Date:

29 July 2020, 16:35:41

On Wed, Jul 29, 2020 at 5:29 PM Martijn van Oosterhout
<kleptog@gmail.com> wrote:
>
>
> On Wed, 29 Jul 2020 at 15:40, Julien Rouhaud <rjuju123@gmail.com> wrote:
>>
>> On Wed, Jul 29, 2020 at 2:42 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> >
>> >
>> > Or, we should extend the existing query normalization to handle also DDL?
>>
>> +1, introducing DDL normalization seems like a better way to go in the
>> long run.  Defining what should and shouldn't be normalized can be
>> tricky though.
>
>
> In principle, the only thing that really needs to be normalised is SAVEPOINT/CURSOR names which are essentially
randomstrings which have no effect on the result. Most other stuff is material to the query.
 
>
> That said, I think "aggregate by tag" has value all by itself. Being able to collapse all CREATE TABLES into a single
linecan be useful in some situations.
 

There's at least PREPARE TRANSACTION / COMMIT PREPARED / ROLLBACK
PREPARED that should be normalized too.  I also don't think that we
really want to have different entries for begin / Begin / BEGIN /
bEgin and similar for many other commands, as the hash is computed
based on the query text.

Re: IDEA: pg_stat_statements tracking utility statements by tag?

From

Michael Paquier

Date:

30 July 2020, 01:54:34

On Wed, Jul 29, 2020 at 06:35:41PM +0200, Julien Rouhaud wrote:
> There's at least PREPARE TRANSACTION / COMMIT PREPARED / ROLLBACK
> PREPARED that should be normalized too.  I also don't think that we
> really want to have different entries for begin / Begin / BEGIN /
> bEgin and similar for many other commands, as the hash is computed
> based on the query text.

Hmm.  Do we really want to those commands fully normalized all the
time?  There may be applications that care about the stats of some
commands that are for example prefixed the same way and would prefer
group those things together.  By fully normalizing those commands all
the time, we would lose this option.

An example.  The ODBC driver uses its own grammar for internal
savepoint names, aka _EXEC_SVP_%p.  If you mix that with a second
application that has its own naming policy for savepoints it would not
be possible anymore to make the difference in the stats between what
one or the other do.
--
Michael

Attachment

signature.asc

Re: IDEA: pg_stat_statements tracking utility statements by tag?

From

Julien Rouhaud

Date:

30 July 2020, 04:57:31

On Thu, Jul 30, 2020 at 3:54 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Jul 29, 2020 at 06:35:41PM +0200, Julien Rouhaud wrote:
> > There's at least PREPARE TRANSACTION / COMMIT PREPARED / ROLLBACK
> > PREPARED that should be normalized too.  I also don't think that we
> > really want to have different entries for begin / Begin / BEGIN /
> > bEgin and similar for many other commands, as the hash is computed
> > based on the query text.
>
> Hmm.  Do we really want to those commands fully normalized all the
> time?  There may be applications that care about the stats of some
> commands that are for example prefixed the same way and would prefer
> group those things together.  By fully normalizing those commands all
> the time, we would lose this option.
>
> An example.  The ODBC driver uses its own grammar for internal
> savepoint names, aka _EXEC_SVP_%p.  If you mix that with a second
> application that has its own naming policy for savepoints it would not
> be possible anymore to make the difference in the stats between what
> one or the other do.

But if you have an OLTP application that uses ODBC, won't you already
have 80+% of pgss entries being savepoint orders, which is really not
helpful at all?  We'd technically lose the ability to group such
commands together, but in most cases the current behavior is quite
harmful.