Thread: pg_stat_statements oddity with track = all

pg_stat_statements oddity with track = all

From
Julien Rouhaud
Date:
Hi,

Someone raised an interested point recently on pg_stat_kcache extension for
handling nested statements, which also applies to pg_stat_statements.

The root issue is that when pg_stat_statements tracks nested statements,
there's no way to really make sense of the counters, as top level statements
will also account for underlying statements.  Using a naive example:

=# CREATE FUNCTION f1() RETURNS VOID AS $$BEGIN PERFORM pg_sleep(5); END; $$ LANGUAGE plpgsql;
CREATE FUNCTION

=# SELECT pg_stat_statements_reset();
 pg_stat_statements_reset
--------------------------

(1 row)

=# SELECT f1();
 f1
----

(1 row)

=# select sum(total_exec_time) from pg_stat_statements;
     sum
--------------
 10004.403601
(1 row)

It looks like there was 10s total execution time overall all statements, which
doesn't really make sense.

It's of course possible to avoid that using track = top, but tracking all
nested statements is usually quite useful so it could be better to find a way
to better address that problem.

The only idea I have for that is to add a new field to entry key, for instance
is_toplevel.  The immediate cons is obviously that it could amplify quite a lot
the number of entries tracked, so people may need to increase
pg_stat_statements.max to avoid slowdown if that makes them reach frequent
entry eviction.

Should we address the problem, and in that case do you see a better way for
that, or should we just document this behavior?



Re: pg_stat_statements oddity with track = all

From
Nikolay Samokhvalov
Date:
On Tue, Dec 1, 2020 at 8:05 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Hi,

Someone raised an interested point recently on pg_stat_kcache extension for
handling nested statements, which also applies to pg_stat_statements.
... 
The only idea I have for that is to add a new field to entry key, for instance
is_toplevel.

This particular problem often bothered me when dealing with pg_stat_statements contents operating under "track = all" (especially when performing the aggregated analysis, like you showed).

I think the idea of having a flag to distinguish the top-level entries is great.
 
The immediate cons is obviously that it could amplify quite a lot
the number of entries tracked, so people may need to increase
pg_stat_statements.max to avoid slowdown if that makes them reach frequent
entry eviction.

If all top-level records in pg_stat_statements have "true" in the new column (is_toplevel), how would this lead to the need to increase pg_stat_statements.max? The number of records would remain the same, as before extending pg_stat_statements.

Re: pg_stat_statements oddity with track = all

From
Julien Rouhaud
Date:
On Tue, Dec 01, 2020 at 10:08:06PM -0800, Nikolay Samokhvalov wrote:
> On Tue, Dec 1, 2020 at 8:05 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> 
> > Someone raised an interested point recently on pg_stat_kcache extension for
> > handling nested statements, which also applies to pg_stat_statements.
> >
> ...
> 
> > The only idea I have for that is to add a new field to entry key, for
> > instance
> > is_toplevel.
> 
> 
> This particular problem often bothered me when dealing with
> pg_stat_statements contents operating under "track = all" (especially when
> performing the aggregated analysis, like you showed).
> 
> I think the idea of having a flag to distinguish the top-level entries is
> great.
> 

Ok!

> > The immediate cons is obviously that it could amplify quite a lot
> > the number of entries tracked, so people may need to increase
> > pg_stat_statements.max to avoid slowdown if that makes them reach frequent
> > entry eviction.
> >
> 
> If all top-level records in pg_stat_statements have "true" in the new
> column (is_toplevel), how would this lead to the need to increase
> pg_stat_statements.max? The number of records would remain the same, as
> before extending pg_stat_statements.

If the same query is getting executed both at top level and as a nested
statement, two entries will then be created.  That's probably unlikely for
things like RI trigger queries, but I don't know what to expect for client
application queries.



Re: pg_stat_statements oddity with track = all

From
Fujii Masao
Date:

On 2020/12/02 15:32, Julien Rouhaud wrote:
> On Tue, Dec 01, 2020 at 10:08:06PM -0800, Nikolay Samokhvalov wrote:
>> On Tue, Dec 1, 2020 at 8:05 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>>
>>> Someone raised an interested point recently on pg_stat_kcache extension for
>>> handling nested statements, which also applies to pg_stat_statements.
>>>
>> ...
>>
>>> The only idea I have for that is to add a new field to entry key, for
>>> instance
>>> is_toplevel.
>>
>>
>> This particular problem often bothered me when dealing with
>> pg_stat_statements contents operating under "track = all" (especially when
>> performing the aggregated analysis, like you showed).
>>
>> I think the idea of having a flag to distinguish the top-level entries is
>> great.
>>
> 
> Ok!
> 
>>> The immediate cons is obviously that it could amplify quite a lot
>>> the number of entries tracked, so people may need to increase
>>> pg_stat_statements.max to avoid slowdown if that makes them reach frequent
>>> entry eviction.
>>>
>>
>> If all top-level records in pg_stat_statements have "true" in the new
>> column (is_toplevel), how would this lead to the need to increase
>> pg_stat_statements.max? The number of records would remain the same, as
>> before extending pg_stat_statements.
> 
> If the same query is getting executed both at top level and as a nested
> statement, two entries will then be created.  That's probably unlikely for
> things like RI trigger queries, but I don't know what to expect for client
> application queries.

Just idea; instead of boolean is_toplevel flag, what about
counting the number of times when the statement is executed
in toplevel, and also in nested level?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: pg_stat_statements oddity with track = all

From
Julien Rouhaud
Date:
On Wed, Dec 02, 2020 at 03:52:37PM +0900, Fujii Masao wrote:
> 
> On 2020/12/02 15:32, Julien Rouhaud wrote:
> > On Tue, Dec 01, 2020 at 10:08:06PM -0800, Nikolay Samokhvalov wrote:
> > > On Tue, Dec 1, 2020 at 8:05 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > > 
> > > > Someone raised an interested point recently on pg_stat_kcache extension for
> > > > handling nested statements, which also applies to pg_stat_statements.
> > > > 
> > > ...
> > > 
> > > > The only idea I have for that is to add a new field to entry key, for
> > > > instance
> > > > is_toplevel.
> 
> [...]
> 
> Just idea; instead of boolean is_toplevel flag, what about
> counting the number of times when the statement is executed
> in toplevel, and also in nested level?

Ah, indeed that would avoid extraneous entries.  FTR we would also need that
for the planning part.

The cons I can see is that it'll make the counters harder to process (unless we
provide a specific view for the top-level statements only for instance), and
that it assumes that doing a simple division is representative enough for the
top level/nested repartition.  This might be quite off for in some cases, e.g.
big stored procedures due to lack of autovacuum, but that can't be worse than
what we currently have.



Re: pg_stat_statements oddity with track = all

From
legrand legrand
Date:
Hi,

a crazy idea:
- add a parent_statement_id column that would be NULL for top level queries
- build statement_id for nested queries based on the merge of:
   a/ current_statement_id and parent one
or
   b/ current_statement_id and nested level.

this would offer the ability to track counters at any depth level ;o)
Regards
PAscal



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: pg_stat_statements oddity with track = all

From
Sergei Kornilov
Date:
Hello

> - add a parent_statement_id column that would be NULL for top level queries

Will generate too much entries... Every FK for each different delete/insert, for example.
But very useful for databases with a lot of stored procedures to find where this query is called. May be new mode track
=tree? Use NULL to indicate a top-level query (same as with track=tree) and some constant for any nested queries when
track= all.
 

Also, currently a top statement will account buffers usage for underlying statements?

regards, Sergei



Re: pg_stat_statements oddity with track = all

From
Nikolay Samokhvalov
Date:
On Tue, Dec 1, 2020 at 10:32 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Tue, Dec 01, 2020 at 10:08:06PM -0800, Nikolay Samokhvalov wrote:
> If all top-level records in pg_stat_statements have "true" in the new
> column (is_toplevel), how would this lead to the need to increase
> pg_stat_statements.max? The number of records would remain the same, as
> before extending pg_stat_statements.

If the same query is getting executed both at top level and as a nested
statement, two entries will then be created.  That's probably unlikely for
things like RI trigger queries, but I don't know what to expect for client
application queries.

Right, but this is how things already work. The extra field you've proposed won't increase the number of records so it shouldn't affect how users choose pg_stat_statements.max.

Re: pg_stat_statements oddity with track = all

From
Julien Rouhaud
Date:
On Wed, Dec 02, 2020 at 06:23:54AM -0800, Nikolay Samokhvalov wrote:
> On Tue, Dec 1, 2020 at 10:32 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> 
> > On Tue, Dec 01, 2020 at 10:08:06PM -0800, Nikolay Samokhvalov wrote:
> > > If all top-level records in pg_stat_statements have "true" in the new
> > > column (is_toplevel), how would this lead to the need to increase
> > > pg_stat_statements.max? The number of records would remain the same, as
> > > before extending pg_stat_statements.
> >
> > If the same query is getting executed both at top level and as a nested
> > statement, two entries will then be created.  That's probably unlikely for
> > things like RI trigger queries, but I don't know what to expect for client
> > application queries.
> >
> 
> Right, but this is how things already work. The extra field you've proposed
> won't increase the number of records so it shouldn't affect how users
> choose pg_stat_statements.max.

The extra field I've proposed would increase the number of records, as it needs
to be a part of the key.  The only alternative would be what Fufi-san
mentioned, i.e. to split plans and calls (for instance plans_toplevel,
plans_nested, calls_toplevel, calls_nested) and let users compute an
approximate value for toplevel counters.



Re: pg_stat_statements oddity with track = all

From
legrand legrand
Date:
Hi Julien,

> The extra field I've proposed would increase the number of records, as it
> needs
to be a part of the key. 

To get an increase in the number of records that means that the same
statement 
would appear at top level AND nested level. This seems a corner case with
very low 
(neglectible) occurence rate. Did I miss something ?

Regards
PAscal 



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: pg_stat_statements oddity with track = all

From
Sergei Kornilov
Date:
Hello

> To get an increase in the number of records that means that the same
> statement
> would appear at top level AND nested level. This seems a corner case with
> very low
> (neglectible) occurence rate.

+1
I think splitting fields into plans_toplevel / plans_nested will be less convenient. And more code with higher chance
ofcopypaste errors
 

regards, Sergei



Re: pg_stat_statements oddity with track = all

From
Julien Rouhaud
Date:
On Wed, Dec 02, 2020 at 05:13:56PM +0300, Sergei Kornilov wrote:
> Hello
> 
> > - add a parent_statement_id column that would be NULL for top level queries
> 
> Will generate too much entries... Every FK for each different delete/insert, for example.
> But very useful for databases with a lot of stored procedures to find where this query is called. May be new mode
track= tree? Use NULL to indicate a top-level query (same as with track=tree) and some constant for any nested queries
whentrack = all.
 

Maybe pg_stat_statements isn't the best tool for that use case.  For the record
the profiler in plpgsql_check can now track queryid for each statements inside
a function, so you match pg_stat_statements entries.  That's clearly not
perfect as dynamic queries could generate different queryid, but that's a
start.

> Also, currently a top statement will account buffers usage for underlying statements?

I think so.



Re: pg_stat_statements oddity with track = all

From
Julien Rouhaud
Date:
On Thu, Dec 03, 2020 at 11:40:22AM +0300, Sergei Kornilov wrote:
> Hello
> 
> > To get an increase in the number of records that means that the same
> > statement
> > would appear at top level AND nested level. This seems a corner case with
> > very low
> > (neglectible) occurence rate.
> 
> +1
> I think splitting fields into plans_toplevel / plans_nested will be less convenient. And more code with higher chance
ofcopypaste errors
 

As I mentioned in a previous message, I really have no idea if that would be a
corner case or not.  For instance with native partitioning, the odds to have
many different query executed both at top level and as a nested statement may
be quite higher.



Re: pg_stat_statements oddity with track = all

From
Julien Rouhaud
Date:
On Thu, Dec 03, 2020 at 04:53:59PM +0800, Julien Rouhaud wrote:
> On Thu, Dec 03, 2020 at 11:40:22AM +0300, Sergei Kornilov wrote:
> > Hello
> > 
> > > To get an increase in the number of records that means that the same
> > > statement
> > > would appear at top level AND nested level. This seems a corner case with
> > > very low
> > > (neglectible) occurence rate.
> > 
> > +1
> > I think splitting fields into plans_toplevel / plans_nested will be less convenient. And more code with higher
chanceof copypaste errors
 
> 
> As I mentioned in a previous message, I really have no idea if that would be a
> corner case or not.  For instance with native partitioning, the odds to have
> many different query executed both at top level and as a nested statement may
> be quite higher.

The consensus seems to be adding a new boolean toplevel flag in the entry key,
so PFA a patch implementing that.  Note that the key now has padding, so
memset() calls are required.

Attachment

Re: pg_stat_statements oddity with track = all

From
Sergei Kornilov
Date:
Hello

Seems we need also change PGSS_FILE_HEADER.

regards, Sergei



Re: pg_stat_statements oddity with track = all

From
Julien Rouhaud
Date:
On Fri, Dec 04, 2020 at 12:06:10PM +0300, Sergei Kornilov wrote:
> Hello
> 
> Seems we need also change PGSS_FILE_HEADER.

Indeed, thanks!  v2 attached.

Attachment

Re: pg_stat_statements oddity with track = all

From
Julien Rouhaud
Date:
On Fri, Dec 04, 2020 at 06:09:13PM +0800, Julien Rouhaud wrote:
> On Fri, Dec 04, 2020 at 12:06:10PM +0300, Sergei Kornilov wrote:
> > Hello
> > 
> > Seems we need also change PGSS_FILE_HEADER.
> 
> Indeed, thanks!  v2 attached.

There was a conflict on PGSS_FILE_HEADER since some recent commit, v3 attached.

Attachment

Re: pg_stat_statements oddity with track = all

From
Masahiro Ikeda
Date:
Hi,

Thanks for making the patch to add "toplevel" column in 
pg_stat_statements.
This is a review comment.

There hasn't been any discussion, at least that I've been able to find.
So, +1 to change the status to "Ready for Committer".


1. submission/feature review
=============================

This patch can be applied cleanly to the current master branch(ed4367).
I tested with `make check-world` and I checked there is no fail.

This patch has reasonable documents and tests.
A "toplevel" column of pg_stat_statements view is documented and
following tests are added.
- pg_stat_statements.track option : 'top' and 'all'
- query type: normal query and nested query(pl/pgsql)

I tested the "update" command can work.
postgres=# ALTER EXTENSION pg_stat_statements UPDATE TO '1.10';

Although the "toplevel" column of all queries which already stored is 
'false',
we have to decide the default. I think 'false' is ok.


2. usability review
====================

This patch solves the problem we can't get to know
which query is top-level if pg_stat_statements.track = 'all'.

This leads that we can analyze with aggregated queries.

There is some use-case.
For example, we can know the sum of total_exec_time of queries
even if nested queries are executed.

We can know how efficiently a database can use CPU resource for queries
using the sum of the total_exec_time, and the exec_user_time and
exec_system_time in pg_stat_kcache which is the extension gathering
os resources.

Although one concern is whether only top-level is enough or not,
I couldn't come up with any use-case to use nested level, so I think 
it's ok.


3. coding review
=================

Although I had two concerns, I think they are no problem.
So, this patch looks good to me.

Following were my concerns.

A. the risk of too many same queries is duplicate.

Since this patch adds a "top" member in the hash key,
it leads to store duplicated same query which "top" is false and true.

This concern is already discussed and I agreed to the consensus
it seems unlikely to have the same queries being executed both
at the top level and as nested statements.

B. add a argument of the pg_stat_statements_reset().

Now, pg_stat_statements supports a flexible reset feature.
> pg_stat_statements_reset(userid Oid, dbid Oid, queryid bigint)

Although I wondered whether we need to add a top-level flag to the 
arguments,
I couldn't come up with any use-case to reset only top-level queries or 
not top-level queries.


4. others
==========

These comments are not related to this patch.

A. about the topic of commitfests

Since I think this feature is for monitoring,
it's better to change the topic from "System Administration"
to "Monitoring & Control".


B. check logic whether a query is top-level or not in pg_stat_kcache

This patch uses the only exec_nested_level to check whether a query is 
top-level or not.
The reason is nested_level is less useful and I agree.

But, pg_stat_kcache uses plan_nested_level too.
Although the check result is the same, it's better to change it
corresponding to this patch after it's committed.


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION



Re: pg_stat_statements oddity with track = all

From
Julien Rouhaud
Date:
Hi,

On Tue, Jan 19, 2021 at 4:55 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>
> Thanks for making the patch to add "toplevel" column in
> pg_stat_statements.
> This is a review comment.

Thanks a lot for the thorough review!

> I tested the "update" command can work.
> postgres=# ALTER EXTENSION pg_stat_statements UPDATE TO '1.10';
>
> Although the "toplevel" column of all queries which already stored is
> 'false',
> we have to decide the default. I think 'false' is ok.

Mmm, I'm not sure that I understand this result.  The "toplevel" value
is tracked by the C code loaded at startup, so it should have a
correct value even if you used to have the extension in a previous
version, it's just that you can't access the toplevel field before
doing the ALTER EXTENSION pg_stat_statements UPDATE.  There's also a
change in the magic number, so you can't use the stored stat file from
a version before this patch.

I also fail to reproduce this behavior.  I did the following:

- start postgres with pg_stat_statements v1.10 (so with this patch) in
shared_preload_libraries
- CREATE EXTENSION pg_stat_statements VERSION '1.9';
- execute a few queries
- ALTER EXTENSION pg_stat_statements UPDATE;
- SELECT * FROM pg_stat_statements reports the query with toplvel to TRUE

Can you share a way to reproduce your findings?

> 2. usability review
> ====================
> [...]
> Although one concern is whether only top-level is enough or not,
> I couldn't come up with any use-case to use nested level, so I think
> it's ok.

I agree, I don't see how tracking statistics per nesting level would
really help.  The only additional use case I see would tracking
triggers/FK query execution, but the nesting level won't help with
that.

> 3. coding review
> =================
> [...]
> B. add a argument of the pg_stat_statements_reset().
>
> Now, pg_stat_statements supports a flexible reset feature.
> > pg_stat_statements_reset(userid Oid, dbid Oid, queryid bigint)
>
> Although I wondered whether we need to add a top-level flag to the
> arguments,
> I couldn't come up with any use-case to reset only top-level queries or
> not top-level queries.

Ah, I didn't think of the reset function.  I also fail to see a
reasonable use case to provide a switch for the reset function.

> 4. others
> ==========
>
> These comments are not related to this patch.
>
> A. about the topic of commitfests
>
> Since I think this feature is for monitoring,
> it's better to change the topic from "System Administration"
> to "Monitoring & Control".

I agree, thanks for the change!

> B. check logic whether a query is top-level or not in pg_stat_kcache
>
> This patch uses the only exec_nested_level to check whether a query is
> top-level or not.
> The reason is nested_level is less useful and I agree.

Do you mean plan_nest_level is less useful?

> But, pg_stat_kcache uses plan_nested_level too.
> Although the check result is the same, it's better to change it
> corresponding to this patch after it's committed.

I agree, we should be consistent here.  I'll take care of the needed
changes  if and when this patch is commited!



Re: pg_stat_statements oddity with track = all

From
Masahiro Ikeda
Date:
On 2021-01-20 18:14, Julien Rouhaud wrote:
> On Tue, Jan 19, 2021 at 4:55 PM Masahiro Ikeda 
> <ikedamsh@oss.nttdata.com> wrote:
>> I tested the "update" command can work.
>> postgres=# ALTER EXTENSION pg_stat_statements UPDATE TO '1.10';
>> 
>> Although the "toplevel" column of all queries which already stored is
>> 'false',
>> we have to decide the default. I think 'false' is ok.
> 
> Mmm, I'm not sure that I understand this result.  The "toplevel" value
> is tracked by the C code loaded at startup, so it should have a
> correct value even if you used to have the extension in a previous
> version, it's just that you can't access the toplevel field before
> doing the ALTER EXTENSION pg_stat_statements UPDATE.  There's also a
> change in the magic number, so you can't use the stored stat file from
> a version before this patch.
> 
> I also fail to reproduce this behavior.  I did the following:
> 
> - start postgres with pg_stat_statements v1.10 (so with this patch) in
> shared_preload_libraries
> - CREATE EXTENSION pg_stat_statements VERSION '1.9';
> - execute a few queries
> - ALTER EXTENSION pg_stat_statements UPDATE;
> - SELECT * FROM pg_stat_statements reports the query with toplvel to 
> TRUE
> 
> Can you share a way to reproduce your findings?

Sorry for making you confused. I can't reproduce although I tried now.
I think my procedure was something wrong. So, please ignore this 
comment, sorry about that.


>> B. check logic whether a query is top-level or not in pg_stat_kcache
>> 
>> This patch uses the only exec_nested_level to check whether a query is
>> top-level or not.
>> The reason is nested_level is less useful and I agree.
> 
> Do you mean plan_nest_level is less useful?

I think so. Anyway, it's important to correspond core's implementation
because pg_stat_statements and pg_stat_kcache are used at the same time.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION



Re: pg_stat_statements oddity with track = all

From
Masahiko Sawada
Date:
On Wed, Jan 20, 2021 at 6:15 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> Hi,
>
> On Tue, Jan 19, 2021 at 4:55 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
> >
> > Thanks for making the patch to add "toplevel" column in
> > pg_stat_statements.
> > This is a review comment.
>
> Thanks a lot for the thorough review!
>
> > I tested the "update" command can work.
> > postgres=# ALTER EXTENSION pg_stat_statements UPDATE TO '1.10';
> >
> > Although the "toplevel" column of all queries which already stored is
> > 'false',
> > we have to decide the default. I think 'false' is ok.
>
> Mmm, I'm not sure that I understand this result.  The "toplevel" value
> is tracked by the C code loaded at startup, so it should have a
> correct value even if you used to have the extension in a previous
> version, it's just that you can't access the toplevel field before
> doing the ALTER EXTENSION pg_stat_statements UPDATE.  There's also a
> change in the magic number, so you can't use the stored stat file from
> a version before this patch.
>
> I also fail to reproduce this behavior.  I did the following:
>
> - start postgres with pg_stat_statements v1.10 (so with this patch) in
> shared_preload_libraries
> - CREATE EXTENSION pg_stat_statements VERSION '1.9';
> - execute a few queries
> - ALTER EXTENSION pg_stat_statements UPDATE;
> - SELECT * FROM pg_stat_statements reports the query with toplvel to TRUE
>
> Can you share a way to reproduce your findings?
>
> > 2. usability review
> > ====================
> > [...]
> > Although one concern is whether only top-level is enough or not,
> > I couldn't come up with any use-case to use nested level, so I think
> > it's ok.
>
> I agree, I don't see how tracking statistics per nesting level would
> really help.  The only additional use case I see would tracking
> triggers/FK query execution, but the nesting level won't help with
> that.
>
> > 3. coding review
> > =================
> > [...]
> > B. add a argument of the pg_stat_statements_reset().
> >
> > Now, pg_stat_statements supports a flexible reset feature.
> > > pg_stat_statements_reset(userid Oid, dbid Oid, queryid bigint)
> >
> > Although I wondered whether we need to add a top-level flag to the
> > arguments,
> > I couldn't come up with any use-case to reset only top-level queries or
> > not top-level queries.
>
> Ah, I didn't think of the reset function.  I also fail to see a
> reasonable use case to provide a switch for the reset function.
>
> > 4. others
> > ==========
> >
> > These comments are not related to this patch.
> >
> > A. about the topic of commitfests
> >
> > Since I think this feature is for monitoring,
> > it's better to change the topic from "System Administration"
> > to "Monitoring & Control".
>
> I agree, thanks for the change!

I've changed the topic accordingly.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



Re: pg_stat_statements oddity with track = all

From
Magnus Hagander
Date:
On Tue, Mar 16, 2021 at 4:34 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Tue, Mar 16, 2021 at 12:55:45PM +0100, Magnus Hagander wrote:
> > On Tue, Mar 9, 2021 at 3:39 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > >
> > > I think that we might be able to handle that without a flag.  The only thing
> > > that would need to be done is when creating an entry, look for an existing
> > > entry with the opposite flag, and if there's simply use the same
> > > (query_offset, query_len) info.  This doesn't sound that expensive.
> >
> > That's basically what I was trying to say :)
>
> Oh ok sorry :)
>
> > > The real pain point will be that the garbage collection phase
> > > will become way more expensive as it will now have to somehow maintain that
> > > knowledge, which will require additional lookups for each entry.  I'm a bit
> > > concerned about that, especially with the current heuristic to schedule garbage
> > > collection.  For now, need_qc_qtext says that we have to do it if the extent is
> > > more than 512 (B) * pgss_max.  This probably doesn't work well for people using
> > > ORM as they tend to generate gigantic SQL queries.
> >
> > Right, the cost would be mostly on the GC  side. I've never done any
> > profiling to see how big of a thing that is in systems today -- have
> > you?
>
> I didn't, but I don't see how it could be anything but ridiculously impacting.
> it's basically preventing any query from being planned or executed on the whole
> instance the time needed to read the previous qtext file, and write all entries
> still needed.
>
> > > I don't that think that anyone really had a strong argument, mostly gut
> > > feeling.  Note that pg_stat_kcache already implemented that toplevel flags, so
> > > if people are using that extension in a recent version they might have some
> > > figures to show.  I'll ping some people that I know are using it.
> >
> > Great -- data always wins over gut feelings :)
>
> So I asked some friends that have latest pg_stat_kcache installed on some
> preproduction environment configured to track nested queries.  There isn't a
> high throughput but the activity should still be representative of the
> production queries.  There are a lot of applications plugged there, around 20
> databases and quite a lot of PL code.
>
> After a few days, here are the statistics:
>
> - total of ~ 9500 entries
> - ~ 900 entries for nested statements
> - ~ 35 entries existing for both top level and nested statements
>
> So the duplicates account for less than 4% of the nested statements, and less
> than 0.5% of the whole entries.
>
> I wish I had more reports, but if this one is representative enough then it
> seems that trying to avoid storing duplicated queries wouldn't be worth it.

I agree. If those numbers are indeed representable, it seems like
better to pay that overhead than to pay the overhead of trying to
de-dupe it.

Let's hope they are :)


Looking through ti again my feeling said the toplevel column should go
after the queryid and not before, but I'm not going to open up a
bikeshed over that.

I've added in a comment to cover that one that you removed (if you did
send an updated patch as you said, then I missed it -- sorry), and
applied the rest.

Thanks!

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: pg_stat_statements oddity with track = all

From
Julien Rouhaud
Date:
On Thu, Apr 08, 2021 at 10:30:53AM +0200, Magnus Hagander wrote:
> 
> I agree. If those numbers are indeed representable, it seems like
> better to pay that overhead than to pay the overhead of trying to
> de-dupe it.
> 
> Let's hope they are :)

:)

> Looking through ti again my feeling said the toplevel column should go
> after the queryid and not before, but I'm not going to open up a
> bikeshed over that.
> 
> I've added in a comment to cover that one that you removed (if you did
> send an updated patch as you said, then I missed it -- sorry), and
> applied the rest.

Oops, somehow I totally forgot to send the new patch, sorry :(

While looking at the patch, I unfortunately just realize that I unnecessarily
bumped the version to 1.10, as 1.9 was already new as of pg14.  Honestly I have
no idea why I used 1.10 at that time.  Version numbers are not a scarce
resource but maybe it would be better to keep 1.10 for a future major postgres
version?

If yes, PFA a patch to merge 1.10 in 1.9.

Attachment

Re: pg_stat_statements oddity with track = all

From
Magnus Hagander
Date:
On Thu, Apr 8, 2021 at 2:04 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Thu, Apr 08, 2021 at 10:30:53AM +0200, Magnus Hagander wrote:
> >
> > I agree. If those numbers are indeed representable, it seems like
> > better to pay that overhead than to pay the overhead of trying to
> > de-dupe it.
> >
> > Let's hope they are :)
>
> :)
>
> > Looking through ti again my feeling said the toplevel column should go
> > after the queryid and not before, but I'm not going to open up a
> > bikeshed over that.
> >
> > I've added in a comment to cover that one that you removed (if you did
> > send an updated patch as you said, then I missed it -- sorry), and
> > applied the rest.
>
> Oops, somehow I totally forgot to send the new patch, sorry :(
>
> While looking at the patch, I unfortunately just realize that I unnecessarily
> bumped the version to 1.10, as 1.9 was already new as of pg14.  Honestly I have
> no idea why I used 1.10 at that time.  Version numbers are not a scarce
> resource but maybe it would be better to keep 1.10 for a future major postgres
> version?
>
> If yes, PFA a patch to merge 1.10 in 1.9.

I actually thought I looked at that, but clearly I was confused one
way or another.

I think you're right, it's cleaner to merge it into 1.9, so applied and pushed.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: pg_stat_statements oddity with track = all

From
Julien Rouhaud
Date:
On Thu, Apr 08, 2021 at 03:18:09PM +0200, Magnus Hagander wrote:
> On Thu, Apr 8, 2021 at 2:04 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > If yes, PFA a patch to merge 1.10 in 1.9.
> 
> I actually thought I looked at that, but clearly I was confused one
> way or another.
> 
> I think you're right, it's cleaner to merge it into 1.9, so applied and pushed.

Thanks Magnus!