Thread: Planning counters in pg_stat_statements (using pgss_store)
Starting from
https://www.postgresql.org/message-id/CAEepm%3D2vORBhWQZ1DJmKXmCVi%2B15Tgrv%2B9brHLanWU7XE_FWxQ%40mail.gmail.com
Here is a patch trying to implement what was proposed by Tom Lane:
"What we could/should do instead, I think, is have pgss_planner_hook
make its own pgss_store() call to log the planning time results
(which would mean we don't need the added PlannedStmt field at all).
That would increase the overhead of this feature, which might mean
that it'd be worth having a pg_stat_statements GUC to enable it.
But it'd be a whole lot cleaner."
Now:
- pgss_post_parse_analyze initializes the pgss entry with the SQL text,
- pgss_planner_hook adds planning_time and plan counts,
- pgss_ExecutorEnd works unchanged,
but it doesn't include any pg_stat_statements GUC to enable it yet.
note: I didn't understand the sentence "which would mean we don't need the added PlannedStmt field at all".
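Roughly, the planner-hook side looks like the sketch below. This is illustrative only, not the patch code: it assumes the planner hook signature without a query-string argument, and pgss_record_plan_time() is a hypothetical stand-in for the extra pgss_store() call.

    /*
     * Sketch only, not the actual patch.  pgss_record_plan_time() is a
     * hypothetical wrapper around the extra pgss_store() call; the hook
     * signature shown is the one without the query string.
     */
    static planner_hook_type prev_planner_hook = NULL;

    static PlannedStmt *
    pgss_planner_hook(Query *parse, int cursorOptions, ParamListInfo boundParams)
    {
        PlannedStmt *result;
        instr_time   start;
        instr_time   duration;

        INSTR_TIME_SET_CURRENT(start);

        if (prev_planner_hook)
            result = prev_planner_hook(parse, cursorOptions, boundParams);
        else
            result = standard_planner(parse, cursorOptions, boundParams);

        INSTR_TIME_SET_CURRENT(duration);
        INSTR_TIME_SUBTRACT(duration, start);

        /*
         * Accumulate the planning duration into the entry keyed by queryId;
         * the query text itself was stored earlier, in pgss_post_parse_analyze,
         * because it is not available here.
         */
        pgss_record_plan_time(parse->queryId, INSTR_TIME_GET_MILLISEC(duration));

        return result;
    }

A GUC such as the one mentioned above would simply make this extra call conditional.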
Regarding initial remark from Thomas Munro:
"I agree with the sentiment on the old thread that
{total,min,max,mean,stddev}_time now seem badly named, but adding
execution makes them so long... Thoughts?"
What would you think about:
- userid
- dbid
- queryid
- query
- plans
- plan_time
- {min,max,mean,stddev}_plan_time
- calls
- exec_time
- {min,max,mean,stddev}_exec_time
- total_time (being the sum of plan_time and exec_time)
- rows
- ...
Regards
PAscal

Attachment
Hello

Thank you for picking this up! Did you register the patch in the CF app? I did not find an entry.

Currently we have pg_stat_statements version 1.7 and this patch does not apply...

My fast and small review:

> -            errmsg("could not read file \"%s\": %m",
> +            errmsg("could not read pg_stat_statement file \"%s\": %m",

Not sure this is needed for this patch. Usually refactoring and new features are different topics.

> +#define PG_STAT_STATEMENTS_COLS_V1_4  25

Should this not be the actual version? I think the version in these names refers to the extension version.

And this patch does not have documentation changes.

> "I agree with the sentiment on the old thread that
> {total,min,max,mean,stddev}_time now seem badly named, but adding
> execution makes them so long... Thoughts?"
>
> What would you think about:
> - userid
> - dbid
> - queryid
> - query
> - plans
> - plan_time
> - {min,max,mean,stddev}_plan_time
> - calls
> - exec_time
> - {min,max,mean,stddev}_exec_time
> - total_time (being the sum of plan_time and exec_time)
> - rows
> - ...

Do we have consensus about backward-incompatible changes to this function? The *plan_time + *exec_time naming is ok for me.

regards, Sergei
Hi Sergei,

Thank you for this review!

> Did you register the patch in the CF app? I did not find an entry.

I think it is related to https://commitfest.postgresql.org/16/1373/ but I don't know how to link it to that entry.

Yes, there are many things to improve, but before going deeper I would like to know whether this way of doing it (with 3 accesses to the pgss hash) has a chance of getting consensus.

Regards
PAscal
Hi

> I think it is related to https://commitfest.postgresql.org/16/1373/
> but I don't know how to link it to that entry.

You can just add a new entry in the open commitfest and then attach the previous thread. This is the usual way to pick up old patches. For example, as I did here: https://commitfest.postgresql.org/20/1711/

> Yes, there are many things to improve, but before going deeper I would
> like to know whether this way of doing it (with 3 accesses to the pgss
> hash) has a chance of getting consensus.

I cannot say much here, I am not an experienced contributor. Can you post some performance test results with a slowdown comparison between the master branch and the proposed patch?

regards, Sergei
Thank you Sergei for your comments,

> Did you register the patch in the CF app? I did not find an entry.

created today: https://commitfest.postgresql.org/22/1999/

> Currently we have pg_stat_statements version 1.7 and this patch does not
> apply...

will rebase and create a 1.8 version

> -            errmsg("could not read file \"%s\": %m",
> +            errmsg("could not read pg_stat_statement file \"%s\": %m",

this is a mistake, will fix

> +#define PG_STAT_STATEMENTS_COLS_V1_4  25

I thought it was needed when adding new columns, isn't it?

> And this patch does not have documentation changes.

will fix, and will provide some kind of benchmark to compare with the actual version.

Regards
PAscal
Hi

>> +#define PG_STAT_STATEMENTS_COLS_V1_4  25
>
> I thought it was needed when adding new columns, isn't it?

Yes, this is needed. I mean it should be PG_STAT_STATEMENTS_COLS_V1_8, because this change is made for the 1.8 pg_stat_statements version. Same thing for the other version-specific places.

regards, Sergei
Hi PAscal,

On 2/15/19 11:32 AM, Sergei Kornilov wrote:
> Hi
>
>>> +#define PG_STAT_STATEMENTS_COLS_V1_4  25
>>
>> I thought it was needed when adding new columns, isn't it?
>
> Yes, this is needed. I mean it should be PG_STAT_STATEMENTS_COLS_V1_8, because this change is made for the 1.8 pg_stat_statements version. Same thing for the other version-specific places.

This patch has been waiting for an update for over a month. Do you know when you will have one ready? Should we move the release target to PG13?

Regards,
--
-David
david@pgmasters.net
Hi,
Here is a rebased and corrected version.
Columns naming has not been modified, I would propose to change it to:
- plans: ok
- planning_time --> plan_time
- calls: ok
- total_time --> exec_time
- {min,max,mean,stddev}_time: ok
- new total_time (being the sum of plan_time and exec_time)
Regards
PAscal
Attachment
On Fri, Mar 22, 2019 at 11:46 PM legrand legrand <legrand_legrand@hotmail.com> wrote:
>
> Here is a rebased and corrected version.

This patch has multiple trailing whitespace, indent and coding style issues. You should consider running pg_indent before submitting a patch. I attach the diff after running pgindent if you want more details about the various issues.

- * Track statement execution times across a whole database cluster.
+ * Track statement planning and execution times across a whole cluster.

if we're changing this, we should also fix the fact that it's not tracking only the time but various resources?

+    /* calc differences of buffer counters. */
+    bufusage.shared_blks_hit =
+        pgBufferUsage.shared_blks_hit - bufusage_start.shared_blks_hit;
[...]

This is an exact duplication of pgss_ProcessUtility(), it's probably better to create a macro or a function for that instead.

+    pgss_store("",
+               parse->queryId,  /* signal that it's a utility stmt */
+               -1,

the comment makes no sense, and also you can't pass an empty query string / unknown len. There's no guarantee that the entry for the given queryId won't have been evicted, and in this case you'll create a new and unrelated entry.

@@ -832,13 +931,13 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
      * the normalized string would be the same as the query text anyway, so
      * there's no need for an early entry.
      */
-    if (jstate.clocations_count > 0)
         pgss_store(pstate->p_sourcetext,

Why did you remove this? pgss_store() isn't free, and asking to generate a normalized query for a query that doesn't have any constant or storing the entry early won't do anything useful AFAICT. Though if that's useful, you definitely can't remove the test without adapting the comment and the indentation.

@@ -1249,15 +1351,19 @@ pgss_store(const char *query, uint64 queryId,
         if (e->counters.calls == 0)
             e->counters.usage = USAGE_INIT;

-        e->counters.calls += 1;
-        e->counters.total_time += total_time;
-        if (e->counters.calls == 1)
+        if (planning_time == 0)
+        {
+            e->counters.calls += 1;
+            e->counters.total_time += total_time;
+        }
+
+        if (e->counters.calls == 1 && planning_time == 0)
         {
             e->counters.min_time = total_time;
             e->counters.max_time = total_time;
             e->counters.mean_time = total_time;
         }
-        else
+        else if (planning_time == 0)
         {
             /*
              * Welford's method for accurately computing variance. See
@@ -1276,6 +1382,9 @@ pgss_store(const char *query, uint64 queryId,
             if (e->counters.max_time < total_time)
                 e->counters.max_time = total_time;
         }
+        if (planning_time > 0)
+            e->counters.plans += 1;
+        e->counters.planning_time += planning_time;

there are 4 tests to check if planning_time is zero or not, it's quite messy. Could you refactor the code to avoid so many tests? It would probably be useful to add some asserts to check that we don't provide both planning_time == 0 and execution related values. The function's comment would also need to be adapted to mention the new rationale with planning_time.

 * hash table entry for the PREPARE (with hash calculated from the query
 * string), and then a different one with the same query string (but hash
 * calculated from the query tree) would be used to accumulate costs of
- * ensuing EXECUTEs. This would be confusing, and inconsistent with other
- * cases where planning time is not included at all.
+ * ensuing EXECUTEs.

the comment about confusing behavior is still valid.
> Columns naming has not been modified, I would propose to change it to:
> - plans: ok
> - planning_time --> plan_time
> - calls: ok
> - total_time --> exec_time
> - {min,max,mean,stddev}_time: ok
> - new total_time (being the sum of plan_time and exec_time)

plan_time and exec_time are accumulated counters, so we need to keep the total_ prefix in any case. I think it's ok to break the function output names if we keep some kind of compatibility at the view level (which can keep total_time as the sum of total_plan_time and total_exec_time), so current queries against the view wouldn't break, and would get what they probably wanted.
Attachment
> This patch has multiple trailing whitespace, indent and coding style
> issues. You should consider running pg_indent before submitting a
> patch. I attach the diff after running pgindent if you want more
> details about the various issues.
fixed
> - * Track statement execution times across a whole database cluster.
> + * Track statement planning and execution times across a whole cluster.
> if we're changing this, we should also fix the fact that it's not
> tracking only the time but various resources?
fixed
> + /* calc differences of buffer counters. */
> + bufusage.shared_blks_hit =
> +        pgBufferUsage.shared_blks_hit - bufusage_start.shared_blks_hit;
>
> [...]
> This is an exact duplication of pgss_ProcessUtility(), it's probably
> better to create a macro or a function for that instead.
yes, maybe later (I don't know macros)
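For what it's worth, the factored-out helper could look roughly like the sketch below. This is illustrative only: a later version of the patch uses a helper of this name, but the exact signature and the field list here are assumptions (the instr_time fields blk_read_time/blk_write_time are omitted for brevity).

    /* Hypothetical sketch of a shared helper for diffing buffer counters. */
    static BufferUsage
    compute_buffer_counters(BufferUsage start, BufferUsage stop)
    {
        BufferUsage diff;

        /* zero everything first so omitted fields stay defined */
        memset(&diff, 0, sizeof(diff));

        diff.shared_blks_hit = stop.shared_blks_hit - start.shared_blks_hit;
        diff.shared_blks_read = stop.shared_blks_read - start.shared_blks_read;
        diff.shared_blks_dirtied = stop.shared_blks_dirtied - start.shared_blks_dirtied;
        diff.shared_blks_written = stop.shared_blks_written - start.shared_blks_written;
        diff.local_blks_hit = stop.local_blks_hit - start.local_blks_hit;
        diff.local_blks_read = stop.local_blks_read - start.local_blks_read;
        diff.local_blks_dirtied = stop.local_blks_dirtied - start.local_blks_dirtied;
        diff.local_blks_written = stop.local_blks_written - start.local_blks_written;
        diff.temp_blks_read = stop.temp_blks_read - start.temp_blks_read;
        diff.temp_blks_written = stop.temp_blks_written - start.temp_blks_written;

        return diff;
    }

Both pgss_ProcessUtility() and the planner hook could then call it as compute_buffer_counters(bufusage_start, pgBufferUsage).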
> + pgss_store("",
> + parse->queryId, /* signal that it's a
> utility stmt */
> + -1,
> the comment makes no sense, and also you can't pass an empty query
> string / unknown len. There's no guarantee that the entry for the
> given queryId won't have been evicted, and in this case you'll create
> a new and unrelated entry.
Fixed, comment was wrong
Query text is not available in pgss_planner_hook
that's why pgss_store execution is forced in pgss_post_parse_analyze
(to initialize pgss entry with its query text).
There is a very small risk that query has been evicted between
pgss_post_parse_analyze and pgss_planner_hook.
> @@ -832,13 +931,13 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
> * the normalized string would be the same as the query text anyway, so
> * there's no need for an early entry.
> */
> - if (jstate.clocations_count > 0)
> pgss_store(pstate->p_sourcetext,
> Why did you remove this? pgss_store() isn't free, and asking to
> generate a normalized query for a query that doesn't have any constant
> or storing the entry early won't do anything useful AFAICT. Though if
> that's useful, you definitely can't remove the test without adapting
> the comment and the indentation.
See explanation in previous answer (comments have been updated accordingly)
> there are 4 tests to check if planning_time is zero or not, it's quite
> messy. Could you refactor the code to avoid so many tests? It would
> probably be useful to add some asserts to check that we don't provide
> both planning_time == 0 and execution related values. The function's
> comment would also need to be adapted to mention the new rationale
> with planning_time.
Fixed
> * hash table entry for the PREPARE (with hash calculated from the query
> * string), and then a different one with the same query string (but hash
> * calculated from the query tree) would be used to accumulate costs of
> - * ensuing EXECUTEs. This would be confusing, and inconsistent with other
> - * cases where planning time is not included at all.
> + * ensuing EXECUTEs.
> the comment about confusing behavior is still valid.
Fixed
>> Columns naming has not been modified, I would propose to change it to:
>> - plans: ok
>> - planning_time --> plan_time
>> - calls: ok
>> - total_time --> exec_time
>> - {min,max,mean,stddev}_time: ok
>> - new total_time (being the sum of plan_time and exec_time)
> plan_time and exec_time are accumulated counters, so we need to keep
> the total_ prefix in any case. I think it's ok to break the function
> output names if we keep some kind of compatibility at the view level
> (which can keep total_time as the sum of total_plan_time and
> total_exec_time), so current queries against the view wouldn't break,
> and get what they probably wanted.
before changing this at all levels (view, function, code, doc),
I would like to be sure that the column names will be:
- plans
- total_plan_time
- calls
- total_exec_time
- min_time (without exec in name)
- max_time (without exec in name)
- mean_time (without exec in name)
- stddev_time (without exec in name)
- total_time (being the sum of total_plan_time and total_exec_time)
could other users confirm ?
Attachment
On Sat, Mar 23, 2019 at 11:08 PM legrand legrand <legrand_legrand@hotmail.com> wrote:
>
> > This patch has multiple trailing whitespace, indent and coding style
> > issues. You should consider running pg_indent before submitting a
> > patch. I attach the diff after running pgindent if you want more
> > details about the various issues.
>
> fixed

There are still trailing whitespaces and comments wider than 80 characters in the C code that should be fixed.

> > + pgss_store("",
> > +            parse->queryId,  /* signal that it's a utility stmt */
> > +            -1,
>
> > the comment makes no sense, and also you can't pass an empty query
> > string / unknown len. There's no guarantee that the entry for the
> > given queryId won't have been evicted, and in this case you'll create
> > a new and unrelated entry.
>
> Fixed, comment was wrong
> Query text is not available in pgss_planner_hook
> that's why pgss_store execution is forced in pgss_post_parse_analyze
> (to initialize pgss entry with its query text).
>
> There is a very small risk that query has been evicted between
> pgss_post_parse_analyze and pgss_planner_hook.
>
> > @@ -832,13 +931,13 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
> >  * the normalized string would be the same as the query text anyway, so
> >  * there's no need for an early entry.
> >  */
> > - if (jstate.clocations_count > 0)
> >     pgss_store(pstate->p_sourcetext,
>
> > Why did you remove this? pgss_store() isn't free, and asking to
> > generate a normalized query for a query that doesn't have any constant
> > or storing the entry early won't do anything useful AFAICT. Though if
> > that's useful, you definitely can't remove the test without adapting
> > the comment and the indentation.
>
> See explanation in previous answer (comments have been updated accordingly)

The alternative being to expose the query text to the planner, which could fix this (unlikely) issue and could also sometimes save a pgss_store() call. I did a quick check and at least the AQO and pg_hint_plan extensions have some hacks to be able to access the query text from the planner, so there are at least multiple needs for that. Perhaps it's time to do it?

> > there are 4 tests to check if planning_time is zero or not, it's quite
> > messy. Could you refactor the code to avoid so many tests? It would
> > probably be useful to add some asserts to check that we don't provide
> > both planning_time == 0 and execution related values. The function's
> > comment would also need to be adapted to mention the new rationale
> > with planning_time.
>
> Fixed

+    /* updating counters for execute OR planning */
+    Assert(planning_time > 0 && total_time > 0);
+    if (planning_time == 0)

This is obviously incorrect. The general sanity check for exclusion between planning_time and total_time should be at the beginning of pgss_store. Maybe some other asserts are needed to verify that planning_time cannot be provided along with jstate or other conditions.
On Sun, Mar 24, 2019 at 11:24 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> > > there are 4 tests to check if planning_time is zero or not, it's quite
> > > messy. Could you refactor the code to avoid so many tests? It would
> > > probably be useful to add some asserts to check that we don't provide
> > > both planning_time == 0 and execution related values. The function's
> > > comment would also need to be adapted to mention the new rationale
> > > with planning_time.
> >
> > Fixed
>
> +    /* updating counters for execute OR planning */
> +    Assert(planning_time > 0 && total_time > 0);
> +    if (planning_time == 0)
>
> This is obviously incorrect. The general sanity check for exclusion
> between planning_time and total_time should be at the beginning of
> pgss_store. Maybe some other asserts are needed to verify that
> planning_time cannot be provided along with jstate or other conditions.

Actually, since pgss_store is now called to either:
- explicitly store a query text
- accumulate planning duration
- accumulate execution duration
and they're all mutually exclusive, it's probably better to change pgss_store to pass an enum to describe what the call is for, and keep a single time parameter. It should make the code simpler.
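Something along these lines, as a rough illustration only (the enum and field names below are made up for this sketch, not taken from any patch version):

    /* Illustrative sketch -- names are not from the patch. */
    typedef enum pgssStoreKind
    {
        PGSS_STORE_TEXT,        /* register query text only (post_parse_analyze) */
        PGSS_STORE_PLAN,        /* accumulate a planning duration (planner hook) */
        PGSS_STORE_EXEC         /* accumulate an execution duration (ExecutorEnd) */
    } pgssStoreKind;

    /* inside pgss_store(), once the entry has been locked: */
    switch (kind)
    {
        case PGSS_STORE_PLAN:
            e->counters.plans += 1;
            e->counters.total_plan_time += duration;
            break;
        case PGSS_STORE_EXEC:
            e->counters.calls += 1;
            e->counters.total_exec_time += duration;
            break;
        case PGSS_STORE_TEXT:
            /* nothing to accumulate, the query text is already saved */
            break;
    }

A single switch would replace the four planning_time == 0 tests, and the mutual exclusion becomes a property of the kind argument rather than something to assert on two duration parameters.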
As there are now 3 lock acquisitions on the pgss hash struct, one day or another somebody will ask for a GUC to disable this feature (to be able to run pgss unchanged, with only one lock as today).

With this GUC, pgss_store should be able to store the query text and the accumulated execution duration in the same call (as today).

Will try to provide this soon.
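As a rough sketch only (the name follows the later discussion of pg_stat_statements.track_planning; the default value and GUC context shown here are assumptions), the definition in _PG_init() and the check in the planner hook could look like:

    /* Sketch only: default and context are assumptions, not the patch. */
    static bool pgss_track_planning = false;

    /* in _PG_init() */
    DefineCustomBoolVariable("pg_stat_statements.track_planning",
                             "Selects whether planning duration is tracked by pg_stat_statements.",
                             NULL,
                             &pgss_track_planning,
                             false,
                             PGC_SUSET,
                             0,
                             NULL,
                             NULL,
                             NULL);

    /* in the planner hook: skip the extra pgss_store() call when disabled */
    if (!pgss_track_planning)
        return result;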
>> - trailing whitespaces and comments wider than 80 characters
>> not fixed

> why? In case it's not clear, I'm talking about the .c file, not the
> regression tests.

I work on a poor msys install on Windows 7, where perl is broken ;o(
So no pgindent available.
Will fix that later, or as soon as I get a pgindent diff.

>> - "Assert(planning_time > 0 && total_time > 0);"
>> moved at the beginning of pgss_store

> Have you tried to actually compile postgres and pg_stat_statements
> with --enable-cassert? This test can *never* be true, since you
> either provide the planning time or the execution time or neither. As
> I said in my previous mail, adding a parameter to say which counter
> you're updating, instead of adding another counter that's mutually
> exclusive with the other would make everything clearer.

Yes, this "assert" is useless as is ... I'll remove it.
I understand your proposal of pgss_store refactoring, but I don't have
much time available now ... and I would like to check that performance
is not broken before any other modification ...

Regards
PAscal
On Wed, Mar 27, 2019 at 9:36 PM legrand legrand <legrand_legrand@hotmail.com> wrote:
>
> >> - trailing whitespaces and comments wider than 80 characters
> >> not fixed
>
> > why? In case it's not clear, I'm talking about the .c file, not the
> > regression tests.
>
> I work on a poor msys install on Windows 7, where perl is broken ;o(
> So no pgindent available.
> Will fix that later, or as soon as I get a pgindent diff.
>
> >> - "Assert(planning_time > 0 && total_time > 0);"
> >> moved at the beginning of pgss_store
>
> > Have you tried to actually compile postgres and pg_stat_statements
> > with --enable-cassert? This test can *never* be true, since you
> > either provide the planning time or the execution time or neither. As
> > I said in my previous mail, adding a parameter to say which counter
> > you're updating, instead of adding another counter that's mutually
> > exclusive with the other would make everything clearer.
>
> Yes, this "assert" is useless as is ... I'll remove it.
> I understand your proposal of pgss_store refactoring, but I don't have
> much time available now ... and I would like to check that performance
> is not broken before any other modification ...

Ok, but keep in mind that this is the last commitfest for pg12, and there are only 4 days left. Will you have time to take care of it, or do you need help on it?
Julien Rouhaud wrote:
> On Wed, Mar 27, 2019 at 9:36 PM legrand legrand <legrand_legrand@hotmail.com> wrote:
>>
>> >> - trailing whitespaces and comments wider than 80 characters
>> >> not fixed
>>
>> > why? In case it's not clear, I'm talking about the .c file, not the
>> > regression tests.
>>
>> I work on a poor msys install on Windows 7, where perl is broken ;o(
>> So no pgindent available.
>> Will fix that later, or as soon as I get a pgindent diff.
>>
>> >> - "Assert(planning_time > 0 && total_time > 0);"
>> >> moved at the beginning of pgss_store
>>
>> > Have you tried to actually compile postgres and pg_stat_statements
>> > with --enable-cassert? This test can *never* be true, since you
>> > either provide the planning time or the execution time or neither. As
>> > I said in my previous mail, adding a parameter to say which counter
>> > you're updating, instead of adding another counter that's mutually
>> > exclusive with the other would make everything clearer.
>>
>> Yes, this "assert" is useless as is ... I'll remove it.
>> I understand your proposal of pgss_store refactoring, but I don't have
>> much time available now ... and I would like to check that performance
>> is not broken before any other modification ...
>
> Ok, but keep in mind that this is the last commitfest for pg12, and
> there are only 4 days left. Will you have time to take care of it, or
> do you need help on it?

Oups, sorry, I won't have time nor the knowledge to finish in time ;o(
Any help is welcome!

Regards
PAscal
Hi

>> Ok, but keep in mind that this is the last commitfest for pg12, and
>> there are only 4 days left. Will you have time to take care of it, or
>> do you need help on it?
>
> Oups, sorry, I won't have time nor the knowledge to finish in time ;o(
> Any help is welcome!

No need to rush, this patch is unlikely to get committed in pg12 even a month earlier. We have a general policy that we don't like complex patches that first show up for the last commitfest of a dev cycle. The current commitfest is the last one before feature freeze.

I want such a feature and will help with review in the pg13 cycle.

regards, Sergei
On Thu, Mar 28, 2019 at 8:45 AM Sergei Kornilov <sk@zsrv.org> wrote:
>
> >> Ok, but keep in mind that this is the last commitfest for pg12, and
> >> there are only 4 days left. Will you have time to take care of it, or
> >> do you need help on it?
> >
> > Oups, sorry, I won't have time nor the knowledge to finish in time ;o(
> > Any help is welcome!
>
> No need to rush, this patch is unlikely to get committed in pg12 even a month earlier. We have a general policy that we don't like complex patches that first show up for the last commitfest of a dev cycle. The current commitfest is the last one before feature freeze.

yes, but this patch first showed up years ago: https://commitfest.postgresql.org/16/1373/. Since nothing happened since, it would be nice to have feedback on whether deeper changes on the planning functions are required (so for pg13), or if the current approach is ok (and then I hope it'd be acceptable for pg12).
On Thu, Mar 28, 2019 at 9:48 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Thu, Mar 28, 2019 at 8:45 AM Sergei Kornilov <sk@zsrv.org> wrote:
> >
> > >> Ok, but keep in mind that this is the last commitfest for pg12, and
> > >> there are only 4 days left. Will you have time to take care of it, or
> > >> do you need help on it?
> > >
> > > Oups, sorry, I won't have time nor the knowledge to finish in time ;o(
> > > Any help is welcome!
> >
> > No need to rush, this patch is unlikely to get committed in pg12 even a month earlier. We have a general policy that we don't like complex patches that first show up for the last commitfest of a dev cycle. The current commitfest is the last one before feature freeze.
>
> yes, but this patch first showed up years ago:
> https://commitfest.postgresql.org/16/1373/. Since nothing happened
> since, it would be nice to have feedback on whether deeper changes on
> the planning functions are required (so for pg13), or if the current
> approach is ok (and then I hope it'd be acceptable for pg12).

If that's helpful, I attach the updated patches. I split them in two commits, so if the query_text passing is not wanted it's quite easy to ignore that part.
Attachment
Hi,

I have played with this patch, it works fine.

rem: the last position of the "new" total_time column is confusing

+CREATE VIEW pg_stat_statements AS
+  SELECT *, total_plan_time + total_exec_time AS total_time
+  FROM pg_stat_statements(true);

I wanted to perform some benchmark between those 4 cases:
0 - no pgss,
1 - original pgss (no planning counter and 1 access to pgss hash),
2 - pgss reading querytext in planner hook (* 2 accesses to pgss hash),
3 - pgss reading querytext in parse hook (* 3 accesses to pgss hash)

It seems that the difference is so tiny, that there was no other way than running a minimal "Select 1;" statement ...

./pgbench -c 10 -j 5 -t 500000 -f select1stmt.sql postgres

case  avg_tps  pct_diff
0     89 278   --
1     88 745   0,6%
2     88 282   1,1%
3     86 660   2,9%

This means that even in this extreme test case, the worst degradation is less
than 3%
(this overhead can be removed using pg_stat_statements.track_planning guc)

notes:
- PostgreSQL 12devel on x86_64-w64-mingw32, compiled by gcc.exe (x86_64-win32-seh-rev1, Built by MinGW-W64 project) 7.2.0, 64-bit,
- cpu usage was less than 95%,
- avg_tps is based on 3 runs,
- there was a wait of around 1 minute between each run to keep computer temperature and fan usage low.

Regards
PAscal
On Mon, Apr 1, 2019 at 10:35 PM legrand legrand <legrand_legrand@hotmail.com> wrote:
>
> I have played with this patch, it works fine.

Thanks for testing!

> rem: the last position of the "new" total_time column is confusing
>
> +CREATE VIEW pg_stat_statements AS
> +  SELECT *, total_plan_time + total_exec_time AS total_time
> +  FROM pg_stat_statements(true);

Yes, there are quite a lot of fields in pg_stat_statements(), so I added the total_time as the last field to avoid enumerating all of them. I can change that if needed.

> I wanted to perform some benchmark between those 4 cases:
> 0 - no pgss,
> 1 - original pgss (no planning counter and 1 access to pgss hash),
> 2 - pgss reading querytext in planner hook (* 2 accesses to pgss hash),
> 3 - pgss reading querytext in parse hook (* 3 accesses to pgss hash)
>
> It seems that the difference is so tiny, that there was no other way than
> running a minimal "Select 1;" statement ...
>
> ./pgbench -c 10 -j 5 -t 500000 -f select1stmt.sql postgres
>
> case  avg_tps  pct_diff
> 0     89 278   --
> 1     88 745   0,6%
> 2     88 282   1,1%
> 3     86 660   2,9%
>
> This means that even in this extreme test case, the worst degradation is less
> than 3%
> (this overhead can be removed using pg_stat_statements.track_planning guc)

Is the difference between 2 and 3 the extraneous pgss_store call to always store the query text if planner hook doesn't have access to the query text?
Hi,
>>
>> case avg_tps pct_diff
>> 0 89 278 --
>> 1 88 745 0,6%
>> 2 88 282 1,1%
>> 3 86 660 2,9%
>>
>> This means that even in this extreme test case, the worst degradation is less
>> than 3%
>> (this overhead can be removed using pg_stat_statements.track_planning guc)
> Is the difference between 2 and 3 the extraneous pgss_store call to
> always store the query text if planner hook doesn't have access to the
> query text?
Yes it is,
but I agree it seems a big gap (1,8%) compared to the difference between 1 and 2 (0,5%).
Maybe this is just measurement "noise" ...
Regards
PAscal
On Tue, Apr 2, 2019 at 9:22 AM legrand legrand <legrand_legrand@hotmail.com> wrote:
>
> >> case  avg_tps  pct_diff
> >> 0     89 278   --
> >> 1     88 745   0,6%
> >> 2     88 282   1,1%
> >> 3     86 660   2,9%
> >>
> >> This means that even in this extreme test case, the worst degradation is less
> >> than 3%
> >> (this overhead can be removed using pg_stat_statements.track_planning guc)
>
> > Is the difference between 2 and 3 the extraneous pgss_store call to
> > always store the query text if planner hook doesn't have access to the
> > query text?
>
> Yes it is,
> but I agree it seems a big gap (1,8%) compared to the difference between 1 and 2 (0,5%).
> Maybe this is just measurement "noise" ...

Rebased patches attached.
Attachment
Hello

I think the most important question for this topic is the performance penalty. It was a long story; the first test on my desktop was too volatile, so I set up a separate PC with the DB only and tested a few cases.

PC spec: 2-core Intel Core 2 Duo E6550, 4GB RAM, mechanical HDD

All tests on top of commit 7dedfd22b79822b7f4210e6255b672ea82db6678, built via ./configure --prefix=/home/melkij/tmp/ --enable-tap-tests

DB settings:
listen_addresses = '*'
log_line_prefix = '%m %p %u@%d from %h [vxid:%v txid:%x] [%i] '
lc_messages = 'C'
shared_buffers = 512MB

pgbench was run from a different host, in the same L2 network. The database was generated by:
pgbench -s 10 -i -h hostname postgres

After database start I ran:
create extension if not exists pg_prewarm;
select count(*), sum(pg_prewarm) from pg_tables join pg_prewarm(tablename::regclass) on true where schemaname = 'public';
select count(*), sum(pg_prewarm) from pg_indexes join pg_prewarm(indexname::regclass) on true where schemaname = 'public';

So all data was in buffers.

Load generated by the command:
pgbench --builtin=select-only --time=300 -n -c 10 -h hostname postgres -M (vary)

Tests are:

head_no_pgss - unpatched version, empty shared_preload_libraries

head_track_none - unpatched version with:
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 5000
pg_stat_statements.track = none
pg_stat_statements.save = off
pg_stat_statements.track_utility = off

head_track_top - the same but with pg_stat_statements.track = top

5 runs in every mode -M: simple, extended, prepared

patch_not_loaded - build with latest published patches, empty shared_preload_libraries

patch_track_none - patched build with:
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 5000
pg_stat_statements.track = none
pg_stat_statements.save = off
pg_stat_statements.track_utility = off
pg_stat_statements.track_planning = off

patch_track_top - the same but with pg_stat_statements.track = top

patch_track_planning - with:
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 5000
pg_stat_statements.track = top
pg_stat_statements.save = off
pg_stat_statements.track_utility = off
pg_stat_statements.track_planning = on

10 runs in every mode -M: simple, extended, prepared

Results:

         test         |   mode   | average_tps | degradation_perc
----------------------+----------+-------------+------------------
 head_no_pgss         | extended |       13816 |            1.000
 patch_not_loaded     | extended |       13755 |            0.996
 head_track_none      | extended |       13607 |            0.985
 patch_track_none     | extended |       13560 |            0.981
 head_track_top       | extended |       13277 |            0.961
 patch_track_top      | extended |       13189 |            0.955
 patch_track_planning | extended |       12983 |            0.940
 head_no_pgss         | prepared |       29101 |            1.000
 head_track_none      | prepared |       28510 |            0.980
 patch_track_none     | prepared |       28481 |            0.979
 patch_not_loaded     | prepared |       28382 |            0.975
 patch_track_planning | prepared |       28046 |            0.964
 head_track_top       | prepared |       28035 |            0.963
 patch_track_top      | prepared |       27973 |            0.961
 head_no_pgss         | simple   |       16733 |            1.000
 patch_not_loaded     | simple   |       16552 |            0.989
 head_track_none      | simple   |       16452 |            0.983
 patch_track_none     | simple   |       16365 |            0.978
 head_track_top       | simple   |       15867 |            0.948
 patch_track_top      | simple   |       15820 |            0.945
 patch_track_planning | simple   |       15739 |            0.941

So I found a slight slowdown with track_planning = off compared to HEAD. Possibly just at the level of measurement error. I think this is ok.
track_planning = on also has no dramatic impact. In my opinion the proposed design with the pgss_store call is acceptable.

regards, Sergei
On Wed, Sep 04, 2019 at 07:19:47PM +0300, Sergei Kornilov wrote:
>
> ...
>
> Results:
>
>          test         |   mode   | average_tps | degradation_perc
> ----------------------+----------+-------------+------------------
>  head_no_pgss         | extended |       13816 |            1.000
>  patch_not_loaded     | extended |       13755 |            0.996
>  head_track_none      | extended |       13607 |            0.985
>  patch_track_none     | extended |       13560 |            0.981
>  head_track_top       | extended |       13277 |            0.961
>  patch_track_top      | extended |       13189 |            0.955
>  patch_track_planning | extended |       12983 |            0.940
>  head_no_pgss         | prepared |       29101 |            1.000
>  head_track_none      | prepared |       28510 |            0.980
>  patch_track_none     | prepared |       28481 |            0.979
>  patch_not_loaded     | prepared |       28382 |            0.975
>  patch_track_planning | prepared |       28046 |            0.964
>  head_track_top       | prepared |       28035 |            0.963
>  patch_track_top      | prepared |       27973 |            0.961
>  head_no_pgss         | simple   |       16733 |            1.000
>  patch_not_loaded     | simple   |       16552 |            0.989
>  head_track_none      | simple   |       16452 |            0.983
>  patch_track_none     | simple   |       16365 |            0.978
>  head_track_top       | simple   |       15867 |            0.948
>  patch_track_top      | simple   |       15820 |            0.945
>  patch_track_planning | simple   |       15739 |            0.941
>
> So I found a slight slowdown with track_planning = off compared to HEAD.
> Possibly just at the level of measurement error. I think this is ok.
> track_planning = on also has no dramatic impact. In my opinion the
> proposed design with the pgss_store call is acceptable.

FWIW I've done some benchmarking on this too, with a single pgbench client running a select-only test on a tiny database, in different modes (simple, extended, prepared). I've done that on two systems with different CPUs (spreadsheet with results attached).

I don't see any performance regression - there are some small variations in both directions (say, ~1%) but that's well within the noise. So I think the patch is fine in this regard.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Sep 06, 2019 at 04:19:16PM +0200, Tomas Vondra wrote:
>
> FWIW I've done some benchmarking on this too, with a single pgbench client
> running a select-only test on a tiny database, in different modes (simple,
> extended, prepared). I've done that on two systems with different CPUs
> (spreadsheet with results attached).

And of course, I forgot to attach the spreadsheet, so here it is.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hello

On 2019/09/06 23:19, Tomas Vondra wrote:
> On Wed, Sep 04, 2019 at 07:19:47PM +0300, Sergei Kornilov wrote:
>>
>> ...
>>
>> Results:
>>
>> [...]
>>
>> So I found a slight slowdown with track_planning = off compared to HEAD.
>> Possibly just at the level of measurement error. I think this is ok.
>> track_planning = on also has no dramatic impact. In my opinion the
>> proposed design with the pgss_store call is acceptable.
>
> FWIW I've done some benchmarking on this too, with a single pgbench client
> running a select-only test on a tiny database, in different modes (simple,
> extended, prepared). I've done that on two systems with different CPUs
> (spreadsheet with results attached).

Referring to Sergei's results: if a user who currently uses pgss with execution-time tracking enables the new feature, they will see a 0~2.2% performance regression, as below.

>>  head_track_top       | extended |       13277 |            0.961
>>  patch_track_planning | extended |       12983 |            0.940
>>  patch_track_planning | prepared |       28046 |            0.964
>>  head_track_top       | prepared |       28035 |            0.963
>>  head_track_top       | simple   |       15867 |            0.948
>>  patch_track_planning | simple   |       15739 |            0.941

If a user does not turn on track_planning, they will see a 0.2-0.6% performance regression, as below.

>>  head_track_top       | extended |       13277 |            0.961
>>  patch_track_top      | extended |       13189 |            0.955
>>  head_track_top       | prepared |       28035 |            0.963
>>  patch_track_top      | prepared |       27973 |            0.961
>>  head_track_top       | simple   |       15867 |            0.948
>>  patch_track_top      | simple   |       15820 |            0.945

> I don't see any performance regression - there are some small variations
> in both directions (say, ~1%) but that's well within the noise. So I think
> the patch is fine in this regard.

+1

I also looked at the code and have one comment.

[0002 patch]
In pgss_planner_hook:

+    /* calc differences of buffer counters. */
+    bufusage = compute_buffer_counters(bufusage_start, pgBufferUsage);
+
+    /*
+     * we only store planning duration, query text has been initialized
+     * during previous pgss_post_parse_analyze as it not available inside
+     * pgss_planner_hook.
+     */
+    pgss_store(query_text,

Do we need to calculate bufusage here?
We only store the planning duration in the following pgss_store.

--
Yoshikazu Imai
On Fri, Sep 6, 2019 at 4:19 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> On Wed, Sep 04, 2019 at 07:19:47PM +0300, Sergei Kornilov wrote:
> >
> > ...
> >
> > Results:
> >
> > [...]
> >
> > So I found a slight slowdown with track_planning = off compared to HEAD.
> > Possibly just at the level of measurement error. I think this is ok.
> > track_planning = on also has no dramatic impact. In my opinion the
> > proposed design with the pgss_store call is acceptable.
>
> FWIW I've done some benchmarking on this too, with a single pgbench client
> running a select-only test on a tiny database, in different modes (simple,
> extended, prepared). I've done that on two systems with different CPUs
> (spreadsheet with results attached).
>
> I don't see any performance regression - there are some small variations
> in both directions (say, ~1%) but that's well within the noise. So I think
> the patch is fine in this regard.

Thanks a lot Sergei and Tomas! It's good to know that this patch doesn't add significant overhead.
Hello,

On Sun, Sep 8, 2019 at 11:45 AM Imai Yoshikazu <yoshikazu_i443@live.jp> wrote:
>
> I also looked at the code and have one comment.

Thanks for looking at this patch!

> [0002 patch]
> In pgss_planner_hook:
>
> +    /* calc differences of buffer counters. */
> +    bufusage = compute_buffer_counters(bufusage_start, pgBufferUsage);
> +
> +    /*
> +     * we only store planning duration, query text has been initialized
> +     * during previous pgss_post_parse_analyze as it not available inside
> +     * pgss_planner_hook.
> +     */
> +    pgss_store(query_text,
>
> Do we need to calculate bufusage here?
> We only store the planning duration in the following pgss_store.

Good point! Postgres can definitely access some buffers while planning a query (the most obvious example would be get_actual_variable_range()), but as far as I can tell those were previously not accounted for with pg_stat_statements, as queryDesc->totaltime->bufusage only accumulates buffer usage in the executor, and indeed the current patch also ignores such computed counters.

I think it would be better to keep this bufusage calculation during planning and fix pgss_store() to process them, but this would add slightly more overhead.
On Tue, Sept 10, 2019 at 11:27 PM, Julien Rouhaud wrote:
> > [0002 patch]
> > In pgss_planner_hook:
> >
> > +    /* calc differences of buffer counters. */
> > +    bufusage = compute_buffer_counters(bufusage_start, pgBufferUsage);
> > +
> > +    /*
> > +     * we only store planning duration, query text has been initialized
> > +     * during previous pgss_post_parse_analyze as it not available inside
> > +     * pgss_planner_hook.
> > +     */
> > +    pgss_store(query_text,
> >
> > Do we need to calculate bufusage here?
> > We only store the planning duration in the following pgss_store.
>
> Good point! Postgres can definitely access some buffers while
> planning a query (the most obvious example would be
> get_actual_variable_range()), but as far as I can tell those were
> previously not accounted for with pg_stat_statements, as
> queryDesc->totaltime->bufusage only accumulates buffer usage in
> the executor, and indeed the current patch also ignores such computed
> counters.
>
> I think it would be better to keep this bufusage calculation during
> planning and fix pgss_store() to process them, but this would add slightly more overhead.

Sorry for my late reply.
I think the overhead would be trivial and we can include the bufusage of planning from the POV of overhead, but yeah, it will be a backward incompatibility if we include it.

BTW, ISTM it would be good to include {min,max,mean,stddev}_plan_time in pg_stat_statements. Generally plan_time would be almost the same in each execution of the same query, but there are some exceptions. For example, if we use prepared statements on partitioned tables, the time differs largely between creating a generic plan and creating a custom plan.

1. Create a partitioned table which has 1024 partitions.
2. Prepare select and update statements.
   sel) prepare sel(int) as select * from pt where a = $1;
   upd) prepare upd(int, int) as update pt set a = $2 where a = $1;
3. Execute each statement 8 times.
3-1. Select from the pg_stat_statements view after every execution:
     select query, plans, total_plan_time, calls, total_exec_time from pg_stat_statements where query like 'prepare%';

Results of pg_stat_statements for sel) are:

                       query                       | plans |   total_plan_time   | calls | total_exec_time
---------------------------------------------------+-------+---------------------+-------+-----------------
 prepare sel(int) as select * from pt where a = $1 |     1 |            0.164361 |     1 |        0.004613
 prepare sel(int) as select * from pt where a = $1 |     2 | 0.27715500000000004 |     2 |        0.009447
 prepare sel(int) as select * from pt where a = $1 |     3 | 0.39100100000000004 |     3 |        0.014281
 prepare sel(int) as select * from pt where a = $1 |     4 |            0.504004 |     4 |        0.019265
 prepare sel(int) as select * from pt where a = $1 |     5 |            0.628242 |     5 |        0.024091
 prepare sel(int) as select * from pt where a = $1 |     7 |  24.213586000000003 |     6 |        0.029144
 prepare sel(int) as select * from pt where a = $1 |     8 |  24.368900000000004 |     7 |        0.034099
 prepare sel(int) as select * from pt where a = $1 |     9 |  24.527956000000003 |     8 |        0.046152

Results of pg_stat_statements for upd) are:

 prepare upd(int, int) as update pt set a = $2 where a = $1 | 1 |   0.280099 | 1 |             0.013138
 prepare upd(int, int) as update pt set a = $2 where a = $1 | 2 |   0.405416 | 2 |              0.01894
 prepare upd(int, int) as update pt set a = $2 where a = $1 | 3 |   0.532361 | 3 |             0.040716
 prepare upd(int, int) as update pt set a = $2 where a = $1 | 4 |   0.671445 | 4 |             0.046566
 prepare upd(int, int) as update pt set a = $2 where a = $1 | 5 |   0.798531 | 5 | 0.052729000000000005
 prepare upd(int, int) as update pt set a = $2 where a = $1 | 7 | 896.915458 | 6 |  0.05888600000000001
 prepare upd(int, int) as update pt set a = $2 where a = $1 | 8 | 897.043512 | 7 |             0.064446
 prepare upd(int, int) as update pt set a = $2 where a = $1 | 9 | 897.169711 | 8 |             0.070644

What do you think about that?

--
Yoshikazu Imai
On Fri, Nov 8, 2019 at 5:35 AM imai.yoshikazu@fujitsu.com <imai.yoshikazu@fujitsu.com> wrote:
>
> On Tue, Sept 10, 2019 at 11:27 PM, Julien Rouhaud wrote:
> > > [0002 patch]
> > > In pgss_planner_hook:
> > >
> > > +    /* calc differences of buffer counters. */
> > > +    bufusage = compute_buffer_counters(bufusage_start, pgBufferUsage);
> > > +
> > > +    /*
> > > +     * we only store planning duration, query text has been initialized
> > > +     * during previous pgss_post_parse_analyze as it not available inside
> > > +     * pgss_planner_hook.
> > > +     */
> > > +    pgss_store(query_text,
> > >
> > > Do we need to calculate bufusage here?
> > > We only store the planning duration in the following pgss_store.
> >
> > Good point! Postgres can definitely access some buffers while
> > planning a query (the most obvious example would be
> > get_actual_variable_range()), but as far as I can tell those were
> > previously not accounted for with pg_stat_statements, as
> > queryDesc->totaltime->bufusage only accumulates buffer usage in
> > the executor, and indeed the current patch also ignores such computed
> > counters.
> >
> > I think it would be better to keep this bufusage calculation during
> > planning and fix pgss_store() to process them, but this would add
> > slightly more overhead.
>
> Sorry for my late reply.
> I think the overhead would be trivial and we can include the bufusage of planning from
> the POV of overhead, but yeah, it will be a backward incompatibility if we
> include it.

Ok, let's keep planning's bufusage then.

> BTW, ISTM it would be good to include {min,max,mean,stddev}_plan_time in
> pg_stat_statements. Generally plan_time would be almost the same in each
> execution of the same query, but there are some exceptions. For example, if we
> use prepared statements on partitioned tables, the time differs largely
> between creating a generic plan and creating a custom plan.
>
> 1. Create a partitioned table which has 1024 partitions.
> 2. Prepare select and update statements.
>    sel) prepare sel(int) as select * from pt where a = $1;
>    upd) prepare upd(int, int) as update pt set a = $2 where a = $1;
> 3. Execute each statement 8 times.
> 3-1. Select from the pg_stat_statements view after every execution:
>      select query, plans, total_plan_time, calls, total_exec_time from pg_stat_statements where query like 'prepare%';
>
> Results of pg_stat_statements for sel) are:
>
>                        query                       | plans |   total_plan_time   | calls | total_exec_time
> ---------------------------------------------------+-------+---------------------+-------+-----------------
>  prepare sel(int) as select * from pt where a = $1 |     1 |            0.164361 |     1 |        0.004613
>  prepare sel(int) as select * from pt where a = $1 |     2 | 0.27715500000000004 |     2 |        0.009447
>  prepare sel(int) as select * from pt where a = $1 |     3 | 0.39100100000000004 |     3 |        0.014281
>  prepare sel(int) as select * from pt where a = $1 |     4 |            0.504004 |     4 |        0.019265
>  prepare sel(int) as select * from pt where a = $1 |     5 |            0.628242 |     5 |        0.024091
>  prepare sel(int) as select * from pt where a = $1 |     7 |  24.213586000000003 |     6 |        0.029144
>  prepare sel(int) as select * from pt where a = $1 |     8 |  24.368900000000004 |     7 |        0.034099
>  prepare sel(int) as select * from pt where a = $1 |     9 |  24.527956000000003 |     8 |        0.046152
>
> Results of pg_stat_statements for upd) are:
>
>  prepare upd(int, int) as update pt set a = $2 where a = $1 | 1 |   0.280099 | 1 |             0.013138
>  prepare upd(int, int) as update pt set a = $2 where a = $1 | 2 |   0.405416 | 2 |              0.01894
>  prepare upd(int, int) as update pt set a = $2 where a = $1 | 3 |   0.532361 | 3 |             0.040716
>  prepare upd(int, int) as update pt set a = $2 where a = $1 | 4 |   0.671445 | 4 |             0.046566
>  prepare upd(int, int) as update pt set a = $2 where a = $1 | 5 |   0.798531 | 5 | 0.052729000000000005
>  prepare upd(int, int) as update pt set a = $2 where a = $1 | 7 | 896.915458 | 6 |  0.05888600000000001
>  prepare upd(int, int) as update pt set a = $2 where a = $1 | 8 | 897.043512 | 7 |             0.064446
>  prepare upd(int, int) as update pt set a = $2 where a = $1 | 9 | 897.169711 | 8 |             0.070644
>
> What do you think about that?

That's indeed a very valid point and something we should help users to investigate. I'll submit an updated patch with support for min/max/mean/stddev plan time shortly.
On Fri, Nov 8, 2019 at 3:31 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Fri, Nov 8, 2019 at 5:35 AM imai.yoshikazu@fujitsu.com
> <imai.yoshikazu@fujitsu.com> wrote:
> >
> > [...]
> >
> > BTW, ISTM it would be good to include {min,max,mean,stddev}_plan_time in
> > pg_stat_statements. Generally plan_time would be almost the same in each
> > execution of the same query, but there are some exceptions. For example, if we
> > use prepared statements on partitioned tables, the time differs largely
> > between creating a generic plan and creating a custom plan.
> >
> > [...]
> >
> > What do you think about that?
>
> That's indeed a very valid point and something we should help users to
> investigate. I'll submit an updated patch with support for
> min/max/mean/stddev plan time shortly.

I attach v3 patches implementing those counters.

Note that to avoid duplicating some code (related to Welford's method), I switched some of the counters to arrays rather than scalar variables. It unfortunately makes pg_stat_statements_internal() a little bit messy, but I hope that it's still acceptable.

While doing this refactoring I saw that previous patches were failing to accumulate the buffers used during planning, which is now fixed.
Attachment
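To illustrate the array-based layout, here is a rough sketch only; the struct and function names below are invented for the example and the real patch keeps more per-kind fields:

    /* Sketch of per-kind counter arrays with Welford's online update.
     * All names here are illustrative, not taken from the patch. */
    #define PGSS_PLAN       0
    #define PGSS_EXEC       1
    #define PGSS_NUMKIND    2

    typedef struct CountersSketch
    {
        int64   calls[PGSS_NUMKIND];
        double  total_time[PGSS_NUMKIND];
        double  min_time[PGSS_NUMKIND];
        double  max_time[PGSS_NUMKIND];
        double  mean_time[PGSS_NUMKIND];
        double  sum_var_time[PGSS_NUMKIND];     /* sum of squared deviations */
    } CountersSketch;

    static void
    record_time(CountersSketch *c, int kind, double t)
    {
        c->calls[kind] += 1;
        c->total_time[kind] += t;

        if (c->calls[kind] == 1)
        {
            c->min_time[kind] = t;
            c->max_time[kind] = t;
            c->mean_time[kind] = t;
        }
        else
        {
            /* Welford's method: one code path shared by plan and exec times */
            double  old_mean = c->mean_time[kind];

            c->mean_time[kind] += (t - old_mean) / c->calls[kind];
            c->sum_var_time[kind] += (t - old_mean) * (t - c->mean_time[kind]);

            if (c->min_time[kind] > t)
                c->min_time[kind] = t;
            if (c->max_time[kind] < t)
                c->max_time[kind] = t;
        }
    }

The stddev exposed by the view would then be sqrt(sum_var_time[kind] / calls[kind]), as in the existing single-kind code.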
On Sat, Nov 9, 2019 at 1:36 PM, Julien Rouhaud wrote: > On Fri, Nov 8, 2019 at 3:31 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > On Fri, Nov 8, 2019 at 5:35 AM imai.yoshikazu@fujitsu.com > > <imai.yoshikazu@fujitsu.com> wrote: > > > > > > On Tue, Sept 10, 2019 at 11:27 PM, Julien Rouhaud wrote: > > > > > [0002 patch] > > > > > In pgss_planner_hook: > > > > > > > > > > + /* calc differences of buffer counters. */ > > > > > + bufusage = > > > > > + compute_buffer_counters(bufusage_start, pgBufferUsage); > > > > > + > > > > > + /* > > > > > + * we only store planning duration, query text has been initialized > > > > > + * during previous pgss_post_parse_analyze as it not available inside > > > > > + * pgss_planner_hook. > > > > > + */ > > > > > + pgss_store(query_text, > > > > > > > > > > Do we need to calculate bufusage in here? > > > > > We only store planning duration in the following pgss_store. > > > > > > > > Good point! Postgres can definitely access some buffers while > > > > planning a query (the most obvious example would be > > > > get_actual_variable_range()), but as far as I can tell those were > > > > previously not accounted for with pg_stat_statements as > > > > queryDesc->totaltime->bufusage is only accumulating buffer usage > > > > queryDesc->totaltime->in > > > > the executor, and indeed current patch also ignore such computed > > > > counters. > > > > > > > > I think it would be better to keep this bufusage calculation > > > > during planning and fix pgss_store() to process them, but this > > > > would add > > > slightly more overhead. > > > > > > Sorry for my late reply. > > > I think overhead would be trivial and we can include bufusage of > > > planning from the POV of overhead, but yeah, it will be backward > > > incompatibility if we include them. > > > > Ok, let's keep planning's bufusage then. > > > > > BTW, ISTM it is good for including {min,max,mean,stddev}_plan_time > > > to pg_stat_statements. Generally plan_time would be almost the same > > > time in each execution for the same query, but there are some > > > exceptions. For example, if we use prepare statements which uses > > > partition tables, time differs largely between creating a general plan and creating a custom plan. > > > > > > 1. Create partition table which has 1024 partitions. > > > 2. Prepare select and update statements. > > > sel) prepare sel(int) as select * from pt where a = $1; > > > upd) prepare upd(int, int) as update pt set a = $2 where a = $1; > > > 3. Execute each statement for 8 times. > > > 3-1. Select from pg_stat_statements view after every execution. 
> > > select query, plans, total_plan_time, calls, total_exec_time
> > > from pg_stat_statements where query like 'prepare%';
> > >
> > > Results of pg_stat_statements of sel) are
> > > query | plans | total_plan_time | calls | total_exec_time
> > > ---------------------------------------------------+-------+-----------------+-------+-----------------
> > > prepare sel(int) as select * from pt where a = $1 | 1 | 0.164361 | 1 | 0.004613
> > > prepare sel(int) as select * from pt where a = $1 | 2 | 0.27715500000000004 | 2 | 0.009447
> > > prepare sel(int) as select * from pt where a = $1 | 3 | 0.39100100000000004 | 3 | 0.014281
> > > prepare sel(int) as select * from pt where a = $1 | 4 | 0.504004 | 4 | 0.019265
> > > prepare sel(int) as select * from pt where a = $1 | 5 | 0.628242 | 5 | 0.024091
> > > prepare sel(int) as select * from pt where a = $1 | 7 | 24.213586000000003 | 6 | 0.029144
> > > prepare sel(int) as select * from pt where a = $1 | 8 | 24.368900000000004 | 7 | 0.034099
> > > prepare sel(int) as select * from pt where a = $1 | 9 | 24.527956000000003 | 8 | 0.046152
> > >
> > > Results of pg_stat_statements of upd) are
> > > prepare upd(int, int) as update pt set a = $2 where a = $1 | 1 | 0.280099 | 1 | 0.013138
> > > prepare upd(int, int) as update pt set a = $2 where a = $1 | 2 | 0.405416 | 2 | 0.01894
> > > prepare upd(int, int) as update pt set a = $2 where a = $1 | 3 | 0.532361 | 3 | 0.040716
> > > prepare upd(int, int) as update pt set a = $2 where a = $1 | 4 | 0.671445 | 4 | 0.046566
> > > prepare upd(int, int) as update pt set a = $2 where a = $1 | 5 | 0.798531 | 5 | 0.052729000000000005
> > > prepare upd(int, int) as update pt set a = $2 where a = $1 | 7 | 896.915458 | 6 | 0.05888600000000001
> > > prepare upd(int, int) as update pt set a = $2 where a = $1 | 8 | 897.043512 | 7 | 0.064446
> > > prepare upd(int, int) as update pt set a = $2 where a = $1 | 9 | 897.169711 | 8 | 0.070644
> > >
> > > How do you think about that?
> >
> > That's indeed a very valid point and something we should help user to
> > investigate. I'll submit an updated patch with support for
> > min/max/mean/stddev plan time shortly.
>
> I attach v3 patches implementing those counters.

Thanks for updating the patch! Now I can see min/max/mean/stddev plan time.

> Note that to avoid duplicating some code (related to Welford's method),
> I switched some of the counters to arrays rather than scalar variables. It unfortunately makes
> pg_stat_statements_internal() a little bit messy, but I hope that it's still acceptable.

Yeah, I also think it's acceptable, but I think the codes like below one is more
understandable than using for loop in pg_stat_statements_internal(),
although some codes will be duplicated.
pg_stat_statements_internal():

    if (api_version >= PGSS_V1_8)
    {
        kind = PGSS_PLAN;
        values[i++] = Int64GetDatumFast(tmp.calls[kind]);
        values[i++] = Float8GetDatumFast(tmp.total_time[kind]);
        values[i++] = Float8GetDatumFast(tmp.min_time[kind]);
        values[i++] = Float8GetDatumFast(tmp.max_time[kind]);
        values[i++] = Float8GetDatumFast(tmp.mean_time[kind]);
        values[i++] = Float8GetDatumFast(stddev(tmp));
    }

    kind = PGSS_EXEC;
    values[i++] = Int64GetDatumFast(tmp.calls[kind]);
    values[i++] = Float8GetDatumFast(tmp.total_time[kind]);
    if (api_version >= PGSS_V1_3)
    {
        values[i++] = Float8GetDatumFast(tmp.min_time[kind]);
        values[i++] = Float8GetDatumFast(tmp.max_time[kind]);
        values[i++] = Float8GetDatumFast(tmp.mean_time[kind]);
        values[i++] = Float8GetDatumFast(stddev(tmp));
    }

stddev(Counters counters)
{
    /*
     * Note we are calculating the population variance here, not the
     * sample variance, as we have data for the whole population, so
     * Bessel's correction is not used, and we don't divide by
     * tmp.calls - 1.
     */
    if (counters.calls[kind] > 1)
        return stddev = sqrt(counters.sum_var_time[kind] / counters.calls[kind]);

    return 0.0;
}

> While doing this refactoring
> I saw that previous patches were failing to accumulate the buffers used during planning, which is now fixed.

Checked.
Now buffer usage columns include buffer usage during planning and executing,
but if we turn off track_planning, buffer usage during planning is also not
tracked which is good for users who don't want to take into account of that.

What I'm concerned about is column names will not be backward-compatible.
{total, min, max, mean, stddev}_{plan, exec}_time are the best names which
correctly show the meaning of its value, but we can't use
{total, min, max, mean, stddev}_time anymore and it might be harmful for
some users.
I don't come up with any good idea for that...

--
Yoshikazu Imai
On Tue, Nov 12, 2019 at 5:41 AM imai.yoshikazu@fujitsu.com <imai.yoshikazu@fujitsu.com> wrote: > > On Sat, Nov 9, 2019 at 1:36 PM, Julien Rouhaud wrote: > > > > I attach v3 patches implementing those counters. > > Thanks for updating the patch! Now I can see min/max/mean/stddev plan time. > > > > Note that to avoid duplicating some code (related to Welford's method), > > I switched some of the counters to arrays rather than scalar variables. It unfortunately makes > > pg_stat_statements_internal() a little bit messy, but I hope that it's still acceptable. > > Yeah, I also think it's acceptable, but I think the codes like below one is more > understandable than using for loop in pg_stat_statements_internal(), > although some codes will be duplicated. > > pg_stat_statements_internal(): > > if (api_version >= PGSS_V1_8) > { > kind = PGSS_PLAN; > values[i++] = Int64GetDatumFast(tmp.calls[kind]); > values[i++] = Float8GetDatumFast(tmp.total_time[kind]); > values[i++] = Float8GetDatumFast(tmp.min_time[kind]); > values[i++] = Float8GetDatumFast(tmp.max_time[kind]); > values[i++] = Float8GetDatumFast(tmp.mean_time[kind]); > values[i++] = Float8GetDatumFast(stddev(tmp)); > } > > kind = PGSS_EXEC; > values[i++] = Int64GetDatumFast(tmp.calls[kind]); > values[i++] = Float8GetDatumFast(tmp.total_time[kind]); > if (api_version >= PGSS_V1_3) > { > values[i++] = Float8GetDatumFast(tmp.min_time[kind]); > values[i++] = Float8GetDatumFast(tmp.max_time[kind]); > values[i++] = Float8GetDatumFast(tmp.mean_time[kind]); > values[i++] = Float8GetDatumFast(stddev(tmp)); > } > > > stddev(Counters counters) > { > /* > * Note we are calculating the population variance here, not the > * sample variance, as we have data for the whole population, so > * Bessel's correction is not used, and we don't divide by > * tmp.calls - 1. > */ > if (counters.calls[kind] > 1) > return stddev = sqrt(counters.sum_var_time[kind] / counters.calls[kind]); > > return 0.0; > } Yes, that's also a possibility (though this should also take the "kind" as parameter). I wanted to avoid adding a new function and save some redundant code, but I can change it in the next version of the patch if needed. > > While doing this refactoring > > I saw that previous patches were failing to accumulate the buffers used during planning, which is now fixed. > > Checked. > Now buffer usage columns include buffer usage during planning and executing, > but if we turn off track_planning, buffer usage during planning is also not > tracked which is good for users who don't want to take into account of that. Indeed. Note that there's a similar discussion on adding planning buffer counters to explain in [1]. I'm unsure if merging planning and execution counters in the same columns is ok or not. > What I'm concerned about is column names will not be backward-compatible. > {total, min, max, mean, stddev}_{plan, exec}_time are the best names which > correctly show the meaning of its value, but we can't use > {total, min, max, mean, stddev}_time anymore and it might be harmful for > some users. > I don't come up with any good idea for that... Well, perhaps keeping the old {total, min, max, mean, stddev}_time would be ok, as those historically meant "execution". I don't have a strong opinion here. [1] https://www.postgresql.org/message-id/20191112205506.rvadbx2dnku3paaw@alap3.anarazel.de
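For what it's worth, the variant of the helper taking the counter kind as a
parameter would be something along these lines (a sketch based on the
array-based fields quoted above; the pgssStoreKind type name and exact field
names are assumptions, not the actual v3 code):

    static double
    stddev_time(const Counters *counters, pgssStoreKind kind)
    {
        /*
         * Population variance, not sample variance: we have data for the
         * whole population, so Bessel's correction is not applied (no
         * division by calls - 1).
         */
        if (counters->calls[kind] > 1)
            return sqrt(counters->sum_var_time[kind] / counters->calls[kind]);

        return 0.0;
    }

    /* in pg_stat_statements_internal(), e.g.: */
    double      plan_stddev = stddev_time(&tmp, PGSS_PLAN);

    values[i++] = Float8GetDatumFast(plan_stddev);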
RE: Planning counters in pg_stat_statements (using pgss_store)
From
"imai.yoshikazu@fujitsu.com"
Date:
On Wed, Nov 13, 2019 at 10:50 AM, Julien Rouhaud wrote: > On Tue, Nov 12, 2019 at 5:41 AM imai.yoshikazu@fujitsu.com <imai.yoshikazu@fujitsu.com> wrote: > > > > On Sat, Nov 9, 2019 at 1:36 PM, Julien Rouhaud wrote: > > > > > > I attach v3 patches implementing those counters. > > > > Thanks for updating the patch! Now I can see min/max/mean/stddev plan time. > > > > > > > Note that to avoid duplicating some code (related to Welford's > > > method), I switched some of the counters to arrays rather than > > > scalar variables. It unfortunately makes > > > pg_stat_statements_internal() a little bit messy, but I hope that it's still acceptable. > > > > Yeah, I also think it's acceptable, but I think the codes like below > > one is more understandable than using for loop in > > pg_stat_statements_internal(), although some codes will be duplicated. > > > > pg_stat_statements_internal(): > > > > if (api_version >= PGSS_V1_8) > > { > > kind = PGSS_PLAN; > > values[i++] = Int64GetDatumFast(tmp.calls[kind]); > > values[i++] = Float8GetDatumFast(tmp.total_time[kind]); > > values[i++] = Float8GetDatumFast(tmp.min_time[kind]); > > values[i++] = Float8GetDatumFast(tmp.max_time[kind]); > > values[i++] = Float8GetDatumFast(tmp.mean_time[kind]); > > values[i++] = Float8GetDatumFast(stddev(tmp)); } > > > > kind = PGSS_EXEC; > > values[i++] = Int64GetDatumFast(tmp.calls[kind]); > > values[i++] = Float8GetDatumFast(tmp.total_time[kind]); > > if (api_version >= PGSS_V1_3) > > { > > values[i++] = Float8GetDatumFast(tmp.min_time[kind]); > > values[i++] = Float8GetDatumFast(tmp.max_time[kind]); > > values[i++] = Float8GetDatumFast(tmp.mean_time[kind]); > > values[i++] = Float8GetDatumFast(stddev(tmp)); } > > > > > > stddev(Counters counters) > > { > > /* > > * Note we are calculating the population variance here, not the > > * sample variance, as we have data for the whole population, so > > * Bessel's correction is not used, and we don't divide by > > * tmp.calls - 1. > > */ > > if (counters.calls[kind] > 1) > > return stddev = sqrt(counters.sum_var_time[kind] / counters.calls[kind]); > > > > return 0.0; > > } > > Yes, that's also a possibility (though this should also take the > "kind" as parameter). I wanted to avoid adding a new function and > save some redundant code, but I can change it in the next version of > the patch if needed. Okay. It's not much a serious problem, so we can leave it as it is. > > > While doing this refactoring > > > I saw that previous patches were failing to accumulate the buffers used during planning, which is now fixed. > > > > Checked. > > Now buffer usage columns include buffer usage during planning and executing, > > but if we turn off track_planning, buffer usage during planning is also not > > tracked which is good for users who don't want to take into account of that. > > Indeed. Note that there's a similar discussion on adding planning > buffer counters to explain in [1]. I'm unsure if merging planning and > execution counters in the same columns is ok or not. Storing buffer usage to different columns is useful to detect the cause of the problems if there are the cases many buffersare used during planning, but I'm also unsure those cases actually exist. > > What I'm concerned about is column names will not be backward-compatible. > > {total, min, max, mean, stddev}_{plan, exec}_time are the best names which > > correctly show the meaning of its value, but we can't use > > {total, min, max, mean, stddev}_time anymore and it might be harmful for > > some users. 
> > I don't come up with any good idea for that...
>
> Well, perhaps keeping the old {total, min, max, mean, stddev}_time
> would be ok, as those historically meant "execution". I don't have a
> strong opinion here.

Actually I also don't have strong opinion but I thought someone would complain
about renaming of those columns and also some tools like monitoring which use
those columns will not work. If we use {total, min, max, mean, stddev}_time,
someone might mistakenly understand {total, min, max, mean, stddev}_time mean
{total, min, max, mean, stddev} of planning and execution.
If I need to choose {total, min, max, mean, stddev}_time or
{total, min, max, mean, stddev}_exec_time, I choose former one because
choosing best name is not worth destructing the existing scripts or tools.

Thanks.
--
Yoshikazu Imai
On Fri, Nov 15, 2019 at 2:00 AM imai.yoshikazu@fujitsu.com <imai.yoshikazu@fujitsu.com> wrote: > > Actually I also don't have strong opinion but I thought someone would complain about renaming of those columns and alsosome tools like monitoring which use those columns will not work. If we use {total, min, max, mean, stddev}_time, someonemight mistakenly understand {total, min, max, mean, stddev}_time mean {total, min, max, mean, stddev} of planningand execution. > If I need to choose {total, min, max, mean, stddev}_time or {total, min, max, mean, stddev}_exec_time, I choose formerone because choosing best name is not worth destructing the existing scripts or tools. We could definitely keep (plan|exec)_time for the SRF, and have the {total, min, max, mean, stddev}_time created by the view to be a sum of planning + execution for each counter, and it doesn't sound like a bad idea if having even more columns in the view is not an issue.
RE: Planning counters in pg_stat_statements (using pgss_store)
From
"imai.yoshikazu@fujitsu.com"
Date:
On Tue, Nov 19, 2019 at 2:27 PM, Julien Rouhaud wrote: > On Fri, Nov 15, 2019 at 2:00 AM imai.yoshikazu@fujitsu.com <imai.yoshikazu@fujitsu.com> wrote: > > > > Actually I also don't have strong opinion but I thought someone would complain about renaming of those columns and > also some tools like monitoring which use those columns will not work. If we use {total, min, max, mean, stddev}_time, > someone might mistakenly understand {total, min, max, mean, stddev}_time mean {total, min, max, mean, stddev} of planning > and execution. > > If I need to choose {total, min, max, mean, stddev}_time or {total, min, max, mean, stddev}_exec_time, I choose former > one because choosing best name is not worth destructing the existing scripts or tools. > > We could definitely keep (plan|exec)_time for the SRF, and have the {total, min, max, mean, stddev}_time created by > the view to be a sum of planning + execution for each counter I might misunderstand but if we define {total, min, max, mean, stddev}_time is just a sum of planning + execution for each counter like "select total_plan_time + total_exec_time as total_time from pg_stat_statements", I wonder we can calculate stddev_time correctly. If we prepare variables in the codes to calculate those values, yes, we can correctly calculate those values even for the total_stddev. > and it doesn't sound like a bad idea if having even > more columns in the view is not an issue. I also wondered having many columns in the view is ok, but if it's ok, I agree all of those columns are in the view. Only problem I can come up with is the view will look bad with many columns, but it already looks bad because query column values tend to be long and each row can't fit in the one row in the console. -- Yoshikazu Imai
On Wed, Nov 20, 2019 at 2:06 AM imai.yoshikazu@fujitsu.com <imai.yoshikazu@fujitsu.com> wrote: > > On Tue, Nov 19, 2019 at 2:27 PM, Julien Rouhaud wrote: > > On Fri, Nov 15, 2019 at 2:00 AM imai.yoshikazu@fujitsu.com <imai.yoshikazu@fujitsu.com> wrote: > > > > > > Actually I also don't have strong opinion but I thought someone would complain about renaming of those columns and > > also some tools like monitoring which use those columns will not work. If we use {total, min, max, mean, stddev}_time, > > someone might mistakenly understand {total, min, max, mean, stddev}_time mean {total, min, max, mean, stddev} of planning > > and execution. > > > If I need to choose {total, min, max, mean, stddev}_time or {total, min, max, mean, stddev}_exec_time, I choose former > > one because choosing best name is not worth destructing the existing scripts or tools. > > > > We could definitely keep (plan|exec)_time for the SRF, and have the {total, min, max, mean, stddev}_time created by > > the view to be a sum of planning + execution for each counter > > I might misunderstand but if we define {total, min, max, mean, stddev}_time is > just a sum of planning + execution for each counter like > "select total_plan_time + total_exec_time as total_time from pg_stat_statements", > I wonder we can calculate stddev_time correctly. If we prepare variables in > the codes to calculate those values, yes, we can correctly calculate those > values even for the total_stddev. Yes you're right, this can't possibly work for most of the counters. And also, since there's no guarantee that each execution will follow a planning, providing such global counters for min/max/mean and stddev wouldn't make much sense.
RE: Planning counters in pg_stat_statements (using pgss_store)
From
"imai.yoshikazu@fujitsu.com"
Date:
On Wed, Nov 20, 2019 at 4:55 PM, Julien Rouhaud wrote: > On Wed, Nov 20, 2019 at 2:06 AM imai.yoshikazu@fujitsu.com <imai.yoshikazu@fujitsu.com> wrote: > > > > On Tue, Nov 19, 2019 at 2:27 PM, Julien Rouhaud wrote: > > > On Fri, Nov 15, 2019 at 2:00 AM imai.yoshikazu@fujitsu.com <imai.yoshikazu@fujitsu.com> wrote: > > > > > > > > Actually I also don't have strong opinion but I thought someone > > > > would complain about renaming of those columns and > > > also some tools like monitoring which use those columns will not > > > work. If we use {total, min, max, mean, stddev}_time, someone might > > > mistakenly understand {total, min, max, mean, stddev}_time mean {total, min, max, mean, stddev} of planning and > execution. > > > > If I need to choose {total, min, max, mean, stddev}_time or > > > > {total, min, max, mean, stddev}_exec_time, I choose former > > > one because choosing best name is not worth destructing the existing scripts or tools. > > > > > > We could definitely keep (plan|exec)_time for the SRF, and have the > > > {total, min, max, mean, stddev}_time created by the view to be a sum > > > of planning + execution for each counter > > > > I might misunderstand but if we define {total, min, max, mean, > > stddev}_time is just a sum of planning + execution for each counter > > like "select total_plan_time + total_exec_time as total_time from > > pg_stat_statements", I wonder we can calculate stddev_time correctly. > > If we prepare variables in the codes to calculate those values, yes, > > we can correctly calculate those values even for the total_stddev. > > Yes you're right, this can't possibly work for most of the counters. > And also, since there's no guarantee that each execution will follow a planning, providing such global counters for > min/max/mean and stddev wouldn't make much sense. Ah, I see. Planning counts and execution counts differ. It might be difficult to redefine the meaning of {min, max, mean, stddev}_time precisely, and even if we can redefine itcorrectly, it would not be intuitive. -- Yoshikazu Imai
On Fri, Nov 22, 2019 at 11:23 AM imai.yoshikazu@fujitsu.com <imai.yoshikazu@fujitsu.com> wrote: > > On Wed, Nov 20, 2019 at 4:55 PM, Julien Rouhaud wrote: > > On Wed, Nov 20, 2019 at 2:06 AM imai.yoshikazu@fujitsu.com <imai.yoshikazu@fujitsu.com> wrote: > > > > > > On Tue, Nov 19, 2019 at 2:27 PM, Julien Rouhaud wrote: > > > > On Fri, Nov 15, 2019 at 2:00 AM imai.yoshikazu@fujitsu.com <imai.yoshikazu@fujitsu.com> wrote: > > > > > > > > > > Actually I also don't have strong opinion but I thought someone > > > > > would complain about renaming of those columns and > > > > also some tools like monitoring which use those columns will not > > > > work. If we use {total, min, max, mean, stddev}_time, someone might > > > > mistakenly understand {total, min, max, mean, stddev}_time mean {total, min, max, mean, stddev} of planning and > > execution. > > > > > If I need to choose {total, min, max, mean, stddev}_time or > > > > > {total, min, max, mean, stddev}_exec_time, I choose former > > > > one because choosing best name is not worth destructing the existing scripts or tools. > > > > > > > > We could definitely keep (plan|exec)_time for the SRF, and have the > > > > {total, min, max, mean, stddev}_time created by the view to be a sum > > > > of planning + execution for each counter > > > > > > I might misunderstand but if we define {total, min, max, mean, > > > stddev}_time is just a sum of planning + execution for each counter > > > like "select total_plan_time + total_exec_time as total_time from > > > pg_stat_statements", I wonder we can calculate stddev_time correctly. > > > If we prepare variables in the codes to calculate those values, yes, > > > we can correctly calculate those values even for the total_stddev. > > > > Yes you're right, this can't possibly work for most of the counters. > > And also, since there's no guarantee that each execution will follow a planning, providing such global counters for > > min/max/mean and stddev wouldn't make much sense. > > Ah, I see. Planning counts and execution counts differ. > It might be difficult to redefine the meaning of {min, max, mean, stddev}_time precisely, and even if we can redefine itcorrectly, it would not be intuitive. Thomas' automatic patch tester just warned me that the patchset is broken since 3fd40b628c7db4, which removed the queryString from ExecCreateTableAs. New patch version that re-add the queryString, no changes otherwise.
Attachment
Hi Julien,

I would like to create a link with
https://www.postgresql.org/message-id/1577490124579-0.post@n3.nabble.com
where we met an ASSERT FAILURE because the query text was not initialized ...

The question raised is:
- should the query text always be provided
or
- if not, how to deal with that case (in pgss).

Regards
PAscal

--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On Sun, Jan 5, 2020 at 4:11 PM legrand legrand <legrand_legrand@hotmail.com> wrote: > > Hi Julien, > > I would like to create a link with > https://www.postgresql.org/message-id/1577490124579-0.post@n3.nabble.com > > where we met an ASSET FAILURE because query text was not initialized ... > > The question raised is: > > - should query text be always provided > or > - if not how to deal that case (in pgss). I'd think that since the query text was until now always provided, there's no reason why this patch should change that. That being said, there has been other concerns raised wrt. temporary tables in the IVM patchset, so ISTM that there might be important architectural changes upcoming, so having to deal with this case in pgss is not rushed (especially since handling that in pgss would be trivial), and can help to catch issue with the query text pasing.
Julien Rouhaud wrote
> On Sun, Jan 5, 2020 at 4:11 PM legrand legrand <legrand_legrand@hotmail.com> wrote:
>>
>> Hi Julien,
>>
>> I would like to create a link with
>> https://www.postgresql.org/message-id/1577490124579-0.post@n3.nabble.com
>>
>> where we met an ASSERT FAILURE because query text was not initialized ...
>>
>> The question raised is:
>>
>> - should query text be always provided
>> or
>> - if not how to deal that case (in pgss).
>
> I'd think that since the query text was until now always provided,
> there's no reason why this patch should change that. That being said,
> there has been other concerns raised wrt. temporary tables in the IVM
> patchset, so ISTM that there might be important architectural changes
> upcoming, so having to deal with this case in pgss is not rushed
> (especially since handling that in pgss would be trivial), and can
> help to catch issue with the query text pasing.

IVM revealed that ASSERT,
but IVM works fine with pg_stat_statements.track_planning = off.
There may be other parts of postgresql that would have worked fine as well.

This means 2 things:
- there is a (little) risk of meeting other assert failures when using
planning counters in pgss,
- we have an easy workaround to fix it (disabling track_planning).

But I would have preferred this new feature to work the same way with or
without track_planning activated ;o(

--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On Sun, Jan 5, 2020 at 7:02 PM legrand legrand <legrand_legrand@hotmail.com> wrote: > > Julien Rouhaud wrote > > On Sun, Jan 5, 2020 at 4:11 PM legrand legrand > > < > > > legrand_legrand@ > > > > wrote: > >> > >> Hi Julien, > >> > >> I would like to create a link with > >> https://www.postgresql.org/message-id/ > > > 1577490124579-0.post@.nabble > > >> > >> where we met an ASSET FAILURE because query text was not initialized ... > >> > >> The question raised is: > >> > >> - should query text be always provided > >> or > >> - if not how to deal that case (in pgss). > > > > I'd think that since the query text was until now always provided, > > there's no reason why this patch should change that. That being said, > > there has been other concerns raised wrt. temporary tables in the IVM > > patchset, so ISTM that there might be important architectural changes > > upcoming, so having to deal with this case in pgss is not rushed > > (especially since handling that in pgss would be trivial), and can > > help to catch issue with the query text pasing. > > IVM revealed that ASSERT, > but IVM works fine with pg_stat_statements.track_planning = off. Yes, but on the other hand the current IVM patchset also adds the only pg_plan_query call that don't provide a query text. I didn't see any other possibility, and if there are other cases they're unfortunately not covered by the full regression tests. > There may be others parts of postgresql that would have workede fine as > well. > > This means 2 things: > - there is a (litle) risk to meet other assert failures when using planning > counters in pgss, > - we have an easy workarround to fix it (disabling track_planning). > > But I would have prefered this new feature to work the same way with or > without track_planning activated ;o( Definitely, but fixing the issue in pgss (ignoring planner calls when we don't have a query text) means that pgss won't give an exhaustive view of activity anymore, so a fix in IVM would be a better solution. Let's wait and see if Nagata-san and other people involved in that have an opinion on it.
Hi Julien,

bot is still unhappy
https://travis-ci.org/postgresql-cfbot/postgresql/builds/638701399

portalcmds.c: In function ‘PerformCursorOpen’:
portalcmds.c:93:7: error: ‘queryString’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  plan = pg_plan_query(query, queryString, cstmt->options, params);
       ^
portalcmds.c:50:8: note: ‘queryString’ was declared here
  char *queryString;
        ^
cc1: all warnings being treated as errors
<builtin>: recipe for target 'portalcmds.o' failed
make[3]: *** [portalcmds.o] Error 1
make[3]: Leaving directory '/home/travis/build/postgresql-cfbot/postgresql/src/backend/commands'
common.mk:39: recipe for target 'commands-recursive' failed
make[2]: *** [commands-recursive] Error 2
make[2]: *** Waiting for unfinished jobs....

regards
PAscal

--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
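The diagnostic is gcc's usual maybe-uninitialized analysis: there is at least
one path through PerformCursorOpen() on which the local is read before being
assigned.  Reduced to its essence, the pattern it complains about is the
following (purely hypothetical names, not the actual portalcmds.c code), so
presumably the rebase only needs to make sure the text is assigned on every
path before the pg_plan_query() call:

    char       *queryString;            /* declared without an initializer... */

    if (some_condition)
        queryString = source_text;      /* ...but only assigned on one branch */

    /* -Wmaybe-uninitialized flags this read, and -Werror makes it fatal */
    plan = pg_plan_query(query, queryString, cstmt->options, params);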
Hi, On Sat, Jan 18, 2020 at 6:14 PM legrand legrand <legrand_legrand@hotmail.com> wrote: > > Hi Julien, > > bot is still unhappy > https://travis-ci.org/postgresql-cfbot/postgresql/builds/638701399 > > portalcmds.c: In function ‘PerformCursorOpen’: > portalcmds.c:93:7: error: ‘queryString’ may be used uninitialized in this > function [-Werror=maybe-uninitialized] > plan = pg_plan_query(query, queryString, cstmt->options, params); > ^ > portalcmds.c:50:8: note: ‘queryString’ was declared here > char *queryString; > ^ > cc1: all warnings being treated as errors > <builtin>: recipe for target 'portalcmds.o' failed > make[3]: *** [portalcmds.o] Error 1 > make[3]: Leaving directory > '/home/travis/build/postgresql-cfbot/postgresql/src/backend/commands' > common.mk:39: recipe for target 'commands-recursive' failed > make[2]: *** [commands-recursive] Error 2 > make[2]: *** Waiting for unfinished jobs.... Indeed, thanks for the report! PFA rebased v4 version of the patchset.
Attachment
Hi Julien,

>> But I would have preferred this new feature to work the same way with or
>> without track_planning activated ;o(

> Definitely, but fixing the issue in pgss (ignoring planner calls when
> we don't have a query text) means that pgss won't give an exhaustive
> view of activity anymore, so a fix in IVM would be a better solution.
> Let's wait and see if Nagata-san and other people involved in that
> have an opinion on it.

It seems the IVM team does not consider this point a priority ...
We should not wait for them if we want to keep a chance of being included
in PG13.

So we have to make this feature more robust, an assert failure being
considered a severe regression (even if it is not coming from pgss).

I like the idea of adding a check for a non-zero queryId in the new
pgss_planner_hook() (zero queryid shouldn't be reserved for
utility_statements?); a rough sketch is included below.

Fixing the corner case where a query (with no sql text) can be planned
without being parsed is another subject that should be resolved in another
thread.  This kind of query was ignored in pgss before, and it should stay
ignored in pgss with planning counters.

Any thoughts?

Regards
PAscal

--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
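A rough sketch of the proposed check, for illustration only (the hook's exact
signature, the pgss_enabled() test and the pgss_track_planning variable are
assumptions based on the patch, not verbatim patch code):

    static PlannedStmt *
    pgss_planner_hook(Query *parse, const char *query_string, int cursorOptions,
                      ParamListInfo boundParams)
    {
        PlannedStmt *result;

        if (pgss_enabled() && pgss_track_planning &&
            parse->queryId != UINT64CONST(0))
        {
            instr_time  start;
            instr_time  duration;

            /* time the planner run and record it under the PLAN counters */
            INSTR_TIME_SET_CURRENT(start);
            result = standard_planner(parse, query_string, cursorOptions,
                                      boundParams);
            INSTR_TIME_SET_CURRENT(duration);
            INSTR_TIME_SUBTRACT(duration, start);

            /* pgss_store(query_string, parse->queryId, ..., PGSS_PLAN,
             *            INSTR_TIME_GET_MILLISEC(duration), ...); */
        }
        else
        {
            /*
             * Zero queryId (e.g. the query nested in a utility statement):
             * plan without tracking.  Chaining to a previous planner hook is
             * omitted in this sketch.
             */
            result = standard_planner(parse, query_string, cursorOptions,
                                      boundParams);
        }

        return result;
    }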
Hi, On Fri, Feb 28, 2020 at 4:06 PM legrand legrand <legrand_legrand@hotmail.com> wrote: > > It seems IVM team does not consider this point as a priority ... Well, IVM is a big project and I agree that fixing this issue isn't the most urgent one, especially since there's no guarantee that this pgss planning patch will be committed, or with the current behavior. > We should not wait for them, if we want to keep a chance to be > included in PG13. > > So we have to make this feature more robust, an assert failure being > considered as a severe regression (even if this is not coming from pgss). I'm still not convinced that handling NULL query string, as in sometimes ignoring planning counters, is the right solution here. For now all code is able to provide it (or at least all the code that goes through make installcheck). I'm wondering if it'd be better to instead add a similar assert in pg_plan_query, to make sure that this requirements is always met even without using pg_stat_statements, or any other extension that would also rely on that. I also realized that the last version of the patch I sent was a rebase of the wrong version, I'll send the correct version soon. > I like the idea of adding a check for a non-zero queryId in the new > pgss_planner_hook() (zero queryid shouldn't be reserved for > utility_statements ?). Some assert hit later, I can say that it's not always true. For instance a CREATE TABLE AS won't run parse analysis for the underlying query, as this has already been done for the original statement, but will still call the planner. I'll change pgss_planner_hook to ignore such cases, as pgss_store would otherwise think that it's a utility statement. That'll probably incidentally fix the IVM incompatibility.
>> I like the idea of adding a check for a non-zero queryId in the new
>> pgss_planner_hook() (zero queryid shouldn't be reserved for
>> utility_statements ?).

> Some assert hit later, I can say that it's not always true. For
> instance a CREATE TABLE AS won't run parse analysis for the underlying
> query, as this has already been done for the original statement, but
> will still call the planner. I'll change pgss_planner_hook to ignore
> such cases, as pgss_store would otherwise think that it's a utility
> statement. That'll probably incidentally fix the IVM incompatibility.

Today, with or without the test on parse->queryId != UINT64CONST(0),
CTAS is collected as a utility statement without a planning counter.
This seems to me to respect the rule, so I am not sure this needs any
new (risky) change to the current (quite stable) patch.

--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On Sun, Mar 1, 2020 at 3:55 PM legrand legrand <legrand_legrand@hotmail.com> wrote: > > >> I like the idea of adding a check for a non-zero queryId in the new > >> pgss_planner_hook() (zero queryid shouldn't be reserved for > >> utility_statements ?). > > > Some assert hit later, I can say that it's not always true. For > > instance a CREATE TABLE AS won't run parse analysis for the underlying > > query, as this has already been done for the original statement, but > > will still call the planner. I'll change pgss_planner_hook to ignore > > such cases, as pgss_store would otherwise think that it's a utility > > statement. That'll probably incidentally fix the IVM incompatibility. > > Today with or without test on parse->queryId != UINT64CONST(0), > CTAS is collected as a utility_statement without planning counter. > This seems to me respectig the rule, not sure that this needs any > new (risky) change to the actual (quite stable) patch. But the queryid ends up not being computed the same way: # select queryid, query, plans, calls from pg_stat_statements where query like 'create table%'; queryid | query | plans | calls ---------------------+--------------------------------+-------+------- 8275950546884151007 | create table test as select 1; | 1 | 0 7546197440584636081 | create table test as select 1 | 0 | 1 (2 rows) That's because CreateTableAsStmt->query doesn't have a query location/len, as transformTopLevelStmt is only setting that for the top level Query. That's probably an oversight in ab1f0c82257, but I'm not sure what's the best way to fix that. Should we pass that information to all transformXXX function, or let transformTopLevelStmt handle that.
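For context, this is what the top-level transformation does (roughly, from
src/backend/parser/analyze.c; quoted here only to show why the nested Query
never gets a location, not as a proposed fix):

    Query *
    transformTopLevelStmt(ParseState *pstate, RawStmt *parseTree)
    {
        Query      *result;

        /* We're at top level, so allow SELECT INTO */
        result = transformOptionalSelectInto(pstate, parseTree->stmt);

        /*
         * Only the outermost Query gets the statement location and length;
         * the Query stored in CreateTableAsStmt->query never does, hence the
         * two different query texts and queryids shown above.
         */
        result->stmt_location = parseTree->stmt_location;
        result->stmt_len = parseTree->stmt_len;

        return result;
    }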
Julien Rouhaud wrote > On Sun, Mar 1, 2020 at 3:55 PM legrand legrand > < > legrand_legrand@ > > wrote: >> >> >> I like the idea of adding a check for a non-zero queryId in the new >> >> pgss_planner_hook() (zero queryid shouldn't be reserved for >> >> utility_statements ?). >> >> > Some assert hit later, I can say that it's not always true. For >> > instance a CREATE TABLE AS won't run parse analysis for the underlying >> > query, as this has already been done for the original statement, but >> > will still call the planner. I'll change pgss_planner_hook to ignore >> > such cases, as pgss_store would otherwise think that it's a utility >> > statement. That'll probably incidentally fix the IVM incompatibility. >> >> Today with or without test on parse->queryId != UINT64CONST(0), >> CTAS is collected as a utility_statement without planning counter. >> This seems to me respectig the rule, not sure that this needs any >> new (risky) change to the actual (quite stable) patch. > > But the queryid ends up not being computed the same way: > > # select queryid, query, plans, calls from pg_stat_statements where > query like 'create table%'; > queryid | query | plans | calls > ---------------------+--------------------------------+-------+------- > 8275950546884151007 | create table test as select 1; | 1 | 0 > 7546197440584636081 | create table test as select 1 | 0 | 1 > (2 rows) > > That's because CreateTableAsStmt->query doesn't have a query > location/len, as transformTopLevelStmt is only setting that for the > top level Query. That's probably an oversight in ab1f0c82257, but I'm > not sure what's the best way to fix that. Should we pass that > information to all transformXXX function, or let transformTopLevelStmt > handle that. arf, this was not the case in my testing env (that is not up to date) :o( and would not have appeared at all with the proposed test on parse->queryId != UINT64CONST(0) ... -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On Mon, Mar 2, 2020 at 1:01 PM legrand legrand <legrand_legrand@hotmail.com> wrote: > > Julien Rouhaud wrote > > On Sun, Mar 1, 2020 at 3:55 PM legrand legrand > > < > > > legrand_legrand@ > > > > wrote: > >> > >> >> I like the idea of adding a check for a non-zero queryId in the new > >> >> pgss_planner_hook() (zero queryid shouldn't be reserved for > >> >> utility_statements ?). > >> > >> > Some assert hit later, I can say that it's not always true. For > >> > instance a CREATE TABLE AS won't run parse analysis for the underlying > >> > query, as this has already been done for the original statement, but > >> > will still call the planner. I'll change pgss_planner_hook to ignore > >> > such cases, as pgss_store would otherwise think that it's a utility > >> > statement. That'll probably incidentally fix the IVM incompatibility. > >> > >> Today with or without test on parse->queryId != UINT64CONST(0), > >> CTAS is collected as a utility_statement without planning counter. > >> This seems to me respectig the rule, not sure that this needs any > >> new (risky) change to the actual (quite stable) patch. > > > > But the queryid ends up not being computed the same way: > > > > # select queryid, query, plans, calls from pg_stat_statements where > > query like 'create table%'; > > queryid | query | plans | calls > > ---------------------+--------------------------------+-------+------- > > 8275950546884151007 | create table test as select 1; | 1 | 0 > > 7546197440584636081 | create table test as select 1 | 0 | 1 > > (2 rows) > > > > That's because CreateTableAsStmt->query doesn't have a query > > location/len, as transformTopLevelStmt is only setting that for the > > top level Query. That's probably an oversight in ab1f0c82257, but I'm > > not sure what's the best way to fix that. Should we pass that > > information to all transformXXX function, or let transformTopLevelStmt > > handle that. > > > arf, this was not the case in my testing env (that is not up to date) :o( > and would not have appeared at all with the proposed test on > parse->queryId != UINT64CONST(0) ... I'm not sure what was the exact behavior you had, but that shouldn't have changed since previous version. The underlying query isn't a top level statement, so maybe you didn't set pg_stat_statements.track = 'all'?
Never mind ...

Please consider PG13 shortest path ;o)

My one is parse->queryId != UINT64CONST(0) in pgss_planner_hook().
It fixes IVM problem (verified),
and keep CTAS equal to pgss without planning counters (verified too).

Regards
PAscal

--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On Thu, Mar 05, 2020 at 01:26:19PM -0700, legrand legrand wrote: > Please consider PG13 shortest path ;o) > > My one is parse->queryId != UINT64CONST(0) in pgss_planner_hook(). > It fixes IVM problem (verified), > and keep CTAS equal to pgss without planning counters (verified too). I still disagree that hiding this problem is the right fix, but since no one objected here's a v5 with that behavior. Hopefully this will be fixed in v14.
Attachment
RE: Planning counters in pg_stat_statements (using pgss_store)
From
"imai.yoshikazu@fujitsu.com"
Date:
Hi Julien, On Mon, Mar 9, 2020 at 10:32 AM, Julien Rouhaud wrote: > On Thu, Mar 05, 2020 at 01:26:19PM -0700, legrand legrand wrote: > > Please consider PG13 shortest path ;o) > > > > My one is parse->queryId != UINT64CONST(0) in pgss_planner_hook(). > > It fixes IVM problem (verified), > > and keep CTAS equal to pgss without planning counters (verified too). > > I still disagree that hiding this problem is the right fix, but since no one > objected here's a v5 with that behavior. Hopefully this will be fixed in v14. Is there any case that query_text will be NULL when executing pg_plan_query? If query_text will be NULL, we need to add codes to avoid errors in pgss_store like as current patch. If query_text will not be NULL, we should add Assert in pg_plan_query so that other developers can notice that they would not mistakenly set query_text as NULL even without using pgss_planning counter. -- Yoshikazu Imai
Hi Imai-san, On Thu, Mar 12, 2020 at 05:28:38AM +0000, imai.yoshikazu@fujitsu.com wrote: > Hi Julien, > > On Mon, Mar 9, 2020 at 10:32 AM, Julien Rouhaud wrote: > > On Thu, Mar 05, 2020 at 01:26:19PM -0700, legrand legrand wrote: > > > Please consider PG13 shortest path ;o) > > > > > > My one is parse->queryId != UINT64CONST(0) in pgss_planner_hook(). > > > It fixes IVM problem (verified), > > > and keep CTAS equal to pgss without planning counters (verified too). > > > > I still disagree that hiding this problem is the right fix, but since no one > > objected here's a v5 with that behavior. Hopefully this will be fixed in v14. > > Is there any case that query_text will be NULL when executing pg_plan_query? With current sources, there are no cases where the query text isn't provided AFAICS. > If query_text will be NULL, we need to add codes to avoid errors in > pgss_store like as current patch. If query_text will not be NULL, we should > add Assert in pg_plan_query so that other developers can notice that they > would not mistakenly set query_text as NULL even without using pgss_planning > counter. I totally agree. I already had such assert locally, and regression tests pass without any problem. I'm attaching a v6 with that extra assert. If the first patch is committed, it'll now be a requirement to provide it. Or if people think it's not, it'll make sure that we'll discuss it.
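For reference, the assertion itself is tiny, something like this near the top
of pg_plan_query() (sketch only; the parameter name and the exact placement in
the v6 patch may differ):

    /*
     * pg_plan_query() callers must provide the source text: planner hooks
     * such as pg_stat_statements with track_planning rely on it.  With
     * assertions enabled, a caller passing NULL (as the current IVM patchset
     * does) now fails immediately instead of misbehaving later in the hook.
     */
    Assert(query_string != NULL);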
Attachment
RE: Planning counters in pg_stat_statements (using pgss_store)
From
"imai.yoshikazu@fujitsu.com"
Date:
On Thu, Mar 12, 2020 at 6:31 AM, Julien Rouhaud wrote:
> On Thu, Mar 12, 2020 at 05:28:38AM +0000, imai.yoshikazu@fujitsu.com wrote:
> > Hi Julien,
> >
> > On Mon, Mar 9, 2020 at 10:32 AM, Julien Rouhaud wrote:
> > > On Thu, Mar 05, 2020 at 01:26:19PM -0700, legrand legrand wrote:
> > > > Please consider PG13 shortest path ;o)
> > > >
> > > > My one is parse->queryId != UINT64CONST(0) in pgss_planner_hook().
> > > > It fixes IVM problem (verified),
> > > > and keep CTAS equal to pgss without planning counters (verified too).
> > >
> > > I still disagree that hiding this problem is the right fix, but since no one
> > > objected here's a v5 with that behavior. Hopefully this will be fixed in v14.
> >
> > Is there any case that query_text will be NULL when executing pg_plan_query?
>
> With current sources, there are no cases where the query text isn't provided
> AFAICS.
>
> > If query_text will be NULL, we need to add codes to avoid errors in
> > pgss_store like as current patch. If query_text will not be NULL, we should
> > add Assert in pg_plan_query so that other developers can notice that they
> > would not mistakenly set query_text as NULL even without using pgss_planning
> > counter.
>
> I totally agree. I already had such assert locally, and regression tests pass
> without any problem. I'm attaching a v6 with that extra assert. If the
> first patch is committed, it'll now be a requirement to provide it. Or if
> people think it's not, it'll make sure that we'll discuss it.

I see. I also don't come up with any case of query_text is NULL.
Now we need other people's opinion about here.

I'll summary code review of this thread.

[Performance]

If track_planning is not enabled, performance will drop 0.2-0.6% which can be
ignored. If track_planning is enabled, performance will drop 0-2.2%. 2.2% is a
bit large but I think it is still acceptable because people using this feature
might take account that some overhead will happen for additional calling of a
gettime function.

https://www.postgresql.org/message-id/CY4PR20MB12227E5CE199FFBB90C68A13BCB40%40CY4PR20MB1222.namprd20.prod.outlook.com

[Values in each row]

* Rows for planner time are added as {total/min/max/mean/stddev}_plan_time.

These are enough statistics for users who want to investigate the planning time.

* Rows for executor time are changed from {total/min/max/mean/stddev}_time to
{total/min/max/mean/stddev}_exec_time.

Because of changing the name of the rows, there's no backward compatibility.
Thus some users needs to modify scripts which using previous version of the
pg_stat_statements. I believe it is not expensive to rewrite scripts along
this change and it would be better to give an appropriate name to a row for
future users. I also haven't seen big opposition about losing backward
compatibility so far.

* We don't provide {total/min/max/mean/stddev}_time.

Users can calculate total_time as total_plan_time + total_exec_time on their
own. About {min/max/mean/stddev}_time, it will not make much sense because it
is not ensured that executor follows planner and each counter value will be
different largely between planner and executor.

* bufusage still only counts the buffer usage during executor.

Now we have the ability to count the buffer usage during planner but we keep
the bufusage count the buffer usage during executor for now.

[Coding]

* We add Assert in pg_plan_query so that query_text will not be NULL when
executing planner.

There's no case query_text will be NULL with current sources.
It is not ensured there will be any case query_text will be possibly NULL in
the future though. Some considerations are needed by other people about this.

I don't have any other comments for now.
After looking patches over again and if there are no other comments about this
patch, I'll set this patch as ready for committer for getting more opinion.

--
Yoshikazu Imai
On Thu, Mar 12, 2020 at 10:19 AM imai.yoshikazu@fujitsu.com <imai.yoshikazu@fujitsu.com> wrote: > > I'll summary code review of this thread. Thanks for the summary! I just have some minor comments > [Performance] > > If track_planning is not enabled, performance will drop 0.2-0.6% which can be > ignored. If track_planning is enabled, performance will drop 0-2.2%. 2.2% is a > bit large but I think it is still acceptable because people using this feature > might take account that some overhead will happen for additional calling of a > gettime function. > > https://www.postgresql.org/message-id/CY4PR20MB12227E5CE199FFBB90C68A13BCB40%40CY4PR20MB1222.namprd20.prod.outlook.com > > [Values in each row] > > * bufusage still only counts the buffer usage during executor. > > Now we have the ability to count the buffer usage during planner but we keep > the bufusage count the buffer usage during executor for now. The bufusage should reflect the sum of planning and execution usage if track_planning is enabled. Did I miss something there? > [Coding] > > * We add Assert in pg_plan_query so that query_text will not be NULL when > executing planner. > > There's no case query_text will be NULL with current sources. It is not > ensured there will be any case query_text will be possibly NULL in the > future though. Some considerations are needed by other people about this. There's at least the current version of IVM patchset that lacks the querytext. Looking at various extensions, I see that pg_background and pglogical call pg_plan_query internally but shouldn't have any issue providing the query text. But there's also citus extension, which don't keep around the query string at least when distributing plans, which makes sense since it's of no use and they're heavily modifying the original Query. I think that citus folks opinion on the subject would be useful, so I'm Cc-ing Marco.
On Thu, Mar 12, 2020 at 11:31 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > There's at least the current version of IVM patchset that lacks the > querytext. Looking at various extensions, I see that pg_background > and pglogical call pg_plan_query internally but shouldn't have any > issue providing the query text. But there's also citus extension, > which don't keep around the query string at least when distributing > plans, which makes sense since it's of no use and they're heavily > modifying the original Query. I think that citus folks opinion on the > subject would be useful, so I'm Cc-ing Marco. Most of the time we keep our Query * data structures in a form that can be deparsed back into a query string by a modified copy of ruleutils.c, so we could generate a correct query string if absolutely necessary, though there are performance-sensitive cases where we'd rather not have the deparsing overhead. In case of INSERT..SELECT into a distributed table, we might call pg_plan_query on the SELECT part (Query *) and send the output into a DestReceiver that sends tuples into shards of the distributed table via COPY. The fact that SELECT does not show up in pg_stat_statements separately is generally fine because it's an implementation detail, and it would probably be a little confusing because the user never ran the SELECT query. Moreover, the call to pg_plan_query would already be reflected in the planning or execution time of the top-level query, so it would be double counted if it had its own entry. Another case is when some of the shards turn out to be local to the server that handles the distributed query. In that case we plan the queries on those shards via pg_plan_query instead of deparsing and sending the query string to a remote server. It would be less confusing for these queries to show in pg_stat_statements, because the queries on the shards on remote servers will show up as well. However, this is a performance-sensitive code path where we'd rather avoid deparsing. In general, I'd prefer if there was no requirement to pass a correct query string. I'm ok with passing "SELECT 'citus_internal'" or just "" if that does not lead to issues. Passing NULL to signal that the planner call should not be tracked separately does seem a bit cleaner. Marco
On Thu, Mar 12, 2020 at 1:11 PM Marco Slot <marco.slot@gmail.com> wrote: > > On Thu, Mar 12, 2020 at 11:31 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > There's at least the current version of IVM patchset that lacks the > > querytext. Looking at various extensions, I see that pg_background > > and pglogical call pg_plan_query internally but shouldn't have any > > issue providing the query text. But there's also citus extension, > > which don't keep around the query string at least when distributing > > plans, which makes sense since it's of no use and they're heavily > > modifying the original Query. I think that citus folks opinion on the > > subject would be useful, so I'm Cc-ing Marco. > > Most of the time we keep our Query * data structures in a form that > can be deparsed back into a query string by a modified copy of > ruleutils.c, so we could generate a correct query string if absolutely > necessary, though there are performance-sensitive cases where we'd > rather not have the deparsing overhead. Yes, deparsing is probably too expensive for that use case. > In case of INSERT..SELECT into a distributed table, we might call > pg_plan_query on the SELECT part (Query *) and send the output into a > DestReceiver that sends tuples into shards of the distributed table > via COPY. The fact that SELECT does not show up in pg_stat_statements > separately is generally fine because it's an implementation detail, > and it would probably be a little confusing because the user never ran > the SELECT query. Moreover, the call to pg_plan_query would already be > reflected in the planning or execution time of the top-level query, so > it would be double counted if it had its own entry. Well, surprising statements can already appears in pg_stat_statements when you use some psql features, or if you have triggers as those will run additional queries under the hood. The difference here is that since citus is a CustomNode, underlying calls to planner will be accounted for that node, and that's indeed annoying. I can see that citus is doing some calls to spi_exec or Executor* (in ExecuteLocalTaskPlan), which could also trigger pg_stat_statements, but I don't know if a queryid is present there. > Another case is when some of the shards turn out to be local to the > server that handles the distributed query. In that case we plan the > queries on those shards via pg_plan_query instead of deparsing and > sending the query string to a remote server. It would be less > confusing for these queries to show in pg_stat_statements, because the > queries on the shards on remote servers will show up as well. However, > this is a performance-sensitive code path where we'd rather avoid > deparsing. Agreed. > In general, I'd prefer if there was no requirement to pass a correct > query string. I'm ok with passing "SELECT 'citus_internal'" or just "" > if that does not lead to issues. Passing NULL to signal that the > planner call should not be tracked separately does seem a bit cleaner. That's very interesting feedback, thanks! I'm not a fan of giving a way for queries to say that they want to be ignored by pg_stat_statements, but double counting the planning counters seem even worse, so I'm +0.5 to accept NULL query string in the planner, incidentally making pgss ignore those.
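Concretely, accepting a NULL query string would amount to bailing out early in
the hook, along these lines (sketch only; names as in the earlier sketches,
not verbatim patch code):

    /* at the top of pgss_planner_hook(), before any timing or pgss_store() */
    if (query_string == NULL || parse->queryId == UINT64CONST(0))
    {
        /*
         * No query text (e.g. an extension planning an internally built
         * Query) or no queryId: plan as usual, without recording planning
         * counters for this call.
         */
        return standard_planner(parse, query_string, cursorOptions, boundParams);
    }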
RE: Planning counters in pg_stat_statements (using pgss_store)
From
"imai.yoshikazu@fujitsu.com"
Date:
On Thu, Mar 12, 2020 at 10:31 AM, Julien Rouhaud wrote: > > * bufusage still only counts the buffer usage during executor. > > > > Now we have the ability to count the buffer usage during planner but we > keep > > the bufusage count the buffer usage during executor for now. > > The bufusage should reflect the sum of planning and execution usage if > track_planning is enabled. Did I miss something there? Ah, you're right. I somehow misunderstood it. Sorry for the annoying. > > * We add Assert in pg_plan_query so that query_text will not be NULL > > when executing planner. > > > > There's no case query_text will be NULL with current sources. It is not > > ensured there will be any case query_text will be possibly NULL in the > > future though. Some considerations are needed by other people about > this. > > There's at least the current version of IVM patchset that lacks the querytext. I saw IVM patchset but I thought it is difficult to impose them to give appropriate querytext. > Looking at various extensions, I see that pg_background and pglogical call > pg_plan_query internally but shouldn't have any issue providing the query text. > But there's also citus extension, which don't keep around the query string at > least when distributing plans, which makes sense since it's of no use and > they're heavily modifying the original Query. I think that citus folks opinion on > the subject would be useful, so I'm Cc-ing Marco. Thank you for looking those codes. I will comment about this in another mail. -- Yoshikazu Imai
RE: Planning counters in pg_stat_statements (using pgss_store)
From
"imai.yoshikazu@fujitsu.com"
Date:
On Thu, Mar 12, 2020 at 6:37 PM, Julien Rouhaud wrote: > On Thu, Mar 12, 2020 at 1:11 PM Marco Slot <marco.slot@gmail.com> wrote: > > On Thu, Mar 12, 2020 at 11:31 AM Julien Rouhaud <rjuju123@gmail.com> > wrote: > > > There's at least the current version of IVM patchset that lacks the > > > querytext. Looking at various extensions, I see that pg_background > > > and pglogical call pg_plan_query internally but shouldn't have any > > > issue providing the query text. But there's also citus extension, > > > which don't keep around the query string at least when distributing > > > plans, which makes sense since it's of no use and they're heavily > > > modifying the original Query. I think that citus folks opinion on > > > the subject would be useful, so I'm Cc-ing Marco. > > > > Most of the time we keep our Query * data structures in a form that > > can be deparsed back into a query string by a modified copy of > > ruleutils.c, so we could generate a correct query string if absolutely > > necessary, though there are performance-sensitive cases where we'd > > rather not have the deparsing overhead. > > Yes, deparsing is probably too expensive for that use case. > > > In case of INSERT..SELECT into a distributed table, we might call > > pg_plan_query on the SELECT part (Query *) and send the output into a > > DestReceiver that sends tuples into shards of the distributed table > > via COPY. The fact that SELECT does not show up in pg_stat_statements > > separately is generally fine because it's an implementation detail, > > and it would probably be a little confusing because the user never ran > > the SELECT query. Moreover, the call to pg_plan_query would already be > > reflected in the planning or execution time of the top-level query, so > > it would be double counted if it had its own entry. > > Well, surprising statements can already appears in pg_stat_statements when > you use some psql features, or if you have triggers as those will run additional > queries under the hood. > > The difference here is that since citus is a CustomNode, underlying calls to > planner will be accounted for that node, and that's indeed annoying. I can see > that citus is doing some calls to spi_exec or > Executor* (in ExecuteLocalTaskPlan), which could also trigger > pg_stat_statements, but I don't know if a queryid is present there. > > > Another case is when some of the shards turn out to be local to the > > server that handles the distributed query. In that case we plan the > > queries on those shards via pg_plan_query instead of deparsing and > > sending the query string to a remote server. It would be less > > confusing for these queries to show in pg_stat_statements, because the > > queries on the shards on remote servers will show up as well. However, > > this is a performance-sensitive code path where we'd rather avoid > > deparsing. > > Agreed. > > > In general, I'd prefer if there was no requirement to pass a correct > > query string. I'm ok with passing "SELECT 'citus_internal'" or just "" > > if that does not lead to issues. Passing NULL to signal that the > > planner call should not be tracked separately does seem a bit cleaner. > > That's very interesting feedback, thanks! I'm not a fan of giving a way for > queries to say that they want to be ignored by pg_stat_statements, but double > counting the planning counters seem even worse, so I'm +0.5 to accept NULL > query string in the planner, incidentally making pgss ignore those. 
It is preferable to collect statistics for as many queries as possible, but if that adds overhead even when pg_stat_statements is not in use, we should not force callers to set an appropriate query_text. As for setting a fixed string in query_text, I don't think it makes much sense, because users can't take any action from such queries' stats. Moreover, if we set a fixed string just to avoid pg_stat_statements errors, the code would be inexplicable to other developers who don't know about that requirement. So I agree with accepting a NULL query string in the planner.

I don't know whether it is relevant, but there is already code that avoids an error when sourceText is NULL:

executor_errposition(EState *estate, int location)
{
    ...
    /* Can't do anything if source text is not available */
    if (estate == NULL || estate->es_sourceText == NULL)
}

-- Yoshikazu Imai
imai.yoshikazu@fujitsu.com wrote > On Thu, Mar 12, 2020 at 6:37 PM, Julien Rouhaud wrote: >> On Thu, Mar 12, 2020 at 1:11 PM Marco Slot < > marco.slot@ > > wrote: >> > On Thu, Mar 12, 2020 at 11:31 AM Julien Rouhaud < > rjuju123@ > > >> wrote: >> > > There's at least the current version of IVM patchset that lacks the >> > > querytext. Looking at various extensions, I see that pg_background >> > > and pglogical call pg_plan_query internally but shouldn't have any >> > > issue providing the query text. But there's also citus extension, >> > > which don't keep around the query string at least when distributing >> > > plans, which makes sense since it's of no use and they're heavily >> > > modifying the original Query. I think that citus folks opinion on >> > > the subject would be useful, so I'm Cc-ing Marco. >> > >> > Most of the time we keep our Query * data structures in a form that >> > can be deparsed back into a query string by a modified copy of >> > ruleutils.c, so we could generate a correct query string if absolutely >> > necessary, though there are performance-sensitive cases where we'd >> > rather not have the deparsing overhead. >> >> Yes, deparsing is probably too expensive for that use case. >> >> > In case of INSERT..SELECT into a distributed table, we might call >> > pg_plan_query on the SELECT part (Query *) and send the output into a >> > DestReceiver that sends tuples into shards of the distributed table >> > via COPY. The fact that SELECT does not show up in pg_stat_statements >> > separately is generally fine because it's an implementation detail, >> > and it would probably be a little confusing because the user never ran >> > the SELECT query. Moreover, the call to pg_plan_query would already be >> > reflected in the planning or execution time of the top-level query, so >> > it would be double counted if it had its own entry. >> >> Well, surprising statements can already appears in pg_stat_statements >> when >> you use some psql features, or if you have triggers as those will run >> additional >> queries under the hood. >> >> The difference here is that since citus is a CustomNode, underlying calls >> to >> planner will be accounted for that node, and that's indeed annoying. I >> can see >> that citus is doing some calls to spi_exec or >> Executor* (in ExecuteLocalTaskPlan), which could also trigger >> pg_stat_statements, but I don't know if a queryid is present there. >> >> > Another case is when some of the shards turn out to be local to the >> > server that handles the distributed query. In that case we plan the >> > queries on those shards via pg_plan_query instead of deparsing and >> > sending the query string to a remote server. It would be less >> > confusing for these queries to show in pg_stat_statements, because the >> > queries on the shards on remote servers will show up as well. However, >> > this is a performance-sensitive code path where we'd rather avoid >> > deparsing. >> >> Agreed. >> >> > In general, I'd prefer if there was no requirement to pass a correct >> > query string. I'm ok with passing "SELECT 'citus_internal'" or just "" >> > if that does not lead to issues. Passing NULL to signal that the >> > planner call should not be tracked separately does seem a bit cleaner. >> >> That's very interesting feedback, thanks! 
I'm not a fan of giving a way >> for >> queries to say that they want to be ignored by pg_stat_statements, but >> double >> counting the planning counters seem even worse, so I'm +0.5 to accept >> NULL >> query string in the planner, incidentally making pgss ignore those. > > It is preferable that we can count various queries statistics as much as > possible > but if it causes overhead even when without using pg_stat_statements, we > would > not have to force them to set appropriate query_text. > About settings a fixed string in query_text, I think it doesn't make much > sense > because users can't take any actions by seeing those queries' stats. > Moreover, if > we set a fixed string in query_text to avoid pg_stat_statement's errors, > codes > would be inexplicable for other developers who don't know there's such > requirements. > After all, I agree accepting NULL query string in the planner. > > I don't know it is useful but there are also codes that avoid an error > when > sourceText is NULL. > > executor_errposition(EState *estate, int location) > { > ... > /* Can't do anything if source text is not available */ > if (estate == NULL || estate->es_sourceText == NULL) > } > > -- > Yoshikazu Imai Hello Imai, My understanding of V5 patch, that checks for Non-Zero queryid, manage properly case where sourceText is NULL. A NULL sourceText means that there was no Parsing for the associated query, if there was no Parsing, there is no queryid (queryId=0), and no planning counters update. It doesn't change pg_plan_query behaviour (no regression for Citus, IVM, ...), and was tested with success for IVM. If my understanding is wrong, then setting track_planning = false would still be the work arround for the very rare (no core) extension(s) that may hit the NULL query text assertion failure. What do you think about this ? Would this make V5 patch ready for committers ? Thanks in advance. Regards PAscal -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
> I don't know it is useful but there are also codes that avoid an error when
> sourceText is NULL.
> executor_errposition(EState *estate, int location)
> {
> ...
> /* Can't do anything if source text is not available */
> if (estate == NULL || estate->es_sourceText == NULL)
> }

Or maybe would you prefer to replace the Non-Zero queryid test with a Non-NULL sourceText one?

-- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On Sat, Mar 14, 2020 at 03:04:00AM -0700, legrand legrand wrote: > imai.yoshikazu@fujitsu.com wrote > > On Thu, Mar 12, 2020 at 6:37 PM, Julien Rouhaud wrote: > >> That's very interesting feedback, thanks! I'm not a fan of giving a way > >> for > >> queries to say that they want to be ignored by pg_stat_statements, but > >> double > >> counting the planning counters seem even worse, so I'm +0.5 to accept > >> NULL > >> query string in the planner, incidentally making pgss ignore those. > > > > It is preferable that we can count various queries statistics as much as > > possible > > but if it causes overhead even when without using pg_stat_statements, we > > would > > not have to force them to set appropriate query_text. > > About settings a fixed string in query_text, I think it doesn't make much > > sense > > because users can't take any actions by seeing those queries' stats. > > Moreover, if > > we set a fixed string in query_text to avoid pg_stat_statement's errors, > > codes > > would be inexplicable for other developers who don't know there's such > > requirements. > > After all, I agree accepting NULL query string in the planner. > > > > I don't know it is useful but there are also codes that avoid an error > > when > > sourceText is NULL. > > > > executor_errposition(EState *estate, int location) > > { > > ... > > /* Can't do anything if source text is not available */ > > if (estate == NULL || estate->es_sourceText == NULL) I'm wondering if that's really possible. But pgss uses the QueryDesc, which should always have a query text (since pgss relies on that). > My understanding of V5 patch, that checks for Non-Zero queryid, > manage properly case where sourceText is NULL. > > A NULL sourceText means that there was no Parsing for the associated > query, if there was no Parsing, there is no queryid (queryId=0), > and no planning counters update. > > It doesn't change pg_plan_query behaviour (no regression for Citus, IVM, > ...), > and was tested with success for IVM. > > If my understanding is wrong, then setting track_planning = false > would still be the work arround for the very rare (no core) extension(s) > that may hit the NULL query text assertion failure. > > What do you think about this ? I don't think that's a correct assumption. I obviously didn't read all of citus extension, but it looks like what's happening is that they get generate a custom Query from the original one, with all the modification needed for distributed execution and whatnot, which is then fed to the planner. I think it's entirely mossible that the modified Query herits from a previously set queryid, while still not really having a query text. And if citus doesn't do that, it doesn't seem like an illegal use cuse anyway. I'm instead attaching a v7 which removes the assert in pg_plan_query, and modify pgss_planner_hook to also ignore queries without a query text, as this seems the best option.
Attachment
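To make the behaviour described above concrete, a minimal sketch of the guard in the planner hook could look like the following (assuming the hook signature from the 0001 patch, which passes the query text down to the planner; timing details and the pgss_store() call are elided):

#include "postgres.h"
#include "nodes/params.h"
#include "optimizer/planner.h"
#include "portability/instr_time.h"

static PlannedStmt *
pgss_planner_hook(Query *parse, const char *query_string,
                  int cursorOptions, ParamListInfo boundParams)
{
    PlannedStmt *result;

    /*
     * Ignore queries without a query text (pgss_store needs it) and queries
     * without a queryid (they would be treated as utility statements, which
     * may not be the case).
     */
    if (query_string != NULL && parse->queryId != UINT64CONST(0))
    {
        instr_time  start;
        instr_time  duration;

        INSTR_TIME_SET_CURRENT(start);
        result = standard_planner(parse, query_string, cursorOptions,
                                  boundParams);
        INSTR_TIME_SET_CURRENT(duration);
        INSTR_TIME_SUBTRACT(duration, start);

        /* ... pass INSTR_TIME_GET_MILLISEC(duration) to pgss_store() ... */
    }
    else
        result = standard_planner(parse, query_string, cursorOptions,
                                  boundParams);

    return result;
}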
RE: Planning counters in pg_stat_statements (using pgss_store)
From: "imai.yoshikazu@fujitsu.com"
On Sat, Mar 14, 2020 at 5:28 PM, Julien Rouhaud wrote: > On Sat, Mar 14, 2020 at 03:04:00AM -0700, legrand legrand wrote: > > imai.yoshikazu@fujitsu.com wrote > > > On Thu, Mar 12, 2020 at 6:37 PM, Julien Rouhaud wrote: > > >> That's very interesting feedback, thanks! I'm not a fan of giving a way > > >> for > > >> queries to say that they want to be ignored by pg_stat_statements, but > > >> double > > >> counting the planning counters seem even worse, so I'm +0.5 to accept > > >> NULL > > >> query string in the planner, incidentally making pgss ignore those. > > > > > > It is preferable that we can count various queries statistics as much as > > > possible > > > but if it causes overhead even when without using pg_stat_statements, we > > > would > > > not have to force them to set appropriate query_text. > > > About settings a fixed string in query_text, I think it doesn't make much > > > sense > > > because users can't take any actions by seeing those queries' stats. > > > Moreover, if > > > we set a fixed string in query_text to avoid pg_stat_statement's errors, > > > codes > > > would be inexplicable for other developers who don't know there's such > > > requirements. > > > After all, I agree accepting NULL query string in the planner. > > > > > > I don't know it is useful but there are also codes that avoid an error > > > when > > > sourceText is NULL. > > > > > > executor_errposition(EState *estate, int location) > > > { > > > ... > > > /* Can't do anything if source text is not available */ > > > if (estate == NULL || estate->es_sourceText == NULL) > > > I'm wondering if that's really possible. But pgss uses the QueryDesc, which > should always have a query text (since pgss relies on that). I cited those codes because I just wanted to say there's already an assumption that query text in QueryDesc would be NULL, whether or not it is true. > > My understanding of V5 patch, that checks for Non-Zero queryid, > > manage properly case where sourceText is NULL. > > > > A NULL sourceText means that there was no Parsing for the associated > > query, if there was no Parsing, there is no queryid (queryId=0), > > and no planning counters update. > > > > It doesn't change pg_plan_query behaviour (no regression for Citus, IVM, > > ...), > > and was tested with success for IVM. > > > > If my understanding is wrong, then setting track_planning = false > > would still be the work arround for the very rare (no core) extension(s) > > that may hit the NULL query text assertion failure. > > > > What do you think about this ? > > > I don't think that's a correct assumption. I obviously didn't read all of > citus extension, but it looks like what's happening is that they get generate a > custom Query from the original one, with all the modification needed for > distributed execution and whatnot, which is then fed to the planner. I think > it's entirely mossible that the modified Query herits from a previously set > queryid, while still not really having a query text. And if citus doesn't do > that, it doesn't seem like an illegal use cuse anyway. Indeed. It can happen that queryid has some value while query_text is NULL. > I'm instead attaching a v7 which removes the assert in pg_plan_query, and > modify pgss_planner_hook to also ignore queries without a query text, as this > seems the best option. Thank you. It also seems to me that is the best option. BTW, I recheck the patchset. I think codes are ready for committer but should we modify the documentation? 
{min,max,mean,stddev}_time is now obsolete, so it would be better to rename it to {min,max,mean,stddev}_exec_time and to add {min,max,mean,stddev}_plan_time. -- Yoshikazu Imai
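As an illustration of the proposed naming, a query against the view might then look like this (a sketch only; column names follow the proposal in this thread and are not final):

SELECT query,
       plans, total_plan_time, min_plan_time, max_plan_time,
       calls, total_exec_time, min_exec_time, max_exec_time
FROM pg_stat_statements
ORDER BY total_plan_time + total_exec_time DESC
LIMIT 5;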
> I'm instead attaching a v7 which removes the assert in pg_plan_query, and
> modify pgss_planner_hook to also ignore queries without a query text, as this
> seems the best option.

Ok, that was the second solution, go on!

-- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On Mon, Mar 16, 2020 at 01:34:11AM +0000, imai.yoshikazu@fujitsu.com wrote: > On Sat, Mar 14, 2020 at 5:28 PM, Julien Rouhaud wrote: > > I don't think that's a correct assumption. I obviously didn't read all of > > citus extension, but it looks like what's happening is that they get generate a > > custom Query from the original one, with all the modification needed for > > distributed execution and whatnot, which is then fed to the planner. I think > > it's entirely mossible that the modified Query herits from a previously set > > queryid, while still not really having a query text. And if citus doesn't do > > that, it doesn't seem like an illegal use cuse anyway. > > Indeed. It can happen that queryid has some value while query_text is NULL. > > > > I'm instead attaching a v7 which removes the assert in pg_plan_query, and > > modify pgss_planner_hook to also ignore queries without a query text, as this > > seems the best option. > > Thank you. > It also seems to me that is the best option. Thanks Imai-san and PAscal for the feedback, it seems that we have an agreement! > BTW, I recheck the patchset. > I think codes are ready for committer but should we modify the documentation? > {min,max,mean,stddev}_time is now obsoleted so it is better to modify it to > {min,max,mean,stddev}_exec_time and add {min,max,mean,stddev}_plan_time. Oh indeed, I totally forgot about this. I'm attaching v8 with updated documentation that should match what was implemented since some versions.
Attachment
RE: Planning counters in pg_stat_statements (using pgss_store)
From: "imai.yoshikazu@fujitsu.com"
On Mon, Mar 16, 2020 at 9:49 PM, Julien Rouhaud wrote: > On Mon, Mar 16, 2020 at 01:34:11AM +0000, imai.yoshikazu@fujitsu.com > wrote: > > On Sat, Mar 14, 2020 at 5:28 PM, Julien Rouhaud wrote: > > > I don't think that's a correct assumption. I obviously didn't read > > > all of citus extension, but it looks like what's happening is that > > > they get generate a custom Query from the original one, with all the > > > modification needed for distributed execution and whatnot, which is > > > then fed to the planner. I think it's entirely mossible that the > > > modified Query herits from a previously set queryid, while still not > > > really having a query text. And if citus doesn't do that, it doesn't seem like > an illegal use cuse anyway. > > > > Indeed. It can happen that queryid has some value while query_text is NULL. > > > > > > > I'm instead attaching a v7 which removes the assert in > > > pg_plan_query, and modify pgss_planner_hook to also ignore queries > > > without a query text, as this seems the best option. > > > > Thank you. > > It also seems to me that is the best option. > > > Thanks Imai-san and PAscal for the feedback, it seems that we have an > agreement! > > > > BTW, I recheck the patchset. > > I think codes are ready for committer but should we modify the > documentation? > > {min,max,mean,stddev}_time is now obsoleted so it is better to modify > > it to {min,max,mean,stddev}_exec_time and add > {min,max,mean,stddev}_plan_time. > > > Oh indeed, I totally forgot about this. I'm attaching v8 with updated > documentation that should match what was implemented since some > versions. Okay, I checked it. So I'll mark this as a ready for committer. Thanks -- Yoshikazu Imai
Hello

I was inactive for a while... Sorry.

>> BTW, I recheck the patchset.
>> I think codes are ready for committer but should we modify the documentation?
>> {min,max,mean,stddev}_time is now obsoleted so it is better to modify it to
>> {min,max,mean,stddev}_exec_time and add {min,max,mean,stddev}_plan_time.
>
> Oh indeed, I totally forgot about this. I'm attaching v8 with updated
> documentation that should match what was implemented since some versions.

Yet another column is missing from the docs: total_time

I specifically verified that the newly loaded library works with the old version of the extension in the database. I have not noticed issues here.

> 2.2% is a bit large but I think it is still acceptable because people using this feature
> might take account that some overhead will happen for additional calling of a
> gettime function.

I would be happy even with a 10% overhead for enabled track_planning... (but in that case disabled by default). log_min_duration_statement = 0 with log parsing is much more expensive. I think 1-2% is acceptable and we can set track_planning = on by default, as the patch does.

> * Rows for executor time are changed from {total/min/max/mean/stddev}_time to {total/min/max/mean/stddev}_exec_time.

Maybe release it as version 2.0 instead of the minor update 1.8?

regards, Sergei
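For reference, enabling the proposed planning counters would look roughly like this in postgresql.conf (a sketch; the GUC name follows the track_planning setting discussed in this thread):

shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track_planning = on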
On Fri, Mar 20, 2020 at 05:09:05PM +0300, Sergei Kornilov wrote:
> Hello
>
> Yet another column is missing from the docs: total_time

Oh, good catch! I rechecked that field many times, and totally missed that the documentation is referring to the view, which has an additional column, and not to the function. Attached v9 fixes that.

> I specifically verified that the newly loaded library works with the old version of the extension in the database. I have not noticed issues here.

Thanks for those extra checks.

> > 2.2% is a bit large but I think it is still acceptable because people using this feature
> > might take account that some overhead will happen for additional calling of a
> > gettime function.
>
> I would be happy even with a 10% overhead for enabled track_planning... (but in that case disabled by default). log_min_duration_statement = 0 with log parsing is much more expensive.
> I think 1-2% is acceptable and we can set track_planning = on by default, as the patch does.
>
> > * Rows for executor time are changed from {total/min/max/mean/stddev}_time to {total/min/max/mean/stddev}_exec_time.
>
> Maybe release it as version 2.0 instead of the minor update 1.8?

I don't have an opinion on that, I'd be fine with any version. I kept 1.8 in the patch for now.
Attachment
On 2020/03/21 4:30, Julien Rouhaud wrote: > On Fri, Mar 20, 2020 at 05:09:05PM +0300, Sergei Kornilov wrote: >> Hello >> >> Yet another is missed in docs: total_time > > Oh good catch! I rechecked many time the field, and totally missed that the > documentation is referring to the view, which has an additional column, and not > the function. Attached v9 fixes that. Thanks for the patch! Here are the review comments from me. - PGSS_V1_3 + PGSS_V1_3, + PGSS_V1_8 WAL usage patch [1] increments this version to 1_4 instead of 1_8. I *guess* that's because previously this version was maintained independently from pg_stat_statements' version. For example, pg_stat_statements 1.4 seems to have used PGSS_V1_3. + /* + * We can't process the query if no query_text is provided, as pgss_store + * needs it. We also ignore query without queryid, as it would be treated + * as a utility statement, which may not be the case. + */ Could you tell me why the planning stats are not tracked when executing utility statements? In some utility statements like REFRESH MATERIALIZED VIEW, the planner would work. +static BufferUsage +compute_buffer_counters(BufferUsage start, BufferUsage stop) +{ + BufferUsage result; BufferUsageAccumDiff() has very similar logic. Isn't it better to expose and use that function rather than creating new similar function? values[i++] = Int64GetDatumFast(tmp.rows); values[i++] = Int64GetDatumFast(tmp.shared_blks_hit); values[i++] = Int64GetDatumFast(tmp.shared_blks_read); Previously (without the patch) pg_stat_statements_1_3() reported the buffer usage counters updated only in execution phase. But, in the patched version, pg_stat_statements_1_3() reports the total of buffer usage counters updated in both planning and execution phases. Is this OK? I'm not sure how seriously we should ensure the backward compatibility for pg_stat_statements.... +/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */ ISTM it's good timing to have also pg_stat_statements--1.8.sql since the definition of pg_stat_statements() is changed. Thought? [1] https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
On Wed, Mar 25, 2020 at 10:09:37PM +0900, Fujii Masao wrote: > > On 2020/03/21 4:30, Julien Rouhaud wrote: > > On Fri, Mar 20, 2020 at 05:09:05PM +0300, Sergei Kornilov wrote: > > > Hello > > > > > > Yet another is missed in docs: total_time > > > > Oh good catch! I rechecked many time the field, and totally missed that the > > documentation is referring to the view, which has an additional column, and not > > the function. Attached v9 fixes that. > > Thanks for the patch! Here are the review comments from me. > > - PGSS_V1_3 > + PGSS_V1_3, > + PGSS_V1_8 > > WAL usage patch [1] increments this version to 1_4 instead of 1_8. > I *guess* that's because previously this version was maintained > independently from pg_stat_statements' version. For example, > pg_stat_statements 1.4 seems to have used PGSS_V1_3. Oh right. It seems that I changed that many versions ago, I'm not sure why. I'm personally fine with any, but I think this was previously raised and consensus was to keep distinct counters. Unless you prefer to keep it this way, I'll send an updated version (with other possible modifications depending on the rest of the mail) using PGSS_V1_4. > + /* > + * We can't process the query if no query_text is provided, as pgss_store > + * needs it. We also ignore query without queryid, as it would be treated > + * as a utility statement, which may not be the case. > + */ > > Could you tell me why the planning stats are not tracked when executing > utility statements? In some utility statements like REFRESH MATERIALIZED VIEW, > the planner would work. I explained that in [1]. The problem is that the underlying statement doesn't get the proper stmt_location and stmt_len, so you eventually end up with two different entries. I suggested fixing transformTopLevelStmt() to handle the various DDL that can contain optimisable statements, but everyone preferred to postpone that for a future enhencement. > +static BufferUsage > +compute_buffer_counters(BufferUsage start, BufferUsage stop) > +{ > + BufferUsage result; > > BufferUsageAccumDiff() has very similar logic. Isn't it better to expose > and use that function rather than creating new similar function? Oh, I thought this wouldn't be acceptable. That's indeed better so I'll do that instead. > values[i++] = Int64GetDatumFast(tmp.rows); > values[i++] = Int64GetDatumFast(tmp.shared_blks_hit); > values[i++] = Int64GetDatumFast(tmp.shared_blks_read); > > Previously (without the patch) pg_stat_statements_1_3() reported > the buffer usage counters updated only in execution phase. But, > in the patched version, pg_stat_statements_1_3() reports the total > of buffer usage counters updated in both planning and execution > phases. Is this OK? I'm not sure how seriously we should ensure > the backward compatibility for pg_stat_statements.... That's indeed a behavior change, although the new behavior is probably better as user want to know how much resource a query is consuming overall. We could distinguish all buffers with a plan/exec version, but it seems quite overkill. > +/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */ > > ISTM it's good timing to have also pg_stat_statements--1.8.sql since > the definition of pg_stat_statements() is changed. Thought? I thought that since CreateExtension() was modified to be able to find it's way automatically, we shouldn't provide base version anymore, to minimize maintenance burden and also avoid possible bug/discrepancy. 
The only drawback is that it'll do multiple CREATE or DROP/CREATE of the function usually once per database, which doesn't seem like a big problem. [1] https://www.postgresql.org/message-id/CAOBaU_Y-y+VOhTZgDOuDk6-9V72-ZXdWccXo_kx0P4DDBEEh9A@mail.gmail.com
Hello

> WAL usage patch [1] increments this version to 1_4 instead of 1_8.
> I *guess* that's because previously this version was maintained
> independently from pg_stat_statements' version. For example,
> pg_stat_statements 1.4 seems to have used PGSS_V1_3.

As far as I remember, this was my proposed change in a review a year ago. I think that a clear analogy between the extension version and the function name would be clearer than sequential numbering of PGSS_V across different extension versions. For pgss 1.4 it was fine to use PGSS_V1_3, because there were no changes in pg_stat_statements_internal.

pg_stat_statements 1.3 will call pg_stat_statements_1_3.
pg_stat_statements 1.4 - 1.7 will still call pg_stat_statements_1_3. In my opinion, this is the correct naming, since we did not need a new function.
But pg_stat_statements 1.8 will call pg_stat_statements_1_4. Isn't that confusing?

Well, no strong opinion.

regards, Sergei
On 2020/03/26 2:17, Sergei Kornilov wrote: > Hello > >> WAL usage patch [1] increments this version to 1_4 instead of 1_8. >> I *guess* that's because previously this version was maintained >> independently from pg_stat_statements' version. For example, >> pg_stat_statements 1.4 seems to have used PGSS_V1_3. > > As far as I remember, this was my proposed change in review a year ago. > I think that having a clear analogy between the extension version and the function name would be more clear than sequentialnumbering of PGSS_V with different extension versions. > For pgss 1.4 it was fine to use PGSS_V1_3, because there were no changes in pg_stat_statements_internal. > pg_stat_statements 1.3 will call pg_stat_statements_1_3 > pg_stat_statements 1.4 - 1.7 will still call pg_stat_statements_1_3. In my opinion, this is the correct naming, since wedid not need a new function. > but pg_stat_statements 1.8 will call pg_stat_statements_1_4. It's not confusing? Yeah, I withdraw my comment and agree that 1_8 looks less confusing. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
On 2020/03/25 22:45, Julien Rouhaud wrote: > On Wed, Mar 25, 2020 at 10:09:37PM +0900, Fujii Masao wrote: >> >> On 2020/03/21 4:30, Julien Rouhaud wrote: >>> On Fri, Mar 20, 2020 at 05:09:05PM +0300, Sergei Kornilov wrote: >>>> Hello >>>> >>>> Yet another is missed in docs: total_time >>> >>> Oh good catch! I rechecked many time the field, and totally missed that the >>> documentation is referring to the view, which has an additional column, and not >>> the function. Attached v9 fixes that. >> >> Thanks for the patch! Here are the review comments from me. >> >> - PGSS_V1_3 >> + PGSS_V1_3, >> + PGSS_V1_8 >> >> WAL usage patch [1] increments this version to 1_4 instead of 1_8. >> I *guess* that's because previously this version was maintained >> independently from pg_stat_statements' version. For example, >> pg_stat_statements 1.4 seems to have used PGSS_V1_3. > > Oh right. It seems that I changed that many versions ago, I'm not sure why. > I'm personally fine with any, but I think this was previously raised and > consensus was to keep distinct counters. Unless you prefer to keep it this > way, I'll send an updated version (with other possible modifications depending > on the rest of the mail) using PGSS_V1_4. > >> + /* >> + * We can't process the query if no query_text is provided, as pgss_store >> + * needs it. We also ignore query without queryid, as it would be treated >> + * as a utility statement, which may not be the case. >> + */ >> >> Could you tell me why the planning stats are not tracked when executing >> utility statements? In some utility statements like REFRESH MATERIALIZED VIEW, >> the planner would work. > > I explained that in [1]. The problem is that the underlying statement doesn't > get the proper stmt_location and stmt_len, so you eventually end up with two > different entries. It's not problematic to have two different entries in that case. Right? The actual problem is that the statements reported in those entries are very similar? For example, when "create table test as select 1;" is executed, it's strange to get the following two entries, as you explained. create table test as select 1; create table test as select 1 But it seems valid to get the following two entries in that case? select 1 create table test as select 1 The former is the nested statement and the latter is the top statement. > I suggested fixing transformTopLevelStmt() to handle the > various DDL that can contain optimisable statements, but everyone preferred to > postpone that for a future enhencement. Understood. Thanks for explanation! >> +static BufferUsage >> +compute_buffer_counters(BufferUsage start, BufferUsage stop) >> +{ >> + BufferUsage result; >> >> BufferUsageAccumDiff() has very similar logic. Isn't it better to expose >> and use that function rather than creating new similar function? > > Oh, I thought this wouldn't be acceptable. That's indeed better so I'll do > that instead. Thanks! But of course this is trivial thing, so it's ok to do that later. >> values[i++] = Int64GetDatumFast(tmp.rows); >> values[i++] = Int64GetDatumFast(tmp.shared_blks_hit); >> values[i++] = Int64GetDatumFast(tmp.shared_blks_read); >> >> Previously (without the patch) pg_stat_statements_1_3() reported >> the buffer usage counters updated only in execution phase. But, >> in the patched version, pg_stat_statements_1_3() reports the total >> of buffer usage counters updated in both planning and execution >> phases. Is this OK? 
I'm not sure how seriously we should ensure >> the backward compatibility for pg_stat_statements.... > > That's indeed a behavior change, although the new behavior is probably better > as user want to know how much resource a query is consuming overall. We could > distinguish all buffers with a plan/exec version, but it seems quite overkill. Ok. > >> +/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */ >> >> ISTM it's good timing to have also pg_stat_statements--1.8.sql since >> the definition of pg_stat_statements() is changed. Thought? > > I thought that since CreateExtension() was modified to be able to find it's way > automatically, we shouldn't provide base version anymore, to minimize > maintenance burden and also avoid possible bug/discrepancy. The only drawback > is that it'll do multiple CREATE or DROP/CREATE of the function usually once > per database, which doesn't seem like a big problem. Ok. Here are other comments. - if (jstate) + if (kind == PGSS_JUMBLE) Why is PGSS_JUMBLE necessary? ISTM that we can still use jstate here, instead. If it's ok to remove PGSS_JUMBLE, we can define PGSS_NUM_KIND(=2) instead and replace 2 in, e.g., total_time[2] with PGSS_NUM_KIND. Thought? + <entry><structfield>total_time</structfield></entry> + <entry><type>double precision</type></entry> + <entry></entry> + <entry> + Total time spend planning and executing the statement, in milliseconds + </entry> + </row> pg_stat_statements view has this column but the function not. We should make both have the column or not at all, for consistency? I'm not sure if it's good thing to expose the sum of total_plan_time and total_exec_time as total_time. If some users want that, they can easily calculate it from total_plan_time and total_exec_time by using their own logic. + nested_level++; + PG_TRY(); In old thread [1], Tom Lane commented the usage of nested_level in the planner hook. There seems no reply to that so far. What's your opinion about that comment? [1] https://www.postgresql.org/message-id/28980.1515803777@sss.pgh.pa.us Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
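To make the PGSS_NUM_KIND suggestion above concrete, a minimal sketch could be (member names follow the discussion and may differ in the final patch):

typedef enum pgssStoreKind
{
    PGSS_PLAN = 0,      /* planning counters */
    PGSS_EXEC,          /* execution counters */
    PGSS_NUM_KIND       /* must be last, used to size per-kind arrays */
} pgssStoreKind;

/* e.g. in the Counters struct: */
double      total_time[PGSS_NUM_KIND];  /* total planning/execution time in msec */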
On Thu, Mar 26, 2020 at 08:08:35PM +0900, Fujii Masao wrote: > > On 2020/03/25 22:45, Julien Rouhaud wrote: > > On Wed, Mar 25, 2020 at 10:09:37PM +0900, Fujii Masao wrote: > > > + /* > > > + * We can't process the query if no query_text is provided, as pgss_store > > > + * needs it. We also ignore query without queryid, as it would be treated > > > + * as a utility statement, which may not be the case. > > > + */ > > > > > > Could you tell me why the planning stats are not tracked when executing > > > utility statements? In some utility statements like REFRESH MATERIALIZED VIEW, > > > the planner would work. > > > > I explained that in [1]. The problem is that the underlying statement doesn't > > get the proper stmt_location and stmt_len, so you eventually end up with two > > different entries. > > It's not problematic to have two different entries in that case. Right? I will unnecessarily bloat the entries, and makes users life harder too. This example is quite easy to deal with, but if the application is sending multi-query statements, you'll just end up with a mess impossible to properly handle. > The actual problem is that the statements reported in those entries are > very similar? For example, when "create table test as select 1;" is executed, > it's strange to get the following two entries, as you explained. > > create table test as select 1; > create table test as select 1 > > But it seems valid to get the following two entries in that case? > > select 1 > create table test as select 1 > > The former is the nested statement and the latter is the top statement. I think that there should only be 1 entry, the utility command. It seems easy to correlate the planning time to the underlying query, but I'm not entirely sure that the execution counters won't be impacted by the fact is being run in a utilty statements. Also, for now current pgss behavior is to always merge underlying optimisable statements in the utility command, and it seems a bit late in this release cycle to revisit that. I'd be happy to work on improving that for the next release, among other things. For instance the total lack of normalization for utility commands [2] is also something that has been bothering me for a long time. In some workloads, you can end up with the entries almost entirely filled with 1-time-execution commands, just because it's using random identifiers, so you have no other choice than to disable track_utility, although it would have been useful for other commands. > Here are other comments. > > - if (jstate) > + if (kind == PGSS_JUMBLE) > > Why is PGSS_JUMBLE necessary? ISTM that we can still use jstate here, instead. > > If it's ok to remove PGSS_JUMBLE, we can define PGSS_NUM_KIND(=2) instead > and replace 2 in, e.g., total_time[2] with PGSS_NUM_KIND. Thought? Yes, we could be using jstate here. I originally used that to avoid passing PGSS_EXEC (or the other one) as a way to say "ignore this information as there's the jstate which says it's yet another meaning". If that's not an issue, I can change that as PGSS_NUM_KIND will clearly improve the explicit "2" all over the place. > + <entry><structfield>total_time</structfield></entry> > + <entry><type>double precision</type></entry> > + <entry></entry> > + <entry> > + Total time spend planning and executing the statement, in milliseconds > + </entry> > + </row> > > pg_stat_statements view has this column but the function not. > We should make both have the column or not at all, for consistency? 
> I'm not sure if it's good thing to expose the sum of total_plan_time
> and total_exec_time as total_time. If some users want that, they can
> easily calculate it from total_plan_time and total_exec_time by using
> their own logic.

I think we originally added it as a way to avoid too much of a compatibility break, and also because it seems like a field most users will be interested in anyway. Now that I'm thinking about it again, I indeed think it was a mistake to have that in the view only. Not mainly for consistency, but for users who would be interested in the total_time field while not wanting to pay the overhead of retrieving the query text if they don't need it. So I'll change that!

> + nested_level++;
> + PG_TRY();
>
> In old thread [1], Tom Lane commented the usage of nested_level
> in the planner hook. There seems no reply to that so far. What's
> your opinion about that comment?
>
> [1] https://www.postgresql.org/message-id/28980.1515803777@sss.pgh.pa.us

Oh thanks, I hadn't noticed this part of the discussion. I agree with Tom's concern, and I think that having a specific nesting-level variable for the planner is the best workaround, so I'll implement that.

[2] https://twitter.com/fujii_masao/status/1242978261572837377
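A sketch of the dedicated planner nesting level mentioned above might look like this (names are illustrative; the query-text guard and timing shown earlier are elided):

static int  plan_nested_level = 0;  /* planner nesting depth */

static PlannedStmt *
pgss_planner_hook(Query *parse, const char *query_string,
                  int cursorOptions, ParamListInfo boundParams)
{
    PlannedStmt *result;

    plan_nested_level++;
    PG_TRY();
    {
        result = standard_planner(parse, query_string, cursorOptions,
                                  boundParams);
    }
    PG_FINALLY();
    {
        plan_nested_level--;
    }
    PG_END_TRY();

    return result;
}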
On Thu, Mar 26, 2020 at 02:22:47PM +0100, Julien Rouhaud wrote: > On Thu, Mar 26, 2020 at 08:08:35PM +0900, Fujii Masao wrote: > > > > Here are other comments. > > > > - if (jstate) > > + if (kind == PGSS_JUMBLE) > > > > Why is PGSS_JUMBLE necessary? ISTM that we can still use jstate here, instead. > > > > If it's ok to remove PGSS_JUMBLE, we can define PGSS_NUM_KIND(=2) instead > > and replace 2 in, e.g., total_time[2] with PGSS_NUM_KIND. Thought? > > Yes, we could be using jstate here. I originally used that to avoid passing > PGSS_EXEC (or the other one) as a way to say "ignore this information as > there's the jstate which says it's yet another meaning". If that's not an > issue, I can change that as PGSS_NUM_KIND will clearly improve the explicit "2" > all over the place. Done, passing PGSS_PLAN when jumble is intended, with a comment saying that the pgss_kind is ignored in that case. > > + <entry><structfield>total_time</structfield></entry> > > + <entry><type>double precision</type></entry> > > + <entry></entry> > > + <entry> > > + Total time spend planning and executing the statement, in milliseconds > > + </entry> > > + </row> > > > > pg_stat_statements view has this column but the function not. > > We should make both have the column or not at all, for consistency? > > I'm not sure if it's good thing to expose the sum of total_plan_time > > and total_exec_time as total_time. If some users want that, they can > > easily calculate it from total_plan_time and total_exec_time by using > > their own logic. > > I think we originally added it as a way to avoid too much compatibility break, > and also because it seems like a field most users will be interested in anyway. > Now that I'm thinking about it again, I indeed think it was a mistake to have > that in view part only. Not mainly for consistency, but for users who would be > interested in the total_time field while not wanting to pay the overhead of > retrieving the query text if they don't need it. So I'll change that! Done > > + nested_level++; > > + PG_TRY(); > > > > In old thread [1], Tom Lane commented the usage of nested_level > > in the planner hook. There seems no reply to that so far. What's > > your opinion about that comment? > > > > [1] https://www.postgresql.org/message-id/28980.1515803777@sss.pgh.pa.us > > Oh thanks, I didn't noticed this part of the discussion. I agree with Tom's > concern, and I think that having a specific nesting level variable for the > planner is the best workaround, so I'll implement that. Done. I also exported BufferUsageAccumDiff as mentioned previously, as it seems clearner and will avoid future useless code churn, and run pgindent. v10 attached.
Attachment
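For reference, the exported BufferUsageAccumDiff() can replace a bespoke delta helper around the planner call roughly as follows (a sketch, assuming the function keeps its existing semantics of accumulating "add minus sub" into dst):

#include "executor/instrument.h"

BufferUsage bufusage_start;
BufferUsage bufusage;

bufusage_start = pgBufferUsage;     /* snapshot before planning */

/* ... run standard_planner() here ... */

memset(&bufusage, 0, sizeof(BufferUsage));
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
/* bufusage now holds the buffer counters consumed during planning */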
On 2020/03/27 19:00, Julien Rouhaud wrote: > On Thu, Mar 26, 2020 at 02:22:47PM +0100, Julien Rouhaud wrote: >> On Thu, Mar 26, 2020 at 08:08:35PM +0900, Fujii Masao wrote: >>> >>> Here are other comments. >>> >>> - if (jstate) >>> + if (kind == PGSS_JUMBLE) >>> >>> Why is PGSS_JUMBLE necessary? ISTM that we can still use jstate here, instead. >>> >>> If it's ok to remove PGSS_JUMBLE, we can define PGSS_NUM_KIND(=2) instead >>> and replace 2 in, e.g., total_time[2] with PGSS_NUM_KIND. Thought? >> >> Yes, we could be using jstate here. I originally used that to avoid passing >> PGSS_EXEC (or the other one) as a way to say "ignore this information as >> there's the jstate which says it's yet another meaning". If that's not an >> issue, I can change that as PGSS_NUM_KIND will clearly improve the explicit "2" >> all over the place. > > Done, passing PGSS_PLAN when jumble is intended, with a comment saying that the > pgss_kind is ignored in that case. > >>> + <entry><structfield>total_time</structfield></entry> >>> + <entry><type>double precision</type></entry> >>> + <entry></entry> >>> + <entry> >>> + Total time spend planning and executing the statement, in milliseconds >>> + </entry> >>> + </row> >>> >>> pg_stat_statements view has this column but the function not. >>> We should make both have the column or not at all, for consistency? >>> I'm not sure if it's good thing to expose the sum of total_plan_time >>> and total_exec_time as total_time. If some users want that, they can >>> easily calculate it from total_plan_time and total_exec_time by using >>> their own logic. >> >> I think we originally added it as a way to avoid too much compatibility break, >> and also because it seems like a field most users will be interested in anyway. >> Now that I'm thinking about it again, I indeed think it was a mistake to have >> that in view part only. Not mainly for consistency, but for users who would be >> interested in the total_time field while not wanting to pay the overhead of >> retrieving the query text if they don't need it. So I'll change that! > > Done > >>> + nested_level++; >>> + PG_TRY(); >>> >>> In old thread [1], Tom Lane commented the usage of nested_level >>> in the planner hook. There seems no reply to that so far. What's >>> your opinion about that comment? >>> >>> [1] https://www.postgresql.org/message-id/28980.1515803777@sss.pgh.pa.us >> >> Oh thanks, I didn't noticed this part of the discussion. I agree with Tom's >> concern, and I think that having a specific nesting level variable for the >> planner is the best workaround, so I'll implement that. > > Done. > > I also exported BufferUsageAccumDiff as mentioned previously, as it seems > clearner and will avoid future useless code churn, and run pgindent. > > v10 attached. Thanks for updating the patches! Regarding 0001 patch, I have one nitpicking comment; - result = standard_planner(parse, cursorOptions, boundParams); + result = standard_planner(parse, query_text, cursorOptions, boundParams); -standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams) +standard_planner(Query *parse, const char *querytext, int cursorOptions, + ParamListInfo boundParams) -pg_plan_query(Query *querytree, int cursorOptions, ParamListInfo boundParams) +pg_plan_query(Query *querytree, const char *query_text, int cursorOptions, + ParamListInfo boundParams) The patch uses "query_text" and "querytext" as the name of newly-added argument. They should be unified? 
IMO "query_string" looks better name because it's used in other functions like pg_analyze_and_rewrite(), pg_parse_query() for the sake of consistency. Thought? Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
On 2020/03/27 19:00, Julien Rouhaud wrote: > On Thu, Mar 26, 2020 at 02:22:47PM +0100, Julien Rouhaud wrote: >> On Thu, Mar 26, 2020 at 08:08:35PM +0900, Fujii Masao wrote: >>> >>> Here are other comments. >>> >>> - if (jstate) >>> + if (kind == PGSS_JUMBLE) >>> >>> Why is PGSS_JUMBLE necessary? ISTM that we can still use jstate here, instead. >>> >>> If it's ok to remove PGSS_JUMBLE, we can define PGSS_NUM_KIND(=2) instead >>> and replace 2 in, e.g., total_time[2] with PGSS_NUM_KIND. Thought? >> >> Yes, we could be using jstate here. I originally used that to avoid passing >> PGSS_EXEC (or the other one) as a way to say "ignore this information as >> there's the jstate which says it's yet another meaning". If that's not an >> issue, I can change that as PGSS_NUM_KIND will clearly improve the explicit "2" >> all over the place. > > Done, passing PGSS_PLAN when jumble is intended, with a comment saying that the > pgss_kind is ignored in that case. > >>> + <entry><structfield>total_time</structfield></entry> >>> + <entry><type>double precision</type></entry> >>> + <entry></entry> >>> + <entry> >>> + Total time spend planning and executing the statement, in milliseconds >>> + </entry> >>> + </row> >>> >>> pg_stat_statements view has this column but the function not. >>> We should make both have the column or not at all, for consistency? >>> I'm not sure if it's good thing to expose the sum of total_plan_time >>> and total_exec_time as total_time. If some users want that, they can >>> easily calculate it from total_plan_time and total_exec_time by using >>> their own logic. >> >> I think we originally added it as a way to avoid too much compatibility break, >> and also because it seems like a field most users will be interested in anyway. >> Now that I'm thinking about it again, I indeed think it was a mistake to have >> that in view part only. Not mainly for consistency, but for users who would be >> interested in the total_time field while not wanting to pay the overhead of >> retrieving the query text if they don't need it. So I'll change that! > > Done > Thanks for updating the patch! But I'm still wondering if it's really good thing to expose total_time. For example, when the query fails with an error many times and "calls" becomes very different from "plans", "total_plan_time" + "total_exec_time" is really what the users are interested in? Some users may be interested in the sum of mean times, but it's not exposed... So what I'd like to say is that the information that users are interested in would vary on each situation and case. At least for me it seems enough for pgss to report only the basic information. Then users can calculate to get the numbers (like total_time) they're interested in, from those basic information. But of course, I'd like to hear more opinions about this... + if (api_version >= PGSS_V1_8) + values[i++] = Int64GetDatumFast(tmp.total_time[0] + + tmp.total_time[1]); BTW, Int64GetDatumFast() should be Float8GetDatumFast()? >>> + nested_level++; >>> + PG_TRY(); >>> >>> In old thread [1], Tom Lane commented the usage of nested_level >>> in the planner hook. There seems no reply to that so far. What's >>> your opinion about that comment? >>> >>> [1] https://www.postgresql.org/message-id/28980.1515803777@sss.pgh.pa.us >> >> Oh thanks, I didn't noticed this part of the discussion. I agree with Tom's >> concern, and I think that having a specific nesting level variable for the >> planner is the best workaround, so I'll implement that. > > Done. 
> > I also exported BufferUsageAccumDiff as mentioned previously, as it seems > clearner and will avoid future useless code churn, and run pgindent. Many thanks!! I'm thinking to commit this part separately. So I made that patch based on your patch. Attached. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Attachment
On Fri, Mar 27, 2020 at 12:02 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > On 2020/03/27 19:00, Julien Rouhaud wrote: > > On Thu, Mar 26, 2020 at 02:22:47PM +0100, Julien Rouhaud wrote: > >> On Thu, Mar 26, 2020 at 08:08:35PM +0900, Fujii Masao wrote: > >>> > >>> Here are other comments. > >>> > >>> - if (jstate) > >>> + if (kind == PGSS_JUMBLE) > >>> > >>> Why is PGSS_JUMBLE necessary? ISTM that we can still use jstate here, instead. > >>> > >>> If it's ok to remove PGSS_JUMBLE, we can define PGSS_NUM_KIND(=2) instead > >>> and replace 2 in, e.g., total_time[2] with PGSS_NUM_KIND. Thought? > >> > >> Yes, we could be using jstate here. I originally used that to avoid passing > >> PGSS_EXEC (or the other one) as a way to say "ignore this information as > >> there's the jstate which says it's yet another meaning". If that's not an > >> issue, I can change that as PGSS_NUM_KIND will clearly improve the explicit "2" > >> all over the place. > > > > Done, passing PGSS_PLAN when jumble is intended, with a comment saying that the > > pgss_kind is ignored in that case. > > > >>> + <entry><structfield>total_time</structfield></entry> > >>> + <entry><type>double precision</type></entry> > >>> + <entry></entry> > >>> + <entry> > >>> + Total time spend planning and executing the statement, in milliseconds > >>> + </entry> > >>> + </row> > >>> > >>> pg_stat_statements view has this column but the function not. > >>> We should make both have the column or not at all, for consistency? > >>> I'm not sure if it's good thing to expose the sum of total_plan_time > >>> and total_exec_time as total_time. If some users want that, they can > >>> easily calculate it from total_plan_time and total_exec_time by using > >>> their own logic. > >> > >> I think we originally added it as a way to avoid too much compatibility break, > >> and also because it seems like a field most users will be interested in anyway. > >> Now that I'm thinking about it again, I indeed think it was a mistake to have > >> that in view part only. Not mainly for consistency, but for users who would be > >> interested in the total_time field while not wanting to pay the overhead of > >> retrieving the query text if they don't need it. So I'll change that! > > > > Done > > > >>> + nested_level++; > >>> + PG_TRY(); > >>> > >>> In old thread [1], Tom Lane commented the usage of nested_level > >>> in the planner hook. There seems no reply to that so far. What's > >>> your opinion about that comment? > >>> > >>> [1] https://www.postgresql.org/message-id/28980.1515803777@sss.pgh.pa.us > >> > >> Oh thanks, I didn't noticed this part of the discussion. I agree with Tom's > >> concern, and I think that having a specific nesting level variable for the > >> planner is the best workaround, so I'll implement that. > > > > Done. > > > > I also exported BufferUsageAccumDiff as mentioned previously, as it seems > > clearner and will avoid future useless code churn, and run pgindent. > > > > v10 attached. > > Thanks for updating the patches! 
> > Regarding 0001 patch, I have one nitpicking comment; > > - result = standard_planner(parse, cursorOptions, boundParams); > + result = standard_planner(parse, query_text, cursorOptions, boundParams); > > -standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams) > +standard_planner(Query *parse, const char *querytext, int cursorOptions, > + ParamListInfo boundParams) > > -pg_plan_query(Query *querytree, int cursorOptions, ParamListInfo boundParams) > +pg_plan_query(Query *querytree, const char *query_text, int cursorOptions, > + ParamListInfo boundParams) > > The patch uses "query_text" and "querytext" as the name of newly-added > argument. They should be unified? IMO "query_string" looks better name > because it's used in other functions like pg_analyze_and_rewrite(), > pg_parse_query() for the sake of consistency. Thought? Indeed, and +1 for query_text.
On Fri, Mar 27, 2020 at 2:01 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > On 2020/03/27 19:00, Julien Rouhaud wrote: > > On Thu, Mar 26, 2020 at 02:22:47PM +0100, Julien Rouhaud wrote: > >> On Thu, Mar 26, 2020 at 08:08:35PM +0900, Fujii Masao wrote: > >>> > >>> Here are other comments. > >>> > >>> - if (jstate) > >>> + if (kind == PGSS_JUMBLE) > >>> > >>> Why is PGSS_JUMBLE necessary? ISTM that we can still use jstate here, instead. > >>> > >>> If it's ok to remove PGSS_JUMBLE, we can define PGSS_NUM_KIND(=2) instead > >>> and replace 2 in, e.g., total_time[2] with PGSS_NUM_KIND. Thought? > >> > >> Yes, we could be using jstate here. I originally used that to avoid passing > >> PGSS_EXEC (or the other one) as a way to say "ignore this information as > >> there's the jstate which says it's yet another meaning". If that's not an > >> issue, I can change that as PGSS_NUM_KIND will clearly improve the explicit "2" > >> all over the place. > > > > Done, passing PGSS_PLAN when jumble is intended, with a comment saying that the > > pgss_kind is ignored in that case. > > > >>> + <entry><structfield>total_time</structfield></entry> > >>> + <entry><type>double precision</type></entry> > >>> + <entry></entry> > >>> + <entry> > >>> + Total time spend planning and executing the statement, in milliseconds > >>> + </entry> > >>> + </row> > >>> > >>> pg_stat_statements view has this column but the function not. > >>> We should make both have the column or not at all, for consistency? > >>> I'm not sure if it's good thing to expose the sum of total_plan_time > >>> and total_exec_time as total_time. If some users want that, they can > >>> easily calculate it from total_plan_time and total_exec_time by using > >>> their own logic. > >> > >> I think we originally added it as a way to avoid too much compatibility break, > >> and also because it seems like a field most users will be interested in anyway. > >> Now that I'm thinking about it again, I indeed think it was a mistake to have > >> that in view part only. Not mainly for consistency, but for users who would be > >> interested in the total_time field while not wanting to pay the overhead of > >> retrieving the query text if they don't need it. So I'll change that! > > > > Done > > > > Thanks for updating the patch! But I'm still wondering if it's really > good thing to expose total_time. For example, when the query fails > with an error many times and "calls" becomes very different from > "plans", "total_plan_time" + "total_exec_time" is really what the users > are interested in? That's also the case when running explain without analyze, or prepared statements that fallback to generic plans. As a user, knowing how long postgres actually spent processing a query is interesting as a way to find likely low hanging fruits, even if there's no planning/execution strict correlation. The planning/execution detail is also useful but that's probably not what I'd be starting from (at least in OLTP workload). The error scenario is unfortunate, but that's yet another topic. > Some users may be interested in the sum of mean > times, but it's not exposed... Yes, we had a discussion about summing the other fields, but it seems to me that doing a sum of computed fields doesn't really make sense. Mean without variance is already not that useful. > So what I'd like to say is that the information that users are interested > in would vary on each situation and case. At least for me it seems > enough for pgss to report only the basic information. 
Then users > can calculate to get the numbers (like total_time) they're interested in, > from those basic information. > > But of course, I'd like to hear more opinions about this... +1 Unless someone chime in by tomorrow, I'll just drop the sum as it seems less controversial and not a blocker in userland if users are interested. > > + if (api_version >= PGSS_V1_8) > + values[i++] = Int64GetDatumFast(tmp.total_time[0] + > + tmp.total_time[1]); > > BTW, Int64GetDatumFast() should be Float8GetDatumFast()? Oh indeed, embarrassing copy/pasto. > > >>> + nested_level++; > >>> + PG_TRY(); > >>> > >>> In old thread [1], Tom Lane commented the usage of nested_level > >>> in the planner hook. There seems no reply to that so far. What's > >>> your opinion about that comment? > >>> > >>> [1] https://www.postgresql.org/message-id/28980.1515803777@sss.pgh.pa.us > >> > >> Oh thanks, I didn't noticed this part of the discussion. I agree with Tom's > >> concern, and I think that having a specific nesting level variable for the > >> planner is the best workaround, so I'll implement that. > > > > Done. > > > > I also exported BufferUsageAccumDiff as mentioned previously, as it seems > > clearner and will avoid future useless code churn, and run pgindent. > > Many thanks!! I'm thinking to commit this part separately. > So I made that patch based on your patch. Attached. Thanks! It looks good to me.
On Fri, Mar 27, 2020 at 03:42:50PM +0100, Julien Rouhaud wrote: > On Fri, Mar 27, 2020 at 2:01 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > > So what I'd like to say is that the information that users are interested > > in would vary on each situation and case. At least for me it seems > > enough for pgss to report only the basic information. Then users > > can calculate to get the numbers (like total_time) they're interested in, > > from those basic information. > > > > But of course, I'd like to hear more opinions about this... > > +1 > > Unless someone chime in by tomorrow, I'll just drop the sum as it > seems less controversial and not a blocker in userland if users are > interested. Done in attached v11, with also the s/querytext/query_text/ discrepancy noted previously. > > > > > > I also exported BufferUsageAccumDiff as mentioned previously, as it seems > > > clearner and will avoid future useless code churn, and run pgindent. > > > > Many thanks!! I'm thinking to commit this part separately. > > So I made that patch based on your patch. Attached. > > Thanks! It looks good to me. I also kept that part in a distinct commit for convenience.
Attachment
On 2020/03/29 15:15, Julien Rouhaud wrote: > On Fri, Mar 27, 2020 at 03:42:50PM +0100, Julien Rouhaud wrote: >> On Fri, Mar 27, 2020 at 2:01 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>> >> >>> So what I'd like to say is that the information that users are interested >>> in would vary on each situation and case. At least for me it seems >>> enough for pgss to report only the basic information. Then users >>> can calculate to get the numbers (like total_time) they're interested in, >>> from those basic information. >>> >>> But of course, I'd like to hear more opinions about this... >> >> +1 >> >> Unless someone chime in by tomorrow, I'll just drop the sum as it >> seems less controversial and not a blocker in userland if users are >> interested. > > Done in attached v11, with also the s/querytext/query_text/ discrepancy noted > previously. Thanks for updating the patch! But I still think query_string is better name because it's used in other several places, for the sake of consistency. So I changed the argument name that way and commit the 0001 patch. If you think query_text is better, let's keep discussing this topic! Anyway many thanks for your great job! >>>> I also exported BufferUsageAccumDiff as mentioned previously, as it seems >>>> clearner and will avoid future useless code churn, and run pgindent. >>> >>> Many thanks!! I'm thinking to commit this part separately. >>> So I made that patch based on your patch. Attached. >> >> Thanks! It looks good to me. > > I also kept that part in a distinct commit for convenience. I also pushed 0002 patch. Thanks! I will review 0003 patch again. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
On Mon, Mar 30, 2020 at 01:56:43PM +0900, Fujii Masao wrote: > > > On 2020/03/29 15:15, Julien Rouhaud wrote: > > On Fri, Mar 27, 2020 at 03:42:50PM +0100, Julien Rouhaud wrote: > > > On Fri, Mar 27, 2020 at 2:01 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > > > > > > > > So what I'd like to say is that the information that users are interested > > > > in would vary on each situation and case. At least for me it seems > > > > enough for pgss to report only the basic information. Then users > > > > can calculate to get the numbers (like total_time) they're interested in, > > > > from those basic information. > > > > > > > > But of course, I'd like to hear more opinions about this... > > > > > > +1 > > > > > > Unless someone chime in by tomorrow, I'll just drop the sum as it > > > seems less controversial and not a blocker in userland if users are > > > interested. > > > > Done in attached v11, with also the s/querytext/query_text/ discrepancy noted > > previously. > > Thanks for updating the patch! But I still think query_string is better > name because it's used in other several places, for the sake of consistency. You're absolutely right. That's what I actually wanted to do given your previous comment, but somehow managed to miss it, sorry about that and thanks for fixing. > So I changed the argument name that way and commit the 0001 patch. > If you think query_text is better, let's keep discussing this topic! > > Anyway many thanks for your great job! Thanks a lot! > > > > > > I also exported BufferUsageAccumDiff as mentioned previously, as it seems > > > > > clearner and will avoid future useless code churn, and run pgindent. > > > > > > > > Many thanks!! I'm thinking to commit this part separately. > > > > So I made that patch based on your patch. Attached. > > > > > > Thanks! It looks good to me. > > > > I also kept that part in a distinct commit for convenience. > > I also pushed 0002 patch. Thanks! > > I will review 0003 patch again. And thanks for that too :)
On 2020/03/30 17:03, Julien Rouhaud wrote: > On Mon, Mar 30, 2020 at 01:56:43PM +0900, Fujii Masao wrote: >> >> >> On 2020/03/29 15:15, Julien Rouhaud wrote: >>> On Fri, Mar 27, 2020 at 03:42:50PM +0100, Julien Rouhaud wrote: >>>> On Fri, Mar 27, 2020 at 2:01 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>>> >>>> >>>>> So what I'd like to say is that the information that users are interested >>>>> in would vary on each situation and case. At least for me it seems >>>>> enough for pgss to report only the basic information. Then users >>>>> can calculate to get the numbers (like total_time) they're interested in, >>>>> from those basic information. >>>>> >>>>> But of course, I'd like to hear more opinions about this... >>>> >>>> +1 >>>> >>>> Unless someone chime in by tomorrow, I'll just drop the sum as it >>>> seems less controversial and not a blocker in userland if users are >>>> interested. >>> >>> Done in attached v11, with also the s/querytext/query_text/ discrepancy noted >>> previously. >> >> Thanks for updating the patch! But I still think query_string is better >> name because it's used in other several places, for the sake of consistency. > > You're absolutely right. That's what I actually wanted to do given your > previous comment, but somehow managed to miss it, sorry about that and thanks > for fixing. > >> So I changed the argument name that way and commit the 0001 patch. >> If you think query_text is better, let's keep discussing this topic! >> >> Anyway many thanks for your great job! > > Thanks a lot! > >> >>>>>> I also exported BufferUsageAccumDiff as mentioned previously, as it seems >>>>>> clearner and will avoid future useless code churn, and run pgindent. >>>>> >>>>> Many thanks!! I'm thinking to commit this part separately. >>>>> So I made that patch based on your patch. Attached. >>>> >>>> Thanks! It looks good to me. >>> >>> I also kept that part in a distinct commit for convenience. >> >> I also pushed 0002 patch. Thanks! >> >> I will review 0003 patch again. > > And thanks for that too :) While testing the patched pgss, I found that the patched version may track the statements that the original version doesn't. Please imagine the case where the following queries are executed, with pgss.track = top. PREPARE hoge AS SELECT * FROM t; EXPLAIN EXECUTE hoge; The pgss view returned "PREPARE hoge AS SELECT * FROM t" in the patched version, but not in the orignal version. Is this problematic? Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
On Mon, Mar 30, 2020 at 6:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > On 2020/03/30 17:03, Julien Rouhaud wrote: > > On Mon, Mar 30, 2020 at 01:56:43PM +0900, Fujii Masao wrote: > >> > >> > >> On 2020/03/29 15:15, Julien Rouhaud wrote: > >>> On Fri, Mar 27, 2020 at 03:42:50PM +0100, Julien Rouhaud wrote: > >>>> On Fri, Mar 27, 2020 at 2:01 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >>>>> > >>>> > >>>>> So what I'd like to say is that the information that users are interested > >>>>> in would vary on each situation and case. At least for me it seems > >>>>> enough for pgss to report only the basic information. Then users > >>>>> can calculate to get the numbers (like total_time) they're interested in, > >>>>> from those basic information. > >>>>> > >>>>> But of course, I'd like to hear more opinions about this... > >>>> > >>>> +1 > >>>> > >>>> Unless someone chime in by tomorrow, I'll just drop the sum as it > >>>> seems less controversial and not a blocker in userland if users are > >>>> interested. > >>> > >>> Done in attached v11, with also the s/querytext/query_text/ discrepancy noted > >>> previously. > >> > >> Thanks for updating the patch! But I still think query_string is better > >> name because it's used in other several places, for the sake of consistency. > > > > You're absolutely right. That's what I actually wanted to do given your > > previous comment, but somehow managed to miss it, sorry about that and thanks > > for fixing. > > > >> So I changed the argument name that way and commit the 0001 patch. > >> If you think query_text is better, let's keep discussing this topic! > >> > >> Anyway many thanks for your great job! > > > > Thanks a lot! > > > >> > >>>>>> I also exported BufferUsageAccumDiff as mentioned previously, as it seems > >>>>>> clearner and will avoid future useless code churn, and run pgindent. > >>>>> > >>>>> Many thanks!! I'm thinking to commit this part separately. > >>>>> So I made that patch based on your patch. Attached. > >>>> > >>>> Thanks! It looks good to me. > >>> > >>> I also kept that part in a distinct commit for convenience. > >> > >> I also pushed 0002 patch. Thanks! > >> > >> I will review 0003 patch again. > > > > And thanks for that too :) > > While testing the patched pgss, I found that the patched version > may track the statements that the original version doesn't. > Please imagine the case where the following queries are executed, > with pgss.track = top. > > PREPARE hoge AS SELECT * FROM t; > EXPLAIN EXECUTE hoge; > > The pgss view returned "PREPARE hoge AS SELECT * FROM t" > in the patched version, but not in the orignal version. > > Is this problematic? Oh indeed. That's a side effect of having different the executed query and the planned query being different. I guess the question is to chose if the top level executed query of a utilty statement containing an optimisable query, should the top level planner call of that optimisable statement be considered at top level or not. I tend to think that's the correct behavior here, as this is also what would happen if a regular DML was provided. What do you think?
On 2020/03/31 3:16, Julien Rouhaud wrote: > On Mon, Mar 30, 2020 at 6:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> On 2020/03/30 17:03, Julien Rouhaud wrote: >>> On Mon, Mar 30, 2020 at 01:56:43PM +0900, Fujii Masao wrote: >>>> >>>> >>>> On 2020/03/29 15:15, Julien Rouhaud wrote: >>>>> On Fri, Mar 27, 2020 at 03:42:50PM +0100, Julien Rouhaud wrote: >>>>>> On Fri, Mar 27, 2020 at 2:01 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>>>>> >>>>>> >>>>>>> So what I'd like to say is that the information that users are interested >>>>>>> in would vary on each situation and case. At least for me it seems >>>>>>> enough for pgss to report only the basic information. Then users >>>>>>> can calculate to get the numbers (like total_time) they're interested in, >>>>>>> from those basic information. >>>>>>> >>>>>>> But of course, I'd like to hear more opinions about this... >>>>>> >>>>>> +1 >>>>>> >>>>>> Unless someone chime in by tomorrow, I'll just drop the sum as it >>>>>> seems less controversial and not a blocker in userland if users are >>>>>> interested. >>>>> >>>>> Done in attached v11, with also the s/querytext/query_text/ discrepancy noted >>>>> previously. >>>> >>>> Thanks for updating the patch! But I still think query_string is better >>>> name because it's used in other several places, for the sake of consistency. >>> >>> You're absolutely right. That's what I actually wanted to do given your >>> previous comment, but somehow managed to miss it, sorry about that and thanks >>> for fixing. >>> >>>> So I changed the argument name that way and commit the 0001 patch. >>>> If you think query_text is better, let's keep discussing this topic! >>>> >>>> Anyway many thanks for your great job! >>> >>> Thanks a lot! >>> >>>> >>>>>>>> I also exported BufferUsageAccumDiff as mentioned previously, as it seems >>>>>>>> clearner and will avoid future useless code churn, and run pgindent. >>>>>>> >>>>>>> Many thanks!! I'm thinking to commit this part separately. >>>>>>> So I made that patch based on your patch. Attached. >>>>>> >>>>>> Thanks! It looks good to me. >>>>> >>>>> I also kept that part in a distinct commit for convenience. >>>> >>>> I also pushed 0002 patch. Thanks! >>>> >>>> I will review 0003 patch again. >>> >>> And thanks for that too :) >> >> While testing the patched pgss, I found that the patched version >> may track the statements that the original version doesn't. >> Please imagine the case where the following queries are executed, >> with pgss.track = top. >> >> PREPARE hoge AS SELECT * FROM t; >> EXPLAIN EXECUTE hoge; >> >> The pgss view returned "PREPARE hoge AS SELECT * FROM t" >> in the patched version, but not in the orignal version. >> >> Is this problematic? > > Oh indeed. That's a side effect of having different the executed query > and the planned query being different. > > I guess the question is to chose if the top level executed query of a > utilty statement containing an optimisable query, should the top level > planner call of that optimisable statement be considered at top level > or not. I tend to think that's the correct behavior here, as this is > also what would happen if a regular DML was provided. What do you > think? TBH, not sure if that's ok yet... I'm now just wondering if both plan_nested_level and exec_nested_level should be incremented in pgss_ProcessUtility(). This is just a guess, so I need more investigation about this. 
Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
On Tue, Mar 31, 2020 at 12:21:43PM +0900, Fujii Masao wrote: > > On 2020/03/31 3:16, Julien Rouhaud wrote: > > On Mon, Mar 30, 2020 at 6:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > > > While testing the patched pgss, I found that the patched version > > > may track the statements that the original version doesn't. > > > Please imagine the case where the following queries are executed, > > > with pgss.track = top. > > > > > > PREPARE hoge AS SELECT * FROM t; > > > EXPLAIN EXECUTE hoge; > > > > > > The pgss view returned "PREPARE hoge AS SELECT * FROM t" > > > in the patched version, but not in the orignal version. > > > > > > Is this problematic? > > > > Oh indeed. That's a side effect of having different the executed query > > and the planned query being different. > > > > I guess the question is to chose if the top level executed query of a > > utilty statement containing an optimisable query, should the top level > > planner call of that optimisable statement be considered at top level > > or not. I tend to think that's the correct behavior here, as this is > > also what would happen if a regular DML was provided. What do you > > think? > > TBH, not sure if that's ok yet... > > I'm now just wondering if both plan_nested_level and > exec_nested_level should be incremented in pgss_ProcessUtility(). > This is just a guess, so I need more investigation about this. Yeah, after a second thought I realize that my comparison was wrong. Allowing *any* top-level planner call when pgss.track = top would mean that we should also consider all planner calls from queries executed for FK checks and such, which is definitely not the intended behavior. FTR with this patch such calls still don't get tracked, but only because those query don't get a queryid assigned, not because the nesting level says so. How about simply passing (plan_nested_level + exec_nested_level) for pgss_enabled call in pgss_planner_hook?
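A rough sketch of that suggestion, using the names already mentioned in this thread
(the committed code may differ in detail): pgss_enabled() takes the relevant nesting
level, and the planner hook passes the sum of the planner and executor nesting levels,
so planning triggered while another statement is already executing is not mistaken for
a top-level statement under pg_stat_statements.track = top.

#define pgss_enabled(level) \
	(pgss_track == PGSS_TRACK_ALL || \
	 (pgss_track == PGSS_TRACK_TOP && (level) == 0))

	/* call site inside the planner hook */
	if (pgss_enabled(plan_nested_level + exec_nested_level))
	{
		/* time the planner call and feed the counters via pgss_store() */
	}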
On 2020/03/31 15:03, Julien Rouhaud wrote: > On Tue, Mar 31, 2020 at 12:21:43PM +0900, Fujii Masao wrote: >> >> On 2020/03/31 3:16, Julien Rouhaud wrote: >>> On Mon, Mar 30, 2020 at 6:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>> >>>> While testing the patched pgss, I found that the patched version >>>> may track the statements that the original version doesn't. >>>> Please imagine the case where the following queries are executed, >>>> with pgss.track = top. >>>> >>>> PREPARE hoge AS SELECT * FROM t; >>>> EXPLAIN EXECUTE hoge; >>>> >>>> The pgss view returned "PREPARE hoge AS SELECT * FROM t" >>>> in the patched version, but not in the orignal version. >>>> >>>> Is this problematic? >>> >>> Oh indeed. That's a side effect of having different the executed query >>> and the planned query being different. >>> >>> I guess the question is to chose if the top level executed query of a >>> utilty statement containing an optimisable query, should the top level >>> planner call of that optimisable statement be considered at top level >>> or not. I tend to think that's the correct behavior here, as this is >>> also what would happen if a regular DML was provided. What do you >>> think? >> >> TBH, not sure if that's ok yet... >> >> I'm now just wondering if both plan_nested_level and >> exec_nested_level should be incremented in pgss_ProcessUtility(). >> This is just a guess, so I need more investigation about this. > > Yeah, after a second thought I realize that my comparison was wrong. Allowing > *any* top-level planner call when pgss.track = top would mean that we should > also consider all planner calls from queries executed for FK checks and such, > which is definitely not the intended behavior. Yes. So, basically any planner activity that happens during the execution phase of the statement is not tracked. > FTR with this patch such calls still don't get tracked, but only because those > query don't get a queryid assigned, not because the nesting level says so. > > How about simply passing (plan_nested_level + exec_nested_level) for > pgss_enabled call in pgss_planner_hook? Looks good to me! The comment about why this treatment is necessary only in pgss_planner() should be added. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
On Tue, Mar 31, 2020 at 04:10:47PM +0900, Fujii Masao wrote: > > > On 2020/03/31 15:03, Julien Rouhaud wrote: > > On Tue, Mar 31, 2020 at 12:21:43PM +0900, Fujii Masao wrote: > > > > > > On 2020/03/31 3:16, Julien Rouhaud wrote: > > > > On Mon, Mar 30, 2020 at 6:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > > > > > > > While testing the patched pgss, I found that the patched version > > > > > may track the statements that the original version doesn't. > > > > > Please imagine the case where the following queries are executed, > > > > > with pgss.track = top. > > > > > > > > > > PREPARE hoge AS SELECT * FROM t; > > > > > EXPLAIN EXECUTE hoge; > > > > > > > > > > The pgss view returned "PREPARE hoge AS SELECT * FROM t" > > > > > in the patched version, but not in the orignal version. > > > > > > > > > > Is this problematic? > > > > > > > > Oh indeed. That's a side effect of having different the executed query > > > > and the planned query being different. > > > > > > > > I guess the question is to chose if the top level executed query of a > > > > utilty statement containing an optimisable query, should the top level > > > > planner call of that optimisable statement be considered at top level > > > > or not. I tend to think that's the correct behavior here, as this is > > > > also what would happen if a regular DML was provided. What do you > > > > think? > > > > > > TBH, not sure if that's ok yet... > > > > > > I'm now just wondering if both plan_nested_level and > > > exec_nested_level should be incremented in pgss_ProcessUtility(). > > > This is just a guess, so I need more investigation about this. > > > > Yeah, after a second thought I realize that my comparison was wrong. Allowing > > *any* top-level planner call when pgss.track = top would mean that we should > > also consider all planner calls from queries executed for FK checks and such, > > which is definitely not the intended behavior. > > Yes. So, basically any planner activity that happens during > the execution phase of the statement is not tracked. > > > FTR with this patch such calls still don't get tracked, but only because those > > query don't get a queryid assigned, not because the nesting level says so. > > > > How about simply passing (plan_nested_level + exec_nested_level) for > > pgss_enabled call in pgss_planner_hook? > > Looks good to me! The comment about why this treatment is necessary only in > pgss_planner() should be added. Yes of course. It also requires some changes in the macro to make it safe when called with an expression. v12 attached!
Attachment
On 2020/03/31 16:33, Julien Rouhaud wrote: > On Tue, Mar 31, 2020 at 04:10:47PM +0900, Fujii Masao wrote: >> >> >> On 2020/03/31 15:03, Julien Rouhaud wrote: >>> On Tue, Mar 31, 2020 at 12:21:43PM +0900, Fujii Masao wrote: >>>> >>>> On 2020/03/31 3:16, Julien Rouhaud wrote: >>>>> On Mon, Mar 30, 2020 at 6:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>>>> >>>>>> While testing the patched pgss, I found that the patched version >>>>>> may track the statements that the original version doesn't. >>>>>> Please imagine the case where the following queries are executed, >>>>>> with pgss.track = top. >>>>>> >>>>>> PREPARE hoge AS SELECT * FROM t; >>>>>> EXPLAIN EXECUTE hoge; >>>>>> >>>>>> The pgss view returned "PREPARE hoge AS SELECT * FROM t" >>>>>> in the patched version, but not in the orignal version. >>>>>> >>>>>> Is this problematic? >>>>> >>>>> Oh indeed. That's a side effect of having different the executed query >>>>> and the planned query being different. >>>>> >>>>> I guess the question is to chose if the top level executed query of a >>>>> utilty statement containing an optimisable query, should the top level >>>>> planner call of that optimisable statement be considered at top level >>>>> or not. I tend to think that's the correct behavior here, as this is >>>>> also what would happen if a regular DML was provided. What do you >>>>> think? >>>> >>>> TBH, not sure if that's ok yet... >>>> >>>> I'm now just wondering if both plan_nested_level and >>>> exec_nested_level should be incremented in pgss_ProcessUtility(). >>>> This is just a guess, so I need more investigation about this. >>> >>> Yeah, after a second thought I realize that my comparison was wrong. Allowing >>> *any* top-level planner call when pgss.track = top would mean that we should >>> also consider all planner calls from queries executed for FK checks and such, >>> which is definitely not the intended behavior. >> >> Yes. So, basically any planner activity that happens during >> the execution phase of the statement is not tracked. >> >>> FTR with this patch such calls still don't get tracked, but only because those >>> query don't get a queryid assigned, not because the nesting level says so. >>> >>> How about simply passing (plan_nested_level + exec_nested_level) for >>> pgss_enabled call in pgss_planner_hook? >> >> Looks good to me! The comment about why this treatment is necessary only in >> pgss_planner() should be added. > > > Yes of course. It also requires some changes in the macro to make it safe when > called with an expression. > > v12 attached! Thanks for updating the patch! The patch looks good to me. I applied minor and cosmetic changes into the patch. Attached is the updated version of the patch. Barring any objection, I'd like to commit this version. BTW, the minor and cosmetic changes that I applied are, for example, - Rename pgss_planner_hook to pgss_planner for the sake of consistency. Other function using hook in pgss doesn't use "_hook" in their names, too. - Make pgss_planner use PG_FINALLY() instead of PG_CATCH(). - Make PGSS_NUMKIND as the last value in enum pgssStoreKind. - Update the sample output in the document. etc Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Attachment
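For readers following along, the enum shape implied by those changes looks roughly
like the sketch below; the attached patch is the authoritative version, and the
comment wording here is only a guess.

typedef enum pgssStoreKind
{
	PGSS_INVALID = -1,		/* used when the kind does not apply, e.g. jumbling */

	/*
	 * PGSS_PLAN and PGSS_EXEC must be 0 and 1 so they can index the
	 * per-kind counter arrays such as total_time[].
	 */
	PGSS_PLAN = 0,
	PGSS_EXEC,

	PGSS_NUMKIND			/* must be last */
} pgssStoreKind;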
On Wed, Apr 01, 2020 at 02:43:10AM +0900, Fujii Masao wrote: > > > On 2020/03/31 16:33, Julien Rouhaud wrote: > > > > v12 attached! > > Thanks for updating the patch! The patch looks good to me. > > I applied minor and cosmetic changes into the patch. Attached is > the updated version of the patch. Barring any objection, I'd like to > commit this version. > > BTW, the minor and cosmetic changes that I applied are, for example, > > - Rename pgss_planner_hook to pgss_planner for the sake of consistency. > Other function using hook in pgss doesn't use "_hook" in their names, too. > - Make pgss_planner use PG_FINALLY() instead of PG_CATCH(). > - Make PGSS_NUMKIND as the last value in enum pgssStoreKind. +1, and the PGSS_INVALID is also way better. > - Update the sample output in the document. > etc Thanks a lot. It all looks good to me!
On 2020/04/01 3:42, Julien Rouhaud wrote: > On Wed, Apr 01, 2020 at 02:43:10AM +0900, Fujii Masao wrote: >> >> >> On 2020/03/31 16:33, Julien Rouhaud wrote: >>> >>> v12 attached! >> >> Thanks for updating the patch! The patch looks good to me. >> >> I applied minor and cosmetic changes into the patch. Attached is >> the updated version of the patch. Barring any objection, I'd like to >> commit this version. >> >> BTW, the minor and cosmetic changes that I applied are, for example, >> >> - Rename pgss_planner_hook to pgss_planner for the sake of consistency. >> Other function using hook in pgss doesn't use "_hook" in their names, too. >> - Make pgss_planner use PG_FINALLY() instead of PG_CATCH(). >> - Make PGSS_NUMKIND as the last value in enum pgssStoreKind. > > > +1, and the PGSS_INVALID is also way better. > > >> - Update the sample output in the document. >> etc > > > Thanks a lot. It all looks good to me! Thanks for the check! I tried to pick up the names of authors and reviewers of this patch, from the past discussions. Then I'm thinking to write the followings in the commit log. Are there any other developers that should be credited as author or reviewer? Author: Julien Rouhaud, Pascal Legrand, Thomas Munro, Fujii Masao Reviewed-by: Sergei Kornilov, Tomas Vondra, Yoshikazu Imai, Haribabu Kommi, Tom Lane Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
On 2020/04/01 18:19, Fujii Masao wrote: > > > On 2020/04/01 3:42, Julien Rouhaud wrote: >> On Wed, Apr 01, 2020 at 02:43:10AM +0900, Fujii Masao wrote: >>> >>> >>> On 2020/03/31 16:33, Julien Rouhaud wrote: >>>> >>>> v12 attached! >>> >>> Thanks for updating the patch! The patch looks good to me. >>> >>> I applied minor and cosmetic changes into the patch. Attached is >>> the updated version of the patch. Barring any objection, I'd like to >>> commit this version. >>> >>> BTW, the minor and cosmetic changes that I applied are, for example, >>> >>> - Rename pgss_planner_hook to pgss_planner for the sake of consistency. >>> Other function using hook in pgss doesn't use "_hook" in their names, too. >>> - Make pgss_planner use PG_FINALLY() instead of PG_CATCH(). >>> - Make PGSS_NUMKIND as the last value in enum pgssStoreKind. >> >> >> +1, and the PGSS_INVALID is also way better. >> >> >>> - Update the sample output in the document. >>> etc >> >> >> Thanks a lot. It all looks good to me! Finally I pushed the patch! Many thanks for all involved in this patch! As a remaining TODO item, I'm thinking that the document would need to be improved. For example, previously the query was not stored in pgss when it failed. But, in v13, if pgss_planning is enabled, such a query is stored because the planning succeeds. Without the explanation about that behavior in the document, I'm afraid that users will get confused. Thought? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Fujii Masao-4 wrote > On 2020/04/01 18:19, Fujii Masao wrote: > > Finally I pushed the patch! > Many thanks for all involved in this patch! > > As a remaining TODO item, I'm thinking that the document would need to > be improved. For example, previously the query was not stored in pgss > when it failed. But, in v13, if pgss_planning is enabled, such a query is > stored because the planning succeeds. Without the explanation about > that behavior in the document, I'm afraid that users will get confused. > Thought? > > Regards, > > -- > Fujii Masao > Advanced Computing Technology Center > Research and Development Headquarters > NTT DATA CORPORATION Thank you all for this work and especially to Julian for its major contribution ! Regarding the TODO point: Yes I agree that it can be improved. My proposal: "Note that planning and execution statistics are updated only at their respective end phase, and only for successfull operations. For exemple executions counters of a long running SELECT query, will be updated at the execution end, without showing any progress report in the interval. Other exemple, if the statement is successfully planned but fails in the execution phase, only its planning statistics are stored. This may give uncorrelated plans vs calls informations." Regards PAscal -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On Thu, Apr 02, 2020 at 01:04:28PM -0700, legrand legrand wrote: > Fujii Masao-4 wrote > > On 2020/04/01 18:19, Fujii Masao wrote: > > > > Finally I pushed the patch! > > Many thanks for all involved in this patch! > > > > As a remaining TODO item, I'm thinking that the document would need to > > be improved. For example, previously the query was not stored in pgss > > when it failed. But, in v13, if pgss_planning is enabled, such a query is > > stored because the planning succeeds. Without the explanation about > > that behavior in the document, I'm afraid that users will get confused. > > Thought? > > Thank you all for this work and especially to Julian for its major > contribution ! Thanks a lot to everyone! This was quite a long journey. > Regarding the TODO point: Yes I agree that it can be improved. > My proposal: > > "Note that planning and execution statistics are updated only at their > respective end phase, and only for successfull operations. > For exemple executions counters of a long running SELECT query, > will be updated at the execution end, without showing any progress > report in the interval. > Other exemple, if the statement is successfully planned but fails in > the execution phase, only its planning statistics are stored. > This may give uncorrelated plans vs calls informations." There are numerous reasons for lack of correlation between number of planning and number of execution, so I'm afraid that this will give users the false impression that only failed execution can lead to that. Here's some enhancement on your proposal: "Note that planning and execution statistics are updated only at their respective end phase, and only for successful operations. For example the execution counters of a long running query will only be updated at the execution end, without showing any progress report before that. Similarly, if a statement is successfully planned but fails during the execution phase, only its planning statistics will be displayed. Please also note that the number of planning and number of execution aren't expected to match, as the planification of a query won't always be followed by its execution and reciprocally."
On 2020/04/03 16:26, Julien Rouhaud wrote: > On Thu, Apr 02, 2020 at 01:04:28PM -0700, legrand legrand wrote: >> Fujii Masao-4 wrote >>> On 2020/04/01 18:19, Fujii Masao wrote: >>> >>> Finally I pushed the patch! >>> Many thanks for all involved in this patch! >>> >>> As a remaining TODO item, I'm thinking that the document would need to >>> be improved. For example, previously the query was not stored in pgss >>> when it failed. But, in v13, if pgss_planning is enabled, such a query is >>> stored because the planning succeeds. Without the explanation about >>> that behavior in the document, I'm afraid that users will get confused. >>> Thought? >> >> Thank you all for this work and especially to Julian for its major >> contribution ! > > > Thanks a lot to everyone! This was quite a long journey. > > >> Regarding the TODO point: Yes I agree that it can be improved. >> My proposal: >> >> "Note that planning and execution statistics are updated only at their >> respective end phase, and only for successfull operations. >> For exemple executions counters of a long running SELECT query, >> will be updated at the execution end, without showing any progress >> report in the interval. >> Other exemple, if the statement is successfully planned but fails in >> the execution phase, only its planning statistics are stored. >> This may give uncorrelated plans vs calls informations." Thanks for the proposal! > There are numerous reasons for lack of correlation between number of planning > and number of execution, so I'm afraid that this will give users the false > impression that only failed execution can lead to that. > > Here's some enhancement on your proposal: > > "Note that planning and execution statistics are updated only at their > respective end phase, and only for successful operations. > For example the execution counters of a long running query > will only be updated at the execution end, without showing any progress > report before that. Probably since this is not the example for explaining the relationship of planning and execution stats, it's better to explain this separately or just drop it? What about the attached patch based on your proposals? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Attachment
On Wed, Apr 08, 2020 at 05:37:27PM +0900, Fujii Masao wrote: > > > On 2020/04/03 16:26, Julien Rouhaud wrote: > > On Thu, Apr 02, 2020 at 01:04:28PM -0700, legrand legrand wrote: > > > Fujii Masao-4 wrote > > > > On 2020/04/01 18:19, Fujii Masao wrote: > > > > > > > > Finally I pushed the patch! > > > > Many thanks for all involved in this patch! > > > > > > > > As a remaining TODO item, I'm thinking that the document would need to > > > > be improved. For example, previously the query was not stored in pgss > > > > when it failed. But, in v13, if pgss_planning is enabled, such a query is > > > > stored because the planning succeeds. Without the explanation about > > > > that behavior in the document, I'm afraid that users will get confused. > > > > Thought? > > > > > > Thank you all for this work and especially to Julian for its major > > > contribution ! > > > > > > Thanks a lot to everyone! This was quite a long journey. > > > > > > > Regarding the TODO point: Yes I agree that it can be improved. > > > My proposal: > > > > > > "Note that planning and execution statistics are updated only at their > > > respective end phase, and only for successfull operations. > > > For exemple executions counters of a long running SELECT query, > > > will be updated at the execution end, without showing any progress > > > report in the interval. > > > Other exemple, if the statement is successfully planned but fails in > > > the execution phase, only its planning statistics are stored. > > > This may give uncorrelated plans vs calls informations." > > Thanks for the proposal! > > > There are numerous reasons for lack of correlation between number of planning > > and number of execution, so I'm afraid that this will give users the false > > impression that only failed execution can lead to that. > > > > Here's some enhancement on your proposal: > > > > "Note that planning and execution statistics are updated only at their > > respective end phase, and only for successful operations. > > For example the execution counters of a long running query > > will only be updated at the execution end, without showing any progress > > report before that. > > Probably since this is not the example for explaining the relationship of > planning and execution stats, it's better to explain this separately or just > drop it? > > What about the attached patch based on your proposals? > Thanks Fuji-san, it looks perfect to me!
Fujii Masao-4 wrote > On 2020/04/03 16:26 > [...] >> >> "Note that planning and execution statistics are updated only at their >> respective end phase, and only for successful operations. >> For example the execution counters of a long running query >> will only be updated at the execution end, without showing any progress >> report before that. > > Probably since this is not the example for explaining the relationship of > planning and execution stats, it's better to explain this separately or > just > drop it? > > What about the attached patch based on your proposals? +1 Your patch is perfect ;^> Regards PAscal -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On 2020/04/08 18:31, Julien Rouhaud wrote: > On Wed, Apr 08, 2020 at 05:37:27PM +0900, Fujii Masao wrote: >> >> >> On 2020/04/03 16:26, Julien Rouhaud wrote: >>> On Thu, Apr 02, 2020 at 01:04:28PM -0700, legrand legrand wrote: >>>> Fujii Masao-4 wrote >>>>> On 2020/04/01 18:19, Fujii Masao wrote: >>>>> >>>>> Finally I pushed the patch! >>>>> Many thanks for all involved in this patch! >>>>> >>>>> As a remaining TODO item, I'm thinking that the document would need to >>>>> be improved. For example, previously the query was not stored in pgss >>>>> when it failed. But, in v13, if pgss_planning is enabled, such a query is >>>>> stored because the planning succeeds. Without the explanation about >>>>> that behavior in the document, I'm afraid that users will get confused. >>>>> Thought? >>>> >>>> Thank you all for this work and especially to Julian for its major >>>> contribution ! >>> >>> >>> Thanks a lot to everyone! This was quite a long journey. >>> >>> >>>> Regarding the TODO point: Yes I agree that it can be improved. >>>> My proposal: >>>> >>>> "Note that planning and execution statistics are updated only at their >>>> respective end phase, and only for successfull operations. >>>> For exemple executions counters of a long running SELECT query, >>>> will be updated at the execution end, without showing any progress >>>> report in the interval. >>>> Other exemple, if the statement is successfully planned but fails in >>>> the execution phase, only its planning statistics are stored. >>>> This may give uncorrelated plans vs calls informations." >> >> Thanks for the proposal! >> >>> There are numerous reasons for lack of correlation between number of planning >>> and number of execution, so I'm afraid that this will give users the false >>> impression that only failed execution can lead to that. >>> >>> Here's some enhancement on your proposal: >>> >>> "Note that planning and execution statistics are updated only at their >>> respective end phase, and only for successful operations. >>> For example the execution counters of a long running query >>> will only be updated at the execution end, without showing any progress >>> report before that. >> >> Probably since this is not the example for explaining the relationship of >> planning and execution stats, it's better to explain this separately or just >> drop it? >> >> What about the attached patch based on your proposals? >> > > Thanks Fuji-san, it looks perfect to me! Thanks for the check! Pushed! Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2020/04/08 21:32, legrand legrand wrote: > Fujii Masao-4 wrote >> On 2020/04/03 16:26 >> [...] >>> >>> "Note that planning and execution statistics are updated only at their >>> respective end phase, and only for successful operations. >>> For example the execution counters of a long running query >>> will only be updated at the execution end, without showing any progress >>> report before that. >> >> Probably since this is not the example for explaining the relationship of >> planning and execution stats, it's better to explain this separately or >> just >> drop it? >> >> What about the attached patch based on your proposals? > > +1 > Your patch is perfect ;^> Thanks! Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Thu, Apr 9, 2020 at 5:59 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/04/08 18:31, Julien Rouhaud wrote: > > On Wed, Apr 08, 2020 at 05:37:27PM +0900, Fujii Masao wrote: > >> > >> > >> On 2020/04/03 16:26, Julien Rouhaud wrote: > >>> On Thu, Apr 02, 2020 at 01:04:28PM -0700, legrand legrand wrote: > >>>> Fujii Masao-4 wrote > >>>>> On 2020/04/01 18:19, Fujii Masao wrote: > >>>>> > >>>>> Finally I pushed the patch! > >>>>> Many thanks for all involved in this patch! > >>>>> > >>>>> As a remaining TODO item, I'm thinking that the document would need to > >>>>> be improved. For example, previously the query was not stored in pgss > >>>>> when it failed. But, in v13, if pgss_planning is enabled, such a query is > >>>>> stored because the planning succeeds. Without the explanation about > >>>>> that behavior in the document, I'm afraid that users will get confused. > >>>>> Thought? > >>>> > >>>> Thank you all for this work and especially to Julian for its major > >>>> contribution ! > >>> > >>> > >>> Thanks a lot to everyone! This was quite a long journey. > >>> > >>> > >>>> Regarding the TODO point: Yes I agree that it can be improved. > >>>> My proposal: > >>>> > >>>> "Note that planning and execution statistics are updated only at their > >>>> respective end phase, and only for successfull operations. > >>>> For exemple executions counters of a long running SELECT query, > >>>> will be updated at the execution end, without showing any progress > >>>> report in the interval. > >>>> Other exemple, if the statement is successfully planned but fails in > >>>> the execution phase, only its planning statistics are stored. > >>>> This may give uncorrelated plans vs calls informations." > >> > >> Thanks for the proposal! > >> > >>> There are numerous reasons for lack of correlation between number of planning > >>> and number of execution, so I'm afraid that this will give users the false > >>> impression that only failed execution can lead to that. > >>> > >>> Here's some enhancement on your proposal: > >>> > >>> "Note that planning and execution statistics are updated only at their > >>> respective end phase, and only for successful operations. > >>> For example the execution counters of a long running query > >>> will only be updated at the execution end, without showing any progress > >>> report before that. > >> > >> Probably since this is not the example for explaining the relationship of > >> planning and execution stats, it's better to explain this separately or just > >> drop it? > >> > >> What about the attached patch based on your proposals? > >> > > > > Thanks Fuji-san, it looks perfect to me! > > Thanks for the check! Pushed! Thanks a lot Fuji-san! For the record, the commit is available, but I didn't receive the usual mail, and it's also not present in the archives apparently: https://www.postgresql.org/list/pgsql-committers/since/202004090000/ (although Amit's latest commit was delivered as expected). Given your previous discussion with Magnus, I'm assuming that your address is now allowed to post for a year. I'm not sure what went wrong here, so I'm adding Magnus in Cc.
On 2020/04/09 22:31, Julien Rouhaud wrote: > On Thu, Apr 9, 2020 at 5:59 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> >> >> On 2020/04/08 18:31, Julien Rouhaud wrote: >>> On Wed, Apr 08, 2020 at 05:37:27PM +0900, Fujii Masao wrote: >>>> >>>> >>>> On 2020/04/03 16:26, Julien Rouhaud wrote: >>>>> On Thu, Apr 02, 2020 at 01:04:28PM -0700, legrand legrand wrote: >>>>>> Fujii Masao-4 wrote >>>>>>> On 2020/04/01 18:19, Fujii Masao wrote: >>>>>>> >>>>>>> Finally I pushed the patch! >>>>>>> Many thanks for all involved in this patch! >>>>>>> >>>>>>> As a remaining TODO item, I'm thinking that the document would need to >>>>>>> be improved. For example, previously the query was not stored in pgss >>>>>>> when it failed. But, in v13, if pgss_planning is enabled, such a query is >>>>>>> stored because the planning succeeds. Without the explanation about >>>>>>> that behavior in the document, I'm afraid that users will get confused. >>>>>>> Thought? >>>>>> >>>>>> Thank you all for this work and especially to Julian for its major >>>>>> contribution ! >>>>> >>>>> >>>>> Thanks a lot to everyone! This was quite a long journey. >>>>> >>>>> >>>>>> Regarding the TODO point: Yes I agree that it can be improved. >>>>>> My proposal: >>>>>> >>>>>> "Note that planning and execution statistics are updated only at their >>>>>> respective end phase, and only for successfull operations. >>>>>> For exemple executions counters of a long running SELECT query, >>>>>> will be updated at the execution end, without showing any progress >>>>>> report in the interval. >>>>>> Other exemple, if the statement is successfully planned but fails in >>>>>> the execution phase, only its planning statistics are stored. >>>>>> This may give uncorrelated plans vs calls informations." >>>> >>>> Thanks for the proposal! >>>> >>>>> There are numerous reasons for lack of correlation between number of planning >>>>> and number of execution, so I'm afraid that this will give users the false >>>>> impression that only failed execution can lead to that. >>>>> >>>>> Here's some enhancement on your proposal: >>>>> >>>>> "Note that planning and execution statistics are updated only at their >>>>> respective end phase, and only for successful operations. >>>>> For example the execution counters of a long running query >>>>> will only be updated at the execution end, without showing any progress >>>>> report before that. >>>> >>>> Probably since this is not the example for explaining the relationship of >>>> planning and execution stats, it's better to explain this separately or just >>>> drop it? >>>> >>>> What about the attached patch based on your proposals? >>>> >>> >>> Thanks Fuji-san, it looks perfect to me! >> >> Thanks for the check! Pushed! > > Thanks a lot Fuji-san! > > For the record, the commit is available, but I didn't receive the > usual mail, and it's also not present in the archives apparently: > https://www.postgresql.org/list/pgsql-committers/since/202004090000/ > (although Amit's latest commit was delivered as expected). Yes. > Given your previous discussion with Magnus, I'm assuming that your > address is now allowed to post for a year. I'm not sure what went > wrong here, so I'm adding Magnus in Cc. Thanks! I also reported the issue in pgsql-www. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Thanks for the excellent extension. I want to add 5 more fields to satisfy the
following requirements.
int subplan; /* No. of subplan in this query */
int subquery; /* No. of subquery */
int joincnt; /* How many relations are joined */
bool hasagg; /* if we have agg function in this query */
bool hasgroup; /* has group clause */
1. Usually I want to check total_exec_time / rows to see if a query is missing an
index; however, the aggregation/GROUP BY case breaks this rule, so
hasagg/hasgroup should be a good way to filter out these queries.
2. subplan is also an important clue for finding queries to tune. When we
check slow queries with pg_stat_statements, such information may be
helpful as well.
3. As for subquery / joincnt, they are mainly helpful for optimizer
developers to understand what kinds of queries are run most; they don't help
much for ordinary users.
The attached is a PoC that is far from perfect since: 1) it maintains a
per-backend global variable query_character which is only used in the
pg_stat_statements extension; 2) the 5 fields never change no
matter how many times the query runs, so they can't be treated as counters in nature.
However, I don't think these two points will cause big issues.
I added the columns to V1_8 rather than adding a new version; this can be
changed in the final patch.
Any suggestions?
Best Regards
Andy Fan
Attachment
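For context, the per-backend bookkeeping the PoC describes might look like the sketch
below. The field names come from the message above; the struct name, the helper and
its placement are assumptions for illustration, not necessarily what the attached
patch does.

#include "postgres.h"
#include "nodes/parsenodes.h"

typedef struct QueryCharacters
{
	int			subplan;	/* number of SubPlans in this query */
	int			subquery;	/* number of subqueries */
	int			joincnt;	/* how many relations are joined */
	bool		hasagg;		/* query uses aggregate functions */
	bool		hasgroup;	/* query has a GROUP BY clause */
} QueryCharacters;

/* per-backend state, filled in during planning, read by pg_stat_statements */
static QueryCharacters query_characters;

/* accumulate the flags for one (sub)query level */
static void
collect_query_characters(const Query *parse)
{
	query_characters.hasagg |= parse->hasAggs;
	query_characters.hasgroup |= (parse->groupClause != NIL);
}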
On Tue, May 19, 2020 at 4:29 AM Andy Fan <zhihui.fan1213@gmail.com> wrote: > > Thanks for the excellent extension. I want to add 5 more fields to satisfy the > following requirements. > > int subplan; /* No. of subplan in this query */ > int subquery; /* No. of subquery */ > int joincnt; /* How many relations are joined */ > bool hasagg; /* if we have agg function in this query */ > bool hasgroup; /* has group clause */ > > > 1. Usually I want to check total_exec_time / rows to see if the query is missing > index, however aggregation/groupby case makes this rule doesn't work. so > hasagg/hasgroup should be a good rule to filter out these queries. > > 2. subplan is also a important clue to find out the query to turning. when we > check the slow queries with pg_stat_statements, such information maybe > helpful as well. > > 3. As for subquery / joincnt, actually it is just helpful for optimizer > developer to understand the query character is running most, it doesn't help > much for user. > > > The attached is a PoC, that is far from perfect since 1). It maintain a > per-backend global variable query_character which is only used in > pg_stat_statements extension. 2). The 5 fields is impossible to change no > matter how many times it runs, so it can't be treat as Counter in nature. > However I don't think the above 2 will cause big issues. > > I added the columns to V1_8 rather than adding a new version. this can be > changed at final patch. > > Any suggestions? Most of those fields can be computed using the raw sql satements. Why not adding functions like query_has_agg(querytext) to get the information from pgss stored query text instead?
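To illustrate that suggestion: such a helper could then be used directly from SQL
against the stored query text. This is purely hypothetical, since no query_has_agg()
function exists today; it only shows how the use case from the previous message could
be covered without new per-entry columns.

-- hypothetical: filter out aggregate/GROUP BY queries before applying the
-- "time per row" heuristic
SELECT queryid,
       calls,
       total_exec_time / NULLIF(rows, 0) AS exec_time_per_row
FROM pg_stat_statements
WHERE NOT query_has_agg(query)
ORDER BY exec_time_per_row DESC NULLS LAST
LIMIT 20;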
On Thu, May 21, 2020 at 08:49:53AM +0200, Julien Rouhaud wrote: > On Tue, May 19, 2020 at 4:29 AM Andy Fan <zhihui.fan1213@gmail.com> wrote: >> Thanks for the excellent extension. I want to add 5 more fields to satisfy the >> following requirements. >> >> int subplan; /* No. of subplan in this query */ >> int subquery; /* No. of subquery */ >> int joincnt; /* How many relations are joined */ >> bool hasagg; /* if we have agg function in this query */ >> bool hasgroup; /* has group clause */ > > Most of those fields can be computed using the raw sql satements. Why > not adding functions like query_has_agg(querytext) to get the > information from pgss stored query text instead? Yeah I personally find concepts related only to the query string itself not something that needs to be tied to pg_stat_statements. While reading about those five new fields, I am also wondering how this stuff would work with CTEs. Particularly, should the hasagg or hasgroup flags be set only if the most outer query satisfies a condition? What if an inner query satisfies a condition but not an outer query? Should joincnt just be the sum of all the joins done in all queries, including subqueries? -- Michael
Attachment
On Thu, May 21, 2020 at 09:17, Michael Paquier <michael@paquier.xyz> wrote:
On Thu, May 21, 2020 at 08:49:53AM +0200, Julien Rouhaud wrote:
> On Tue, May 19, 2020 at 4:29 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>> Thanks for the excellent extension. I want to add 5 more fields to satisfy the
>> following requirements.
>>
>> int subplan; /* No. of subplan in this query */
>> int subquery; /* No. of subquery */
>> int joincnt; /* How many relations are joined */
>> bool hasagg; /* if we have agg function in this query */
>> bool hasgroup; /* has group clause */
>
> Most of those fields can be computed using the raw sql satements. Why
> not adding functions like query_has_agg(querytext) to get the
> information from pgss stored query text instead?
Yeah I personally find concepts related only to the query string
itself not something that needs to be tied to pg_stat_statements.
While reading about those five new fields, I am also wondering how
this stuff would work with CTEs. Particularly, should the hasagg or
hasgroup flags be set only if the most outer query satisfies a
condition? What if an inner query satisfies a condition but not an
outer query? Should joincnt just be the sum of all the joins done in
all queries, including subqueries?
Indeed, CTEs will bring additional concerns about the fields' semantics. That's another good reason to go with external functions, so you can add extra parameters for that if needed.
On Thu, May 21, 2020 at 3:17 PM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, May 21, 2020 at 08:49:53AM +0200, Julien Rouhaud wrote:
> On Tue, May 19, 2020 at 4:29 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>> Thanks for the excellent extension. I want to add 5 more fields to satisfy the
>> following requirements.
>>
>> int subplan; /* No. of subplan in this query */
>> int subquery; /* No. of subquery */
>> int joincnt; /* How many relations are joined */
>> bool hasagg; /* if we have agg function in this query */
>> bool hasgroup; /* has group clause */
>
> Most of those fields can be computed using the raw sql satements. Why
> not adding functions like query_has_agg(querytext) to get the
> information from pgss stored query text instead?
Yeah I personally find concepts related only to the query string
itself not something that needs to be tied to pg_stat_statements.
While reading about those five new fields, I am also wondering how
this stuff would work with CTEs. Particularly, should the hasagg or
hasgroup flags be set only if the most outer query satisfies a
condition? What if an inner query satisfies a condition but not an
outer query? Should joincnt just be the sum of all the joins done in
all queries, including subqueries?
The semantics are for the overall query, not just the most outer query; see code
like this, for example:
query_characters.hasagg |= parse->hasAggs;
query_characters.hasgroup |= parse->groupClause != NIL;
> not adding functions like query_has_agg(querytext) to get the
> information from pgss stored query text instead?
That's mainly because I don't want to reparse the query again.
Best Regards
Andy Fan
On Thu, May 21, 2020 at 3:49 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Thu, May 21, 2020 at 09:17, Michael Paquier <michael@paquier.xyz> wrote:
On Thu, May 21, 2020 at 08:49:53AM +0200, Julien Rouhaud wrote:
> On Tue, May 19, 2020 at 4:29 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>> Thanks for the excellent extension. I want to add 5 more fields to satisfy the
>> following requirements.
>>
>> int subplan; /* No. of subplan in this query */
>> int subquery; /* No. of subquery */
>> int joincnt; /* How many relations are joined */
>> bool hasagg; /* if we have agg function in this query */
>> bool hasgroup; /* has group clause */
>
> Most of those fields can be computed using the raw sql satements. Why
> not adding functions like query_has_agg(querytext) to get the
> information from pgss stored query text instead?
Yeah I personally find concepts related only to the query string
itself not something that needs to be tied to pg_stat_statements.
...Indeed cte will bring additional concerns about the fields semantics. That's another good reason to go with external functions so you can add extra parameters for that if needed.
There are some things we can't easily get from the query string, for example:
1. which views are involved;
2. subqueries that are pulled up, so there is no subquery in the final plan;
3. sublinks that are pulled up, or that become an InitPlan rather than a SubPlan;
4. joins that are removed by remove_useless_joins.
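To illustrate the first point (object names are made up): the stored query text alone gives no hint that an aggregate, a GROUP BY, or any join is involved, because they all live inside the view definition and only become visible after parse analysis and planning.

CREATE VIEW order_totals AS
    SELECT customer_id, sum(amount) AS total
    FROM orders
    GROUP BY customer_id;

-- This is all pg_stat_statements would ever see as query text.
SELECT * FROM order_totals WHERE total > 100;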
Best Regards
Andy Fan
On 2020/05/22 15:10, Andy Fan wrote:
>
>
> On Thu, May 21, 2020 at 3:49 PM Julien Rouhaud <rjuju123@gmail.com <mailto:rjuju123@gmail.com>> wrote:
>
> On Thu, May 21, 2020 at 09:17, Michael Paquier <michael@paquier.xyz <mailto:michael@paquier.xyz>> wrote:
>
> On Thu, May 21, 2020 at 08:49:53AM +0200, Julien Rouhaud wrote:
> > On Tue, May 19, 2020 at 4:29 AM Andy Fan <zhihui.fan1213@gmail.com <mailto:zhihui.fan1213@gmail.com>> wrote:
> >> Thanks for the excellent extension. I want to add 5 more fields to satisfy the
> >> following requirements.
> >>
> >> int subplan; /* No. of subplan in this query */
> >> int subquery; /* No. of subquery */
> >> int joincnt; /* How many relations are joined */
> >> bool hasagg; /* if we have agg function in this query */
> >> bool hasgroup; /* has group clause */
> >
> > Most of those fields can be computed using the raw sql satements. Why
> > not adding functions like query_has_agg(querytext) to get the
> > information from pgss stored query text instead?
>
> Yeah I personally find concepts related only to the query string
> itself not something that needs to be tied to pg_stat_statements.
> ...
>
>
> Indeed cte will bring additional concerns about the fields semantics. That's another good reason to go with external functions so you can add extra parameters for that if needed.
>
>
> There are something more we can't get from query string easily. like:
> 1. view involved. 2. subquery are pulled up so there is not subquery
> indeed. 3. sublink are pull-up or become as an InitPlan rather than subPlan.
> 4. joins are removed by remove_useless_joins.

If we can store the plan for each statement, e.g., like pg_store_plans
extension [1] does, rather than such partial information, which would
be enough for your cases?

Regards,

[1] http://pgstoreplans.osdn.jp/pg_store_plans.html
    https://github.com/ossc-db/pg_store_plans

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Fri, May 22, 2020 at 3:51 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> On 2020/05/22 15:10, Andy Fan wrote:
> >
> >
> > On Thu, May 21, 2020 at 3:49 PM Julien Rouhaud <rjuju123@gmail.com <mailto:rjuju123@gmail.com>> wrote:
> >
> > On Thu, May 21, 2020 at 09:17, Michael Paquier <michael@paquier.xyz <mailto:michael@paquier.xyz>> wrote:
> >
> > On Thu, May 21, 2020 at 08:49:53AM +0200, Julien Rouhaud wrote:
> > > On Tue, May 19, 2020 at 4:29 AM Andy Fan <zhihui.fan1213@gmail.com <mailto:zhihui.fan1213@gmail.com>> wrote:
> > >> Thanks for the excellent extension. I want to add 5 more fields to satisfy the
> > >> following requirements.
> > >>
> > >> int subplan; /* No. of subplan in this query */
> > >> int subquery; /* No. of subquery */
> > >> int joincnt; /* How many relations are joined */
> > >> bool hasagg; /* if we have agg function in this query */
> > >> bool hasgroup; /* has group clause */
> > >
> > > Most of those fields can be computed using the raw sql satements. Why
> > > not adding functions like query_has_agg(querytext) to get the
> > > information from pgss stored query text instead?
> >
> > Yeah I personally find concepts related only to the query string
> > itself not something that needs to be tied to pg_stat_statements.
> > ...
> >
> >
> > Indeed cte will bring additional concerns about the fields semantics. That's another good reason to go with external functions so you can add extra parameters for that if needed.
> >
> >
> > There are something more we can't get from query string easily. like:
> > 1. view involved. 2. subquery are pulled up so there is not subquery
> > indeed. 3. sublink are pull-up or become as an InitPlan rather than subPlan.
> > 4. joins are removed by remove_useless_joins.
>
> If we can store the plan for each statement, e.g., like pg_store_plans
> extension [1] does, rather than such partial information, which would
> be enough for your cases?

That'd definitely address way more use cases. Do you know if some
benchmark were done to see how much overhead such an extension adds?
>> If we can store the plan for each statement, e.g., like pg_store_plans
>> extension [1] does, rather than such partial information, which would
>> be enough for your cases?

> That'd definitely address way more use cases. Do you know if some
> benchmark were done to see how much overhead such an extension adds?

Hi Julien,
Did you ask about how much overhead auto_explain adds?

The only extension that proposed to store plans with a decent planid
calculation was pg_stat_plans, which has not been compatible with recent
pg versions for years.

We all know here that pg_store_plans, pg_show_plans, (my) pg_stat_sql_plans
use ExplainPrintPlan through Executor Hook, and that Explain is slow ...

Explain is slow because it was not designed for performance:
1/ colname_is_unique
see
https://www.postgresql-archive.org/Re-Explain-is-slow-with-tables-having-many-columns-td6047284.html

2/ hash_create from set_rtable_names
Look with perf top at
do $$ declare i int; begin for i in 1..1000000 loop execute 'explain
select 1'; end loop end; $$;

I may propose a "minimal" explain that only displays explain's backbone and
is much faster
see
https://github.com/legrandlegrand/pg_stat_sql_plans/blob/perf-explain/pgssp_explain.c

3/ All those extensions rebuild the explain output even with cached plan
queries ...
a way to optimize this would be to build a planid during planning (using
associated hook)

4/ All those extensions try to rebuild the explain plan even for trivial
queries/plans
like "select 1" or "insert into t values (,,,)" and that's not great for
highly transactional
applications ...

So yes, pg_store_plans is one of the short-term answers to Andy Fan's needs,
the answer for the long term would be to help extensions build a planid and
store plans,
by **adding a planid field in the PlannedStmt memory structure** and/or
optimizing the explain command ;o)

Regards
PAscal

--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
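For reference, a self-contained version of that kind of micro-benchmark (numbers are purely indicative and depend heavily on build options and hardware; the point is only the relative cost of going through EXPLAIN versus plain execution):

DO $$
DECLARE
    t0 timestamptz;
    i  int;
BEGIN
    -- Baseline: repeatedly execute a trivial statement.
    t0 := clock_timestamp();
    FOR i IN 1..100000 LOOP
        EXECUTE 'select 1';
    END LOOP;
    RAISE NOTICE 'plain execute: %', clock_timestamp() - t0;

    -- Same statement, but going through EXPLAIN text generation each time.
    t0 := clock_timestamp();
    FOR i IN 1..100000 LOOP
        EXECUTE 'explain select 1';
    END LOOP;
    RAISE NOTICE 'explain:       %', clock_timestamp() - t0;
END
$$;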
On Fri, May 22, 2020 at 9:27 PM legrand legrand
<legrand_legrand@hotmail.com> wrote:
>
> >> If we can store the plan for each statement, e.g., like pg_store_plans
> >> extension [1] does, rather than such partial information, which would
> >> be enough for your cases?
>
> > That'd definitely address way more use cases. Do you know if some
> > benchmark were done to see how much overhead such an extension adds?
>
> Hi Julien,
> Did you ask about how much overhead auto_explain adds?

Well, yes, but on the other hand auto_explain is by design definitely not
something intended to trace all queries in an OLTP environment, but rather
configured to catch only some long-running queries, so in such cases the
overhead is quite negligible.

> The only extension that proposed to store plans with a decent planid
> calculation was pg_stat_plans, which has not been compatible with recent
> pg versions for years.

Ah, I see. AFAICT it's mainly missing the new node changes, but the
approach should otherwise still work smoothly. Did you do any benchmarks
to compare this extension with the other alternatives? Assuming that
there's a postgres version compatible with all the extensions, of course.

> We all know here that pg_store_plans, pg_show_plans, (my) pg_stat_sql_plans
> use ExplainPrintPlan through Executor Hook, and that Explain is slow ...
>
> Explain is slow because it was not designed for performance:
> 1/ colname_is_unique
> see
> https://www.postgresql-archive.org/Re-Explain-is-slow-with-tables-having-many-columns-td6047284.html
>
> 2/ hash_create from set_rtable_names
> Look with perf top at
> do $$ declare i int; begin for i in 1..1000000 loop execute 'explain
> select 1'; end loop end; $$;
>
> I may propose a "minimal" explain that only displays explain's backbone and
> is much faster
> see
> https://github.com/legrandlegrand/pg_stat_sql_plans/blob/perf-explain/pgssp_explain.c
>
> 3/ All those extensions rebuild the explain output even with cached plan
> queries ...
> a way to optimize this would be to build a planid during planning (using
> associated hook)
>
> 4/ All those extensions try to rebuild the explain plan even for trivial
> queries/plans
> like "select 1" or "insert into t values (,,,)" and that's not great for
> highly transactional
> applications ...
>
> So yes, pg_store_plans is one of the short-term answers to Andy Fan's needs,
> the answer for the long term would be to help extensions build a planid and
> store plans,
> by **adding a planid field in the PlannedStmt memory structure** and/or
> optimizing the explain command ;o)

I'd be in favor of adding a planid and using the same approach as
pg_store_plans.
On Fri, May 22, 2020 at 9:51 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/05/22 15:10, Andy Fan wrote:
>
>
> On Thu, May 21, 2020 at 3:49 PM Julien Rouhaud <rjuju123@gmail.com <mailto:rjuju123@gmail.com>> wrote:
>
> On Thu, May 21, 2020 at 09:17, Michael Paquier <michael@paquier.xyz <mailto:michael@paquier.xyz>> wrote:
>
> On Thu, May 21, 2020 at 08:49:53AM +0200, Julien Rouhaud wrote:
> > On Tue, May 19, 2020 at 4:29 AM Andy Fan <zhihui.fan1213@gmail.com <mailto:zhihui.fan1213@gmail.com>> wrote:
> >> Thanks for the excellent extension. I want to add 5 more fields to satisfy the
> >> following requirements.
> >>
> >> int subplan; /* No. of subplan in this query */
> >> int subquery; /* No. of subquery */
> >> int joincnt; /* How many relations are joined */
> >> bool hasagg; /* if we have agg function in this query */
> >> bool hasgroup; /* has group clause */
> >
> > Most of those fields can be computed using the raw sql satements. Why
> > not adding functions like query_has_agg(querytext) to get the
> > information from pgss stored query text instead?
>
> Yeah I personally find concepts related only to the query string
> itself not something that needs to be tied to pg_stat_statements.
> ...
>
>
> Indeed cte will bring additional concerns about the fields semantics. That's another good reason to go with external functions so you can add extra parameters for that if needed.
>
>
> There are something more we can't get from query string easily. like:
> 1. view involved. 2. subquery are pulled up so there is not subquery
> indeed. 3. sublink are pull-up or become as an InitPlan rather than subPlan.
> 4. joins are removed by remove_useless_joins.
If we can store the plan for each statement, e.g., like pg_store_plans
extension [1] does, rather than such partial information, which would
be enough for your cases?
That would be helpful if I could search for the data I'm interested in.
Oracle has v$sql_plan, where every node in the plan has its own record,
so it is easy to search.
Best Regards
Andy Fan
[ blast from the past department ]

Fujii Masao <masao.fujii@oss.nttdata.com> writes:
> Finally I pushed the patch!
> Many thanks for all involved in this patch!

It turns out that the regression test outputs from this patch are
unstable under debug_discard_caches (nee CLOBBER_CACHE_ALWAYS).
You can easily check this in HEAD or v14, with something along
the lines of

$ cd ~/pgsql/contrib/pg_stat_statements
$ echo "debug_discard_caches = 1" >/tmp/temp_config
$ TEMP_CONFIG=/tmp/temp_config make check

and what you will get is a diff like this:

  SELECT query, plans, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
                     query                     | plans | calls | rows
  ...
- PREPARE prep1 AS SELECT COUNT(*) FROM test   |     2 |     4 |    4
+ PREPARE prep1 AS SELECT COUNT(*) FROM test   |     4 |     4 |    4

The reason we didn't detect this long since is that the buildfarm client
script fails to run "make check" for contrib modules that are marked
NO_INSTALLCHECK, so that pg_stat_statements (among others) has received
precisely zero buildfarm testing. Buildfarm member sifaka is running an
unreleased version of the script that fixes that oversight, and when I
experimented with turning on debug_discard_caches, I got this failure,
as shown at [1].

The cause of the failure of course is that cache clobbering includes
plan cache clobbering, so that the prepared statement's plan is
remade each time it's used, not only twice as the test expects.
However, remembering that cache flushes can happen for other reasons,
it's my guess that this test case would prove unstable in the buildfarm
even without considering the CLOBBER_CACHE_ALWAYS members. For example,
a background autovacuum hitting the "test" table at just the right time
would result in extra planning. We haven't seen that because the
buildfarm's not running this test, but that's about to change.

So AFAICS this test is inherently unstable and there is no code bug
to be fixed. We could drop the "plans" column from this query, or
print something approximate like "plans > 0 AND plans <= calls".
Thoughts?

			regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sifaka&dt=2021-07-24%2023%3A53%3A52
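For anyone who wants to see the effect outside the regression test, here is a small illustration. It assumes pg_stat_statements is loaded and that you are allowed to enable its track_planning parameter; exact counter values can still vary if other invalidations happen in between, which is precisely the instability being discussed.

-- Make replans easy to observe and enable planning counters.
SET pg_stat_statements.track_planning = on;
SET plan_cache_mode = force_generic_plan;
SELECT pg_stat_statements_reset();

CREATE TABLE test (a int);
PREPARE prep1 AS SELECT count(*) FROM test;
EXECUTE prep1;                       -- generic plan is built here
EXECUTE prep1;                       -- cached plan is reused, no new planning
ALTER TABLE test ADD COLUMN b int;   -- invalidates the cached plan
EXECUTE prep1;                       -- forces another planning cycle

-- "plans" reflects how often the statement was planned, which depends on
-- invalidations, not only on how often it was executed.
SELECT query, plans, calls
FROM pg_stat_statements
WHERE query LIKE 'PREPARE prep1%';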
On Sun, Jul 25, 2021 at 12:03:25PM -0400, Tom Lane wrote:
> The cause of the failure of course is that cache clobbering includes
> plan cache clobbering, so that the prepared statement's plan is
> remade each time it's used, not only twice as the test expects.
> However, remembering that cache flushes can happen for other reasons,
> it's my guess that this test case would prove unstable in the buildfarm
> even without considering the CLOBBER_CACHE_ALWAYS members. For example,
> a background autovacuum hitting the "test" table at just the right time
> would result in extra planning. We haven't seen that because the
> buildfarm's not running this test, but that's about to change.

Indeed.

> So AFAICS this test is inherently unstable and there is no code bug
> to be fixed. We could drop the "plans" column from this query, or
> print something approximate like "plans > 0 AND plans <= calls".
> Thoughts?

I think we should go with the latter. Checking for a legit value, even if it's
a bit imprecise is still better than nothing.

Would it be worth to split the query for the prepared statement row vs the rest
to keep the full "plans" coverage when possible?
Julien Rouhaud <rjuju123@gmail.com> writes:
> On Sun, Jul 25, 2021 at 12:03:25PM -0400, Tom Lane wrote:
>> So AFAICS this test is inherently unstable and there is no code bug
>> to be fixed. We could drop the "plans" column from this query, or
>> print something approximate like "plans > 0 AND plans <= calls".
>> Thoughts?

> I think we should go with the latter. Checking for a legit value, even if it's
> a bit imprecise is still better than nothing.

> Would it be worth to split the query for the prepared statement row vs the rest
> to keep the full "plans" coverage when possible?

+1, the same thought occurred to me later. Also, if we're making
it specific to the one PREPARE example, we could get away with
checking "plans >= 2 AND plans <= calls", with a comment like
"we expect at least one replan event, but there could be more".

Do you want to prepare a patch?

			regards, tom lane
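For concreteness, a minimal sketch of what that split could look like in the test (the committed version may differ in details; filtering on the prep1 entry is just one way to isolate the prepared-statement row):

-- Prepared statement checked separately, with relaxed bounds: we expect
-- at least one replan event, but cache invalidations may cause more.
SELECT query, plans >= 2 AND plans <= calls AS plans_ok, calls, rows
FROM pg_stat_statements
WHERE query LIKE 'PREPARE prep1%';

-- Everything else keeps the exact "plans" count.
SELECT query, plans, calls, rows
FROM pg_stat_statements
WHERE query NOT LIKE 'PREPARE prep1%'
ORDER BY query COLLATE "C";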
On Mon, Jul 26, 2021 at 00:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Julien Rouhaud <rjuju123@gmail.com> writes:
> On Sun, Jul 25, 2021 at 12:03:25PM -0400, Tom Lane wrote:
> Would it be worth to split the query for the prepared statement row vs the rest
> to keep the full "plans" coverage when possible?
+1, the same thought occurred to me later. Also, if we're making
it specific to the one PREPARE example, we could get away with
checking "plans >= 2 AND plans <= calls", with a comment like
"we expect at least one replan event, but there could be more".
Do you want to prepare a patch?
Sure, I will work on that tomorrow!
On 7/25/21 12:03 PM, Tom Lane wrote:
>
> So AFAICS this test is inherently unstable and there is no code bug
> to be fixed. We could drop the "plans" column from this query, or
> print something approximate like "plans > 0 AND plans <= calls".
> Thoughts?
>

Is that likely to tell us anything very useful? I suppose it's really
just a check against insane values. Since the test is unstable it's hard
to do more than that.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes:
> On 7/25/21 12:03 PM, Tom Lane wrote:
>> So AFAICS this test is inherently unstable and there is no code bug
>> to be fixed. We could drop the "plans" column from this query, or
>> print something approximate like "plans > 0 AND plans <= calls".
>> Thoughts?

> Is that likely to tell us anything very useful?

The variant suggested downthread ("plans >= 2 AND plans <= calls" for
the PREPARE entry only) seems like it's still reasonably useful.
At least it can verify that a replan has occurred and been counted.

			regards, tom lane
On Mon, Jul 26, 2021 at 01:08:08AM +0800, Julien Rouhaud wrote:
> On Mon, Jul 26, 2021 at 00:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > Julien Rouhaud <rjuju123@gmail.com> writes:
> > > On Sun, Jul 25, 2021 at 12:03:25PM -0400, Tom Lane wrote:
> > >
> > > Would it be worth to split the query for the prepared statement row vs
> > the rest
> > > to keep the full "plans" coverage when possible?
> >
> > +1, the same thought occurred to me later. Also, if we're making
> > it specific to the one PREPARE example, we could get away with
> > checking "plans >= 2 AND plans <= calls", with a comment like
> > "we expect at least one replan event, but there could be more".
> >
> > Do you want to prepare a patch?
>
> Sure, I will work on that tomorrow!

I attach a patch that splits the test and adds a comment explaining the
boundaries for the new query.

Checked with and without forced invalidations.
Attachment
Julien Rouhaud <rjuju123@gmail.com> writes:
> I attach a patch that splits the test and adds a comment explaining the
> boundaries for the new query.
> Checked with and without forced invalidations.

Pushed with a little cosmetic fooling-about.

			regards, tom lane
On Sun, Jul 25, 2021 at 11:26:02PM -0400, Tom Lane wrote:
> Julien Rouhaud <rjuju123@gmail.com> writes:
> > I attach a patch that splits the test and adds a comment explaining the
> > boundaries for the new query.
> > Checked with and without forced invalidations.
>
> Pushed with a little cosmetic fooling-about.

Thanks!
On 2021/07/26 12:32, Julien Rouhaud wrote:
> On Sun, Jul 25, 2021 at 11:26:02PM -0400, Tom Lane wrote:
>> Julien Rouhaud <rjuju123@gmail.com> writes:
>>> I attach a patch that splits the test and adds a comment explaining the
>>> boundaries for the new query.
>>> Checked with and without forced invalidations.
>>
>> Pushed with a little cosmetic fooling-about.
>
> Thanks!

Thanks a lot!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION