Thread: [PATCH] Tracking statements entry timestamp in pg_stat_statements

[PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
This is a proposal for a new feature in pg_stat_statements extension.

As a statistical extension providing counters pg_stat_statements
extension is a target for various sampling solutions. All of them
interested in calculation of statement statistics increments between
two samples. But we face a problem here - observing one statement with
its statistics right now we can't be sure that statistics increment for
this statement is continuous from previous sample. This statement could
be deallocated after previous sample and come back soon. Also it could
happen that statement executions after that increased statistics to
above the values we observed in previous sample making it impossible to
detect deallocation on statement level.
My proposition here is to store statement entry timestamp. In this case
any sampling solution in case of returning statement will easily detect
it by changed timestamp value. And for every statement we will know
exact time interval for its statistics.

Attachment

RE: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
"kuroda.hayato@fujitsu.com"
Date:
Dear Andrei,

I think the idea is good because the pg_stat_statements_info view cannot distinguish
whether the specific statement is deallocated or not.
But multiple calling of GetCurrentTimestamp() may cause some performance issues.
How about adding a configuration parameter for controlling this feature?
Or we don't not have to worry about that?


> +        if (api_version >= PGSS_V1_9)
> +        {
> +            values[i++] = TimestampTzGetDatum(first_seen);
> +        }

I think {} is not needed here.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi Kuroda,

Thank you for your attention!

On Tue, 2021-03-23 at 02:13 +0000, kuroda.hayato@fujitsu.com wrote:
> But multiple calling of GetCurrentTimestamp() may cause some
> performance issues.
> How about adding a configuration parameter for controlling this
> feature?
> Or we don't not have to worry about that?
Certaily I was thinking about this. And I've taken an advice of Teodor
Sigaev - a much more expirienced developer than me. It seems that
GetCurrentTimestamp() is fast enough for our purpose and we won't call
it too often - only on new statement entry allocation.

By the way right now in my workload tracing tool pg_profile I have to
reset pg_stat_statements on every sample (wich is about 30-60 minutes)
to make sure that all workload between samples is captured. This causes
much more overhead. Introduced first_seen column can eliminate the need
of resets.

However, there is another way - we can store the curent value
of pg_stat_statements_info.dealloc field when allocating a new
statement entry instead of timstamping it. Probably, it would be little
faster, but timestamp seems much more valuable here.
> 
> > +        if (api_version >= PGSS_V1_9)
> > +        {
> > +            values[i++] = TimestampTzGetDatum(first_seen);
> > +        }
> 
> I think {} is not needed here.
Of course, thank you!
-- 
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Julien Rouhaud
Date:
On Tue, Mar 23, 2021 at 09:50:16AM +0300, Andrei Zubkov wrote:
> 
> By the way right now in my workload tracing tool pg_profile I have to
> reset pg_stat_statements on every sample (wich is about 30-60 minutes)
> to make sure that all workload between samples is captured. This causes
> much more overhead. Introduced first_seen column can eliminate the need
> of resets.

Note that you could also detect entries for which some counters decreased (e.g.
the execution count), and in that case only use the current values.  It should
give the exact same results as what you will get with the first_seen column,
except of course if some entry is almost never used and is suddenly used a lot
after an explicit reset or an eviction and only until you perform your
snapshot.  I'm not sure that it's a very likely scenario though.

FTR that's how powa currently deals with reset/eviction.



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi Julien,

On Tue, 2021-03-23 at 15:03 +0800, Julien Rouhaud wrote:
> Note that you could also detect entries for which some counters
> decreased (e.g.
> the execution count), and in that case only use the current values. 

Yes, but checking condition for several counters seems complicated
compared to check only one field.

>  It should
> give the exact same results as what you will get with the first_seen
> column,
> except of course if some entry is almost never used and is suddenly
> used a lot
> after an explicit reset or an eviction and only until you perform
> your
> snapshot.  I'm not sure that it's a very likely scenario though.
But it is possible, and we are guessing here. Storing a timestamp does
not seems too expensive to me, but it totally eliminates guessing, and
provides a clear view about the time interval we watching for this
specific statement.

> FTR that's how powa currently deals with reset/eviction.
PoWA sampling is much more frequent than pg_profile. For PoWA it is, of
cource, very unlikely scenario, but still possible.

-- 
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





RE: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
"kuroda.hayato@fujitsu.com"
Date:
Dear Andrei,

> Certaily I was thinking about this. And I've taken an advice of Teodor
> Sigaev - a much more expirienced developer than me. It seems that
> GetCurrentTimestamp() is fast enough for our purpose and we won't call
> it too often - only on new statement entry allocation.

OK.

> However, there is another way - we can store the curent value
> of pg_stat_statements_info.dealloc field when allocating a new
> statement entry instead of timstamping it. Probably, it would be little
> faster, but timestamp seems much more valuable here.

I don't like the idea because such a column has no meaning for the specific row.
I prefer storing timestamp if GetCurrentTimestamp() is cheap.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Dear Kuroda,

> I don't like the idea because such a column has no meaning for the
> specific row.
> I prefer storing timestamp if GetCurrentTimestamp() is cheap.
I agree. New version attached.
-- 
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Seino Yuki
Date:
2021-03-23 23:08 に Andrei Zubkov さんは書きました:
> Dear Kuroda,
> 
>> I don't like the idea because such a column has no meaning for the
>> specific row.
>> I prefer storing timestamp if GetCurrentTimestamp() is cheap.
> I agree. New version attached.

Thanks for posting the patch.
I agree with this content.

Is it necessary to update the version of pg_stat_statements now that the 
release is targeted for PG15?
We take into account the risk that users will misunderstand.

Regards,

-- 
Yuki Seino
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
On Wed, 2021-04-07 at 17:26 +0900, Seino Yuki wrote:


> Is it necessary to update the version of pg_stat_statements now that
> the 
> release is targeted for PG15?
I think, yes, version of pg_stat_statements is need to be updated. Is
it will be 1.10 in PG15?

Regards,
-- 
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





RE: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
"kuroda.hayato@fujitsu.com"
Date:
Dear Andrei,

> I think, yes, version of pg_stat_statements is need to be updated. Is
> it will be 1.10 in PG15?

I think you are right. 
According to [1] we can bump up the version per one PG major version,
and any features are not committed yet for 15.

[1]: https://www.postgresql.org/message-id/20201202040516.GA43757@nol


Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi, Kuroda!

I've intended to change the pg_stat_statements version with rebasing
this patch to the current master branch state. Now this is commit
07e5e66.

But I'm unable to test the patch - it seems that pg_stat_statements is
receiving queryId = 0 for every statements in every hook now and
statements are not tracked at all.

Am I mistaken somewhere? Maybe you know why this is happening?

Thank you!
-- 
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


On Fri, 2021-04-09 at 00:23 +0000, kuroda.hayato@fujitsu.com wrote:
> Dear Andrei,
> 
> > I think, yes, version of pg_stat_statements is need to be updated.
> > Is
> > it will be 1.10 in PG15?
> 
> I think you are right. 
> According to [1] we can bump up the version per one PG major version,
> and any features are not committed yet for 15.
> 
> [1]: https://www.postgresql.org/message-id/20201202040516.GA43757@nol
> 
> 
> Best Regards,
> Hayato Kuroda
> FUJITSU LIMITED
> 




Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Julien Rouhaud
Date:
Le mer. 14 avr. 2021 à 17:22, Andrei Zubkov <zubkov@moonset.ru> a écrit :
 
But I'm unable to test the patch - it seems that pg_stat_statements is
receiving queryId = 0 for every statements in every hook now and
statements are not tracked at all.

Am I mistaken somewhere? Maybe you know why this is happening?

did you enable compute_query_id new parameter? 

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
On Wed, 2021-04-14 at 17:32 +0800, Julien Rouhaud wrote:

did you enable compute_query_id new parameter? 

Hi, Julien!
Thank you very much! I've missed it.

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hello, Kuroda!

On Fri, 2021-04-09 at 00:23 +0000, kuroda.hayato@fujitsu.com wrote:
> I think you are right. 
> According to [1] we can bump up the version per one PG major version,
> and any features are not committed yet for 15.
> 
> [1]: https://www.postgresql.org/message-id/20201202040516.GA43757@nol

Version 2 of patch attached. 
pg_stat_statements version is now 1.10 and patch is based on 0f61727.

-- 
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Chengxi Sun
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

Hi, Andrei

I tested your patch, and it works well. I also prefer timestamp inseatead of dealloc num.
I think it can provide more useful details about query statements.

Regards,
Martin Sun

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi, Martin

On Mon, 2021-04-19 at 11:39 +0000, Chengxi Sun wrote:
> I tested your patch, and it works well. I also prefer timestamp
> inseatead of dealloc num.
> I think it can provide more useful details about query statements.
> 
Thank you for your review.
Certainly, timestamp is valuable here. Deallocation number is only a
workaround in unlikely case when timestamping will cost a much. It
seems, that it can happen only when significant amount of statements
causes a new entry in pg_stat_statements hashtable. However, in such
case using of pg_stat_statements extension might be qute difficult.
-- 
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re[2]: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Мельников Антон Андреевич
Date:
Hi, Andrey!
 
I’ve tried to apply your patch v2-0001 on current master, but i failed.
There were git apply errors at:
pg_stat_statements.out:941
pg_stat_statements.sql:385
 
Best Regards,

Anton Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
 

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi, Anton!

I've corrected the patch and attached a new version.

-- 
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

On Wed, 2021-10-06 at 18:13 +0300, Мельников Антон Андреевич wrote:
> Hi, Andrey!
>  
> I’ve tried to apply your patch v2-0001 on current master, but i
> failed.
> There were git apply errors at:
> pg_stat_statements.out:941
> pg_stat_statements.sql:385
>  
> Best Regards,
> 
> Anton Melnikov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>  


Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
There is an issue with this patch. It's main purpose is the ability to
calculate values of pg_stat_statements view for a time period between
two samples without resetting pg_stat_statements being absolutely sure
that the statement was not evicted.
Such approach solves the problem for metrics with except of min and max
time values. It seems that partial reset is needed here resetting
min/max values during a sample. But overall min/max values will be lost
in this case. Does addition of resettable min/max metrics to the
pg_stat_statemets view seems reasonable here?

-- 
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
"Anton A. Melnikov"
Date:
On 07.10.2021 15:31, Andrei Zubkov wrote:
> There is an issue with this patch. It's main purpose is the ability to
> calculate values of pg_stat_statements view
>  [...]
> Does addition of resettable min/max metrics to the
> pg_stat_statemets view seems reasonable here?


Hello, Andrey!

I think it depends on what the slow top level sampler wants.
Let define the current values in pg_stat_statements for some query as gmin and gmax.
It seems to be a two main variants:
1)  If top level sampler wants to know for some query the min and max values for 
the entire watch time, then existing gmin and gmax in pg_stat_statements are sufficient.
The top level sampler can clarify top_min and top_max at every slow sample as follows:
top_max = gmax > top_max ? gmax : top_max;
top_min = gmin < top_min ? gmin : top_min;
This should work regardless of whether there was a reset between samples or not.
2) The second case happens when the top level sampler wants to know the min and max 
values for sampling period.
In that case i think one shouldn't not use gmin and gmax and especially reset
them asynchronously from outside because its lifetime and sampling period are 
independent values and moreover someone else might need gmin and gmax as is.
So i suppose that additional vars loc_min and loc_max is a right way to do it.
If that, at every top sample one need to replace not clarify
the new top values as follows:
top_max = loc_max; loc_max = 0;
top_min = loc_min; loc_min = INT_MAX;
And one more thing, if there was a reset of stats between two samples,
then i think it is the best to ignore the new values, 
since they were obtained for an incomplete period.
This patch, thanks to the saved time stamp, makes possible 
to determine the presence of reset between samples and
there should not be a problem to realize such algorithm.

The patch is now applied normally, all my tests were successful.
The only thing I could suggest to your notice
is a small cosmetic edit to replace
the numeric value in #define on line 1429 of pg_stat_statements.c
by one of the constants defined above. 

Best regards,
Anton Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


 

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hello,

On Sun, 2022-01-02 at 13:28 -0800, Andres Freund wrote:
> Hi,
> 
> This fails with an assertion failure:
> https://cirrus-ci.com/task/5567540742062080?logs=cores#L55
> 
> 
Andres, thank you for your test! I've missed it. Fixed in attached
patch v5.

On Wed, 2021-12-22 at 04:25 +0300, Anton A. Melnikov wrote:
> 
> 
> I completely agree that creating a separate view for these new fields
> is
> the most correct thing to do.

Anton,

I've created a new view named pg_stat_statements_aux. But for now both
views are using the same function pg_stat_statements which returns all
fields. It seems reasonable to me - if sampling solution will need all
values it can query the function.

> Also it might be better to use the term 'auxiliary' and use the same 
> approach as for existent similar vars.

Agreed, renamed to auxiliary term.

> So internally it might look something like this:
> 
> double  aux_min_time[PGSS_NUMKIND];
> double  aux_max_time[PGSS_NUMKIND];
> TimestampTz     aux_stats_reset;
> 
> And at the view level:
>    aux_min_plan_time float8,
>    aux_max_plan_time float8,
>    aux_min_exec_time float8,
>    aux_max_exec_time float8,
>    aux_stats_reset timestamp with time zone
> 
> Functions names might be pg_stat_statements_aux() and 
> pg_stat_statements_aux_reset().
> 

But it seems "stats_reset" term is not quite correct in this case. The
"stats_since" field holds the timestamp of hashtable entry, but not the
reset time. The same applies to aux_stats_since - for new statement it
holds its entry time, but in case of reset it will hold the reset time.

"stats_reset" name seems a little bit confusing to me.

Attached patch v5
-- 
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
"Anton A. Melnikov"
Date:
Hi, Andrey!

I've checked the 5th version of the patch and there are some remarks.

 >I've created a new view named pg_stat_statements_aux. But for now both
 >views are using the same function pg_stat_statements which returns all
 >fields. It seems reasonable to me - if sampling solution will need all
 >values it can query the function.

Agreed, it might be useful in some cases.

 >But it seems "stats_reset" term is not quite correct in this case. The
 >"stats_since" field holds the timestamp of hashtable entry, but not the
 >reset time. The same applies to aux_stats_since - for new statement it
 >holds its entry time, but in case of reset it will hold the reset time.

Thanks for the clarification. Indeed if we mean the word 'reset' as the 
removal of all the hashtable entries during pg_stat_statements_reset() 
then we shouldn't use it for the first occurrence timestamp in the 
struct pgssEntry.
So with the stats_since field everything is clear.
On the other hand aux_stats_since field can be updated for two reasons:
1) The same as for stats_since due to first occurrence of entry in the 
hashtable. And it will be equal to stats_since timestamp in that case.
2) Due to an external reset from an upper level sampler.
I think it's not very important how to name this field, but it would be 
better to mention both these reasons in the comment.

As for more important things, if the aux_min_time initial value is zero 
like now, then if condition on line 1385 of pg_stat_statements.c will 
never be true and aux_min_time will remain zero. Init aux_min_time with 
INT_MAX can solve this problem.

It is possible to reduce size of entry_reset_aux() function  via 
removing if condition on line 2606 and entire third branch from line 
2626. Thanks to check in 2612 this will work in all cases.

Also it would be nice to move the repeating several times
lines 2582-2588 into separate function. I think this can make 
entry_reset_aux() more shorter and clearer.

In general, the 5th patch applies with no problems, make check-world and 
CI gives no error and patch seems to be closely to become RFC.

With best regards,
-- 
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi Julien,
 Tue, 2022-01-25 at 18:08 +0800, Julien Rouhaud wrote:
> 
> Are those 4 new counters really worth it?
> 
> The min/max were added to make pg_stat_statements a bit more useful
> if you
> have nothing else using that extension.  But once you setup a tool
> that
> snapshots the metrics regularly, do you really need to know e.g.
> "what was the
> maximum execution time in the last 3 years", which will typically be
> what
> people will retrieve from the non-aux min/max counters, rather than
> simply
> using your additional tool for better and more precise information?

Of course we can replace old min/max metrics with the new "aux" min/max
metrics. It seems reasonable to me because they will have the same
behavior until we touch reset_aux. I think we can assume that min/max
data is saved somewhere if reset_aux was performed, but how about the
backward compatibility?
There may be some monitoring solutions that doesn't expect min/max
stats reset independently of other statement statistics.
It seems highly unlikely to me, because the min/max stats for "the last
3 years" is obvious unusable but maybe someone uses them as a sign of
something?
Are we need to worry about that?

Also, there is one more dramatic consequence of such decision...
What min/max values should be returned after the auxiliary reset but
before the next statement execution?
The NULL values seems reasonable but there was not any NULLs before and
backward compatibility seems broken. Another approach is to return the
old values of min/max stats and the old aux_stats_since value until the
next statement execution but it seems strange when the reset was
performed and it doesn't affected any stats instantly.

regards,
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Julien Rouhaud
Date:
Hi Andrei,

On Tue, Jan 25, 2022 at 02:58:17PM +0300, Andrei Zubkov wrote:
> 
> Of course we can replace old min/max metrics with the new "aux" min/max
> metrics. It seems reasonable to me because they will have the same
> behavior until we touch reset_aux. I think we can assume that min/max
> data is saved somewhere if reset_aux was performed, but how about the
> backward compatibility?
> There may be some monitoring solutions that doesn't expect min/max
> stats reset independently of other statement statistics.
> It seems highly unlikely to me, because the min/max stats for "the last
> 3 years" is obvious unusable but maybe someone uses them as a sign of
> something?

To be honest I don't see how any monitoring solution could make any use of
those fields as-is.  For that reason in PoWA we unfortunately have to entirely
ignore them.  There was a previous attempt to provide a way to reset those
counters only (see [1]), but it was returned with feedback due to lack of TLC
from the author.

> Are we need to worry about that?

I don't think it's a problem, as once you have a solution on top of
pg_stat_statements, you get information order of magnitude better from that
solution instead of pg_stat_statements.  And if that's a problem, well either
don't reset those counters, or don't use the external solution if it does it
automatically and you're not ok with it.

> Also, there is one more dramatic consequence of such decision...
> What min/max values should be returned after the auxiliary reset but
> before the next statement execution?
> The NULL values seems reasonable but there was not any NULLs before and
> backward compatibility seems broken. Another approach is to return the
> old values of min/max stats and the old aux_stats_since value until the
> next statement execution but it seems strange when the reset was
> performed and it doesn't affected any stats instantly.

If you're worried about some external table having a NOT NULL constraint for
those fields, how about returning NaN instead?  That's a valid value for a
double precision data type.

[1] https://www.postgresql.org/message-id/1762890.8ARNpCrDLI@peanuts2



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi Julien,

On Tue, 2022-01-25 at 20:22 +0800, Julien Rouhaud wrote:
> To be honest I don't see how any monitoring solution could make any
> use of
> those fields as-is.  For that reason in PoWA we unfortunately have to
> entirely
> ignore them.  There was a previous attempt to provide a way to reset
> those
> counters only (see [1]), but it was returned with feedback due to
> lack of TLC
> from the author.

Thank you, I've just seen a thread in [1], it was of course a weak
attempt and I think it could be done better.
> 
> 
> I don't think it's a problem, as once you have a solution on top of
> pg_stat_statements, you get information order of magnitude better
> from that solution instead of pg_stat_statements.

Agreed. I'm worry about having different solutions running
simultaneously. All of them may want to get accurate min/max values,
but if they all will reset min/max statistics, they will interfere each
other. It seems that min/max resetting should be configurable in each
sampler as a partial problem solution. At least, every sampling
solution can make a decision based on reset timestamps provided by
pg_stat_statements now.
> 
> 
> If you're worried about some external table having a NOT NULL
> constraint for
> those fields, how about returning NaN instead?  That's a valid value
> for a
> double precision data type.

I don't know for sure what we can expect to be wrong here. My opinion
is to use NULLs, as they seems more suitable here. But this can be done
only if we are not expecting significant side effects.
-- 
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
"Anton A. Melnikov"
Date:
Hello!

On 26.01.2022 16:43, Andrei Zubkov wrote:

 >>
 >> If you're worried about some external table having a NOT NULL
 >> constraint for
 >> those fields, how about returning NaN instead?  That's a valid value
 >> for a
 >> double precision data type.
 >
 > I don't know for sure what we can expect to be wrong here. My opinion
 > is to use NULLs, as they seems more suitable here. But this can be done
 > only if we are not expecting significant side effects.

Let me suggest for your consideration an additional reset request flag 
that can be used to synchronize reset in a way similar to interrupt 
handling.
External reset can set this flag immediately. Then pg_stat_statements 
will wait for the moment when the required query hits into the 
statistics and only at this moment really reset the aux statistics,
write a new timestamp and clear the flag. At the time of real reset, 
total_time will be determined, and pg_stat_statements can immediately 
initialize min and max correctly.
 From reset to the next query execution the aux view will give old 
correct values so neither NaNs nor NULLs will be required.
Also we can put the value of reset request flag into the aux view to 
give feedback to the external application that reset was requested.

With best regards,
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Greg Stark
Date:
This patch seems to have gotten some feedback but development has
stalled. It's marked "waiting on author" but I'm not clear exactly
what is expected from the authors here. It seems there isn't really
consensus on the design at the moment. There's been no emails in over
a month.

Fwiw I find the idea of having a separate "aux" table kind of awkward.
It'll seem strange to users not familiar with the history and without
any clear idea why the fields are split.



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi

On Fri, 2022-03-25 at 00:37 -0400, Greg Stark wrote:
> Fwiw I find the idea of having a separate "aux" table kind of
> awkward.
> It'll seem strange to users not familiar with the history and without
> any clear idea why the fields are split.

Greg, thank you for your attention and for your thought.

I've just completed the 6th version of a patch implementing idea
proposed by Julien Rouhaud, i.e. without auxiliary statistics. 6th
version will reset current min/max fields to zeros until the first plan
or execute. I've decided to use zeros here because planning statistics
is zero in case of disabled tracking. I think sampling solution could
easily handle this.

-- 
Regards, Andrei Zubkov


Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Julien Rouhaud
Date:
Hi,

On Fri, Mar 25, 2022 at 01:25:23PM +0300, Andrei Zubkov wrote:
> Greg, thank you for your attention and for your thought.
> 
> I've just completed the 6th version of a patch implementing idea
> proposed by Julien Rouhaud, i.e. without auxiliary statistics. 6th
> version will reset current min/max fields to zeros until the first plan
> or execute.

Thanks!

I've decided to use zeros here because planning statistics
> is zero in case of disabled tracking. I think sampling solution could
> easily handle this.

I'm fine with it.  It's also consistent with the planning counters when
track_planning is disabled.  And even if the sampling solution doesn't handle
it, you will simply get consistent values, like "0 calls with minmax timing of
0 msec", so it's not really a problem.

Feature wise, I'm happy with the patch.  I just have a few comments.

Tests:

- it's missing some test in sql/oldextversions.sql to validate that the code
  works with the extension in version 1.9
- the last test removed the minmax_plan_zero field, why?

Code:

+    TimestampTz    stats_since;        /* timestamp of entry allocation moment */

I think "timestamp of entry allocation" is enough?

+             * Calculate min and max time. min = 0 and max = 0
+             * means that min/max statistics reset was happen

maybe "means that the min/max statistics were reset"

+/*
+ * Reset min/max values of specified entries
+ */
+static void
+entry_minmax_reset(Oid userid, Oid dbid, uint64 queryid)
+{
[...]

There's a lot of duplicated logic with entry_reset().
Would it be possible to merge at least the C reset function to handle either
all-metrics or minmax-only?  Also, maybe it would be better to have a single SQL
reset function, something like:

pg_stat_statements_reset(IN userid Oid DEFAULT 0,
    IN dbid Oid DEFAULT 0,
    IN queryid bigint DEFAULT 0,
    IN minmax_only DEFAULT false
)

Doc:

+       <structfield>stats_since</structfield> <type>timestamp with time zone</type>
+      </para>
+      <para>
+       Timestamp of statistics gathering start for the statement

The description is a bit weird.  Maybe like "Time at which statistics gathering
started for this statement"?  Same for the minmax version.



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi Julien!

Thank you for such detailed review!

On Wed, 2022-03-30 at 17:31 +0800, Julien Rouhaud wrote:
> Feature wise, I'm happy with the patch.  I just have a few comments.
> 
> Tests:
> 
> - it's missing some test in sql/oldextversions.sql to validate that the
> code
>   works with the extension in version 1.9

Yes, I've just added some tests there, but it seems they are not quite
suficient. Maybe we should try to do some queries to views and
functions in old versions? A least when new C function version
appears...

During tests developing I've noted that current test of
pg_stat_statements_info view actually tests only view access. However
we can test at least functionality of stats_reset field like this:

SELECT now() AS ref_ts \gset
SELECT dealloc, stats_reset >= :'ref_ts' FROM pg_stat_statements_info;
SELECT pg_stat_statements_reset();
SELECT dealloc, stats_reset >= :'ref_ts' FROM pg_stat_statements_info;

Does it seems reasonable? 

> - the last test removed the minmax_plan_zero field, why?

My thaught was as follows... Reexecution of the same query will
definitely cause execution. However, most likely it wouldn't be
planned, but if it would (maybe this is possible, or maybe it will be
possible in the future in some cases) the test shouldn't fail. Checking
of only execution stats seems enough to me - in most cases we can't
check planning stats with such test anyway.
What do you think about it?

> 
> Code:
> 
> +       TimestampTz     stats_since;            /* timestamp of entry
> allocation moment */
> 
> I think "timestamp of entry allocation" is enough?

Yes

> 
> +                        * Calculate min and max time. min = 0 and max
> = 0
> +                        * means that min/max statistics reset was
> happen
> 
> maybe "means that the min/max statistics were reset"

Agreed

> 
> +/*
> + * Reset min/max values of specified entries
> + */
> +static void
> +entry_minmax_reset(Oid userid, Oid dbid, uint64 queryid)
> +{
> [...]
> 
> There's a lot of duplicated logic with entry_reset().
> Would it be possible to merge at least the C reset function to handle
> either
> all-metrics or minmax-only? 

Great point! I've merged minmax reset functionality in the entry_reset
function.

> Also, maybe it would be better to have a single SQL
> reset function, something like:
> 
> pg_stat_statements_reset(IN userid Oid DEFAULT 0,
>         IN dbid Oid DEFAULT 0,
>         IN queryid bigint DEFAULT 0,
>     IN minmax_only DEFAULT false
> )

Of course!

> 
> Doc:
> 
> +       <structfield>stats_since</structfield> <type>timestamp with
> time zone</type>
> +      </para>
> +      <para>
> +       Timestamp of statistics gathering start for the statement
> 
> The description is a bit weird.  Maybe like "Time at which statistics
> gathering
> started for this statement"?  Same for the minmax version.

Agreed.

I've attached 7th patch version with fixes mentioned above.
-- 
Best regards, Andrei Zubkov


Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Greg Stark
Date:
FYI this has a compiler warning showing up on the cfbot:


[13:19:51.544] pg_stat_statements.c: In function ‘entry_reset’:
[13:19:51.544] pg_stat_statements.c:2598:32: error:
‘minmax_stats_reset’ may be used uninitialized in this function
[-Werror=maybe-uninitialized]
[13:19:51.544] 2598 | entry->minmax_stats_since = minmax_stats_reset;
[13:19:51.544] | ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~

If the patch is otherwise ready to commit then this is an issue that
should be fixed before marking it ready to commit.

Given that this is the last week before feature freeze it'll probably
get moved to a future commitfest unless it's ready to commit.



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Julien Rouhaud
Date:
Hi,

On Fri, Apr 01, 2022 at 11:38:52AM -0400, Greg Stark wrote:
> FYI this has a compiler warning showing up on the cfbot:
>
> [13:19:51.544] pg_stat_statements.c: In function ‘entry_reset’:
> [13:19:51.544] pg_stat_statements.c:2598:32: error:
> ‘minmax_stats_reset’ may be used uninitialized in this function
> [-Werror=maybe-uninitialized]
> [13:19:51.544] 2598 | entry->minmax_stats_since = minmax_stats_reset;
> [13:19:51.544] | ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
>
> If the patch is otherwise ready to commit then this is an issue that
> should be fixed before marking it ready to commit.
>
> Given that this is the last week before feature freeze it'll probably
> get moved to a future commitfest unless it's ready to commit.

As I mentioned in my last review I think feature wise the patch is ok, it just
needed a few minor changes.  It's a small patch but can help *a lot* tools on
top of pg_stat_statements and give users a better overview of their workload so
it would be nice to commit it in v15.

I was busy looking at the prefetch patch today (not done yet), but I plan to
review the last version over the weekend.  After a quick look at the patch it
seems like a compiler bug.  I'm not sure which clang version is used, but can't
reproduce it locally using clang 13.  I already saw similar false positive,
when a variable is initialized in a branch (here minmax_only == true), and only
then used in similar branches.  I guess that pg_stat_statement_reset() is so
expensive that an extra gettimeofday() wouldn't change much.  Otherwise
initializing to NULL should be enough.



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi,

Thank you, Greg

On Fri, 2022-04-01 at 11:38 -0400, Greg Stark wrote:
> [13:19:51.544] pg_stat_statements.c: In function ‘entry_reset’:
> [13:19:51.544] pg_stat_statements.c:2598:32: error:
> ‘minmax_stats_reset’ may be used uninitialized in this function
> [-Werror=maybe-uninitialized]
> [13:19:51.544] 2598 | entry->minmax_stats_since = minmax_stats_reset;
> [13:19:51.544] | ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
>

I was afraid of such warning can appear..

On Sat, 2022-04-02 at 00:13 +0800, Julien Rouhaud wrote:
> I guess that pg_stat_statement_reset() is so
> expensive that an extra gettimeofday() wouldn't change much. 

Agreed

> Otherwise
> initializing to NULL should be enough.

Julien, I would prefer an extra GetCurrentTimestamp(). So, I've opted
to use the common unconditional

stats_reset = GetCurrentTimestamp();

for an entire entry_reset function due to the following:

1. It will be uniform for stats_reset and minmax_stats_reset
2. As you mentioned, it wouldn't change a much
3. The most common way to use this function is to reset all statements
meaning that GetCurrentTimestamp() will be called anyway to update the
value of stats_reset field in pg_stat_statements_info view
4. Actually I would like that pg_stat_statements_reset function was
able to return the value of stats_reset as its result. This could give
to the sampling solutions the ability to check if the last reset (of
any type) was performed by this solution or any other reset was
performed by someone else. It seems valuable to me, but it changes the
result type of the pg_stat_statements_reset() function, so I don't know
if we can do that right now.

v8 attached
--
regards, Andrei

Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andres Freund
Date:
Hi,

On 2022-04-01 22:47:02 +0300, Andrei Zubkov wrote:
> +        entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_FIND, NULL);
> +
> +        if (entry) {
> +            /* Found */
> +            if (minmax_only) {
> +                /* When requested reset only min/max statistics of an entry */
> +                entry_counters = &entry->counters;
> +                for (int kind = 0; kind < PGSS_NUMKIND; kind++)
> +                {
> +                    entry_counters->max_time[kind] = 0;
> +                    entry_counters->min_time[kind] = 0;
> +                }
> +                entry->minmax_stats_since = stats_reset;
> +            }
> +            else
> +            {
> +                /* Remove the key otherwise  */
> +                hash_search(pgss_hash, &entry->key, HASH_REMOVE, NULL);
> +                num_remove++;
> +            }
> +        }

It seems decidedly not great to have four copies of this code. It was already
not great before, but this patch makes the duplicated section go from four
lines to 20 or so.

Greetings,

Andres Freund



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Julien Rouhaud
Date:
On Thu, Mar 31, 2022 at 01:06:10PM +0300, Andrei Zubkov wrote:
>
> On Wed, 2022-03-30 at 17:31 +0800, Julien Rouhaud wrote:
> > Feature wise, I'm happy with the patch.  I just have a few comments.
> >
> > Tests:
> >
> > - it's missing some test in sql/oldextversions.sql to validate that the
> > code
> >   works with the extension in version 1.9
>
> Yes, I've just added some tests there, but it seems they are not quite
> suficient. Maybe we should try to do some queries to views and
> functions in old versions? A least when new C function version
> appears...

I'm not sure if that's really helpful.  If you have new C functions and old
SQL-version, you won't be able to reach them anyway.  Similarly, if you have
the new SQL but the old .so (which we can't test), it will just fail as the
symbol doesn't exist.  The real problem that has to be explicitly handled by
the C code is different signatures for C functions.
>
> During tests developing I've noted that current test of
> pg_stat_statements_info view actually tests only view access. However
> we can test at least functionality of stats_reset field like this:
>
> SELECT now() AS ref_ts \gset
> SELECT dealloc, stats_reset >= :'ref_ts' FROM pg_stat_statements_info;
> SELECT pg_stat_statements_reset();
> SELECT dealloc, stats_reset >= :'ref_ts' FROM pg_stat_statements_info;
>
> Does it seems reasonable?

It looks reasonable, especially if the patch adds a new mode for the reset
function.

> > - the last test removed the minmax_plan_zero field, why?
>
> My thaught was as follows... Reexecution of the same query will
> definitely cause execution. However, most likely it wouldn't be
> planned, but if it would (maybe this is possible, or maybe it will be
> possible in the future in some cases) the test shouldn't fail. Checking
> of only execution stats seems enough to me - in most cases we can't
> check planning stats with such test anyway.
> What do you think about it?

Ah I see.  I guess we could set plan_cache_mode to force_generic_plan to make
sure we go though planning.  But otherwise just adding a comment saying that
the test has to be compatible with different plan caching approach would be
fine with me.

Thanks for the work on merging the functions!  I will reply on the other parts
of the thread where some discussion started.



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Julien Rouhaud
Date:
On Fri, Apr 01, 2022 at 10:47:02PM +0300, Andrei Zubkov wrote:
>
> On Fri, 2022-04-01 at 11:38 -0400, Greg Stark wrote:
> > [13:19:51.544] pg_stat_statements.c: In function ‘entry_reset’:
> > [13:19:51.544] pg_stat_statements.c:2598:32: error:
> > ‘minmax_stats_reset’ may be used uninitialized in this function
> > [-Werror=maybe-uninitialized]
> > [13:19:51.544] 2598 | entry->minmax_stats_since = minmax_stats_reset;
> > [13:19:51.544] | ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
> >
>
> I was afraid of such warning can appear..
>
> On Sat, 2022-04-02 at 00:13 +0800, Julien Rouhaud wrote:
> > I guess that pg_stat_statement_reset() is so
> > expensive that an extra gettimeofday() wouldn't change much. 
>
> Agreed
>
> > Otherwise
> > initializing to NULL should be enough.
>
> Julien, I would prefer an extra GetCurrentTimestamp(). So, I've opted
> to use the common unconditional
>
> stats_reset = GetCurrentTimestamp();
>
> for an entire entry_reset function due to the following:
>
> 1. It will be uniform for stats_reset and minmax_stats_reset
> 2. As you mentioned, it wouldn't change a much
> 3. The most common way to use this function is to reset all statements
> meaning that GetCurrentTimestamp() will be called anyway to update the
> value of stats_reset field in pg_stat_statements_info view
> 4. Actually I would like that pg_stat_statements_reset function was
> able to return the value of stats_reset as its result. This could give
> to the sampling solutions the ability to check if the last reset (of
> any type) was performed by this solution or any other reset was
> performed by someone else. It seems valuable to me, but it changes the
> result type of the pg_stat_statements_reset() function, so I don't know
> if we can do that right now.

I'm fine with always getting the current timestamp when calling the function.

I'm not sure about returning the ts.  If you need it you could call SELECT
now() FROM pg_stat_statements_reset() (or clock_timestamp()).  It won't be
entirely accurate but since the function will have an exclusive lock during the
whole execution that shouldn't be a problem.  Now you're already adding a new
version of the C function so I guess that it wouldn't require any additional
effort so why not.



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Julien Rouhaud
Date:
On Fri, Apr 01, 2022 at 01:01:53PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2022-04-01 22:47:02 +0300, Andrei Zubkov wrote:
> > +        entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_FIND, NULL);
> > +
> > +        if (entry) {
> > +            /* Found */
> > +            if (minmax_only) {
> > +                /* When requested reset only min/max statistics of an entry */
> > +                entry_counters = &entry->counters;
> > +                for (int kind = 0; kind < PGSS_NUMKIND; kind++)
> > +                {
> > +                    entry_counters->max_time[kind] = 0;
> > +                    entry_counters->min_time[kind] = 0;
> > +                }
> > +                entry->minmax_stats_since = stats_reset;
> > +            }
> > +            else
> > +            {
> > +                /* Remove the key otherwise  */
> > +                hash_search(pgss_hash, &entry->key, HASH_REMOVE, NULL);
> > +                num_remove++;
> > +            }
> > +        }
> 
> It seems decidedly not great to have four copies of this code. It was already
> not great before, but this patch makes the duplicated section go from four
> lines to 20 or so.

+1



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi,

On Fri, 2022-04-01 at 13:01 -0700, Andres Freund wrote:
> It seems decidedly not great to have four copies of this code. It was
> already
> not great before, but this patch makes the duplicated section go from
> four
> lines to 20 or so.

Agreed. I've created the single_entry_reset() function wrapping this
code. I wonder if it should be declared as inline to speedup a little.

On Sat, 2022-04-02 at 15:10 +0800, Julien Rouhaud wrote:
> > However
> > we can test at least functionality of stats_reset field like this:
> > 
> > SELECT now() AS ref_ts \gset
> > SELECT dealloc, stats_reset >= :'ref_ts' FROM
> > pg_stat_statements_info;
> > SELECT pg_stat_statements_reset();
> > SELECT dealloc, stats_reset >= :'ref_ts' FROM
> > pg_stat_statements_info;
> > 
> > Does it seems reasonable?
> 
> It looks reasonable, especially if the patch adds a new mode for the
> reset
> function.

I've implemented this test.

> > Checking
> > of only execution stats seems enough to me - in most cases we can't
> > check planning stats with such test anyway.
> > What do you think about it?
> 
> Ah I see.  I guess we could set plan_cache_mode to force_generic_plan
> to make
> sure we go though planning.  But otherwise just adding a comment
> saying that
> the test has to be compatible with different plan caching approach
> would be
> fine with me.

Set plan_cache_mode seems a little bit excess to me. And maybe in the
future some another plan caching strategies will be implementd with
coresponding settings.. So I've just left a comment there.

On Sat, 2022-04-02 at 15:21 +0800, Julien Rouhaud wrote:
> I'm not sure about returning the ts.  If you need it you could call
> SELECT
> now() FROM pg_stat_statements_reset() (or clock_timestamp()).  It
> won't be
> entirely accurate but since the function will have an exclusive lock
> during the
> whole execution that shouldn't be a problem.  Now you're already
> adding a new
> version of the C function so I guess that it wouldn't require any
> additional
> effort so why not.

I think that if we can do it in accurate way and there is no obvious
side effects, why not to try it...
Changing of pg_stat_statements_reset function result caused a
confiderable tests update. Also, I'm not sure that my description of
this feature in the docs is blameless..

After all, I'm a little bit in doubt about this feature, so I'm ready
to rollback it.

v9 attached
--
regards, Andrei

Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Julien Rouhaud
Date:
On Sat, Apr 02, 2022 at 01:12:54PM +0300, Andrei Zubkov wrote:
> On Fri, 2022-04-01 at 13:01 -0700, Andres Freund wrote:
> > It seems decidedly not great to have four copies of this code. It was
> > already
> > not great before, but this patch makes the duplicated section go from
> > four
> > lines to 20 or so.
> 
> Agreed. I've created the single_entry_reset() function wrapping this
> code. I wonder if it should be declared as inline to speedup a little.

Maybe a macro would be better here?  I don't know if that's generally ok or
just old and not-that-great code, but there are other places relying on macros
when a plain function call isn't that convenient (like here returning 0 or 1 as
a hack for incrementing num_remove), for instance in hba.c.

> On Sat, 2022-04-02 at 15:21 +0800, Julien Rouhaud wrote:
> > I'm not sure about returning the ts.  If you need it you could call
> > SELECT
> > now() FROM pg_stat_statements_reset() (or clock_timestamp()).  It
> > won't be
> > entirely accurate but since the function will have an exclusive lock
> > during the
> > whole execution that shouldn't be a problem.  Now you're already
> > adding a new
> > version of the C function so I guess that it wouldn't require any
> > additional
> > effort so why not.
> 
> I think that if we can do it in accurate way and there is no obvious
> side effects, why not to try it...
> Changing of pg_stat_statements_reset function result caused a
> confiderable tests update. Also, I'm not sure that my description of
> this feature in the docs is blameless..
> 
> After all, I'm a little bit in doubt about this feature, so I'm ready
> to rollback it.

Well, I personally don't think that I would need it for powa as it's designed
to have very frequent snapshot.  I know you have a different approach in
pg_profile, but I'm not sure it will be that useful for you either?



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
On Sat, 2022-04-02 at 18:56 +0800, Julien Rouhaud wrote:
> Maybe a macro would be better here?  I don't know if that's generally
> ok or
> just old and not-that-great code, but there are other places relying
> on macros
> when a plain function call isn't that convenient (like here returning
> 0 or 1 as
> a hack for incrementing num_remove), for instance in hba.c.

Yes, it is not very convenient and not looks pretty, so I'll try a
macro here soon.

> > I think that if we can do it in accurate way and there is no
> > obvious
> > side effects, why not to try it...
> > Changing of pg_stat_statements_reset function result caused a
> > confiderable tests update. Also, I'm not sure that my description
> > of
> > this feature in the docs is blameless..
> > 
> > After all, I'm a little bit in doubt about this feature, so I'm
> > ready
> > to rollback it.
> 
> Well, I personally don't think that I would need it for powa as it's
> designed
> to have very frequent snapshot.  I know you have a different approach
> in
> pg_profile, but I'm not sure it will be that useful for you either?

Of course I can do some workaround if the accurate reset time will be
unavailable. I just want to do the whole thing if it doesn't hurt. If
we have a plenty of timestamps saved now, I think it is a good idea to
have then bound to some milestones. At least it is a pretty equal join
condition between samples.
But if you think we should avoid returning ts here I won't insist on
that.




Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
On Sat, 2022-04-02 at 14:11 +0300, Andrei Zubkov wrote:
> On Sat, 2022-04-02 at 18:56 +0800, Julien Rouhaud wrote:
> > Maybe a macro would be better here?  I don't know if that's
> > generally
> > ok or
> > just old and not-that-great code, but there are other places
> > relying
> > on macros
> > when a plain function call isn't that convenient (like here
> > returning
> > 0 or 1 as
> > a hack for incrementing num_remove), for instance in hba.c.
> 
> Yes, it is not very convenient and not looks pretty, so I'll try a
> macro here soon.

Implemented SINGLE_ENTRY_RESET as a macro.
v10 attached
--
regards, Andrei

Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Greg Stark
Date:
The tests for this seem to need adjustments.

[12:41:09.403] test pg_stat_statements ... FAILED 180 ms

diff -U3 /tmp/cirrus-ci-build/contrib/pg_stat_statements/expected/pg_stat_statements.out
/tmp/cirrus-ci-build/contrib/pg_stat_statements/results/pg_stat_statements.out
--- /tmp/cirrus-ci-build/contrib/pg_stat_statements/expected/pg_stat_statements.out
2022-04-02 12:37:42.201823383 +0000
+++ /tmp/cirrus-ci-build/contrib/pg_stat_statements/results/pg_stat_statements.out
2022-04-02 12:41:09.219563204 +0000
@@ -1174,8 +1174,8 @@
 ORDER BY query;
            query           | reset_ts_match
 ---------------------------+----------------
- SELECT $1,$2 AS "STMTTS2" | f
  SELECT $1 AS "STMTTS1"    | t
+ SELECT $1,$2 AS "STMTTS2" | f
 (2 rows)

 -- check that minmax reset does not set stats_reset


Hm. Is this a collation problem?



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Tom Lane
Date:
Greg Stark <stark@mit.edu> writes:
> The tests for this seem to need adjustments.
> [12:41:09.403] test pg_stat_statements ... FAILED 180 ms

> diff -U3 /tmp/cirrus-ci-build/contrib/pg_stat_statements/expected/pg_stat_statements.out
> /tmp/cirrus-ci-build/contrib/pg_stat_statements/results/pg_stat_statements.out
> --- /tmp/cirrus-ci-build/contrib/pg_stat_statements/expected/pg_stat_statements.out
> 2022-04-02 12:37:42.201823383 +0000
> +++ /tmp/cirrus-ci-build/contrib/pg_stat_statements/results/pg_stat_statements.out
> 2022-04-02 12:41:09.219563204 +0000
> @@ -1174,8 +1174,8 @@
>  ORDER BY query;
>             query           | reset_ts_match
>  ---------------------------+----------------
> - SELECT $1,$2 AS "STMTTS2" | f
>   SELECT $1 AS "STMTTS1"    | t
> + SELECT $1,$2 AS "STMTTS2" | f
>  (2 rows)

>  -- check that minmax reset does not set stats_reset

> Hm. Is this a collation problem?

Yeah, looks like it.  ORDER BY query COLLATE "C" might work better.

            regards, tom lane



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi Greg,

On Sat, 2022-04-02 at 17:38 -0400, Greg Stark wrote:
> The tests for this seem to need adjustments.
> 
> [12:41:09.403] test pg_stat_statements ... FAILED 180 ms
>             query           | reset_ts_match
>  ---------------------------+----------------
> - SELECT $1,$2 AS "STMTTS2" | f
>   SELECT $1 AS "STMTTS1"    | t
> + SELECT $1,$2 AS "STMTTS2" | f
>  (2 rows)
> 
>  -- check that minmax reset does not set stats_reset
> 
> 
> Hm. Is this a collation problem?

Of course, thank you! I've forgot to set collation here.

v11 attached
--
regards, Andrei

Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Julien Rouhaud
Date:
On Sun, Apr 03, 2022 at 07:32:47AM +0300, Andrei Zubkov wrote:
> v11 attached

+       /* When requested reset only min/max statistics of an entry */ \
+       entry_counters = &entry->counters; \
+       for (int kind = 0; kind < PGSS_NUMKIND; kind++) \
+       { \
+           entry_counters->max_time[kind] = 0; \
+           entry_counters->min_time[kind] = 0; \
+       } \
[...]
+static TimestampTz
+entry_reset(Oid userid, Oid dbid, uint64 queryid, bool minmax_only)
 {
    HASH_SEQ_STATUS hash_seq;
    pgssEntry  *entry;
+   Counters   *entry_counters;

Do we really need an extra variable?  Why not simply using
entry->counters.xxx_time[kind]?

Also, I think it's better to make the macro more like function looking, so
SINGLE_ENTRY_RESET().

index f2e822acd3..c2af29866b 100644
--- a/contrib/pg_stat_statements/sql/oldextversions.sql
+++ b/contrib/pg_stat_statements/sql/oldextversions.sql
@@ -36,4 +36,12 @@ AlTER EXTENSION pg_stat_statements UPDATE TO '1.8';
 \d pg_stat_statements
 SELECT pg_get_functiondef('pg_stat_statements_reset'::regproc);

+ALTER EXTENSION pg_stat_statements UPDATE TO '1.9';
+\d pg_stat_statements
+\d pg_stat_statements_info
+SELECT pg_get_functiondef('pg_stat_statements_reset'::regproc);

I don't think this bring any useful coverage.

        Minimum time spent planning the statement, in milliseconds
        (if <varname>pg_stat_statements.track_planning</varname> is enabled,
-       otherwise zero)
+       otherwise zero), this field will contain zero until this statement
+       is planned fist time after reset performed by the
+       <function>pg_stat_statements_reset</function> function with the
+       <structfield>minmax_only</structfield> parameter set to <literal>true</literal>

I think this need some rewording (and s/fist/first).  Maybe:

Minimum time spent planning the statement, in milliseconds.

This field will be zero if <varname>pg_stat_statements.track_planning</varname>
is disabled, or if the counter has been reset using the the
<function>pg_stat_statements_reset</function> function with the
<structfield>minmax_only</structfield> parameter set to <literal>true</literal>
and never been planned since.

       <primary>pg_stat_statements_reset</primary>
      </indexterm>
@@ -589,6 +623,20 @@
       If all statistics in the <filename>pg_stat_statements</filename>
       view are discarded, it will also reset the statistics in the
       <structname>pg_stat_statements_info</structname> view.
+      When <structfield>minmax_only</structfield> is <literal>true</literal> only the
+      values of minimun and maximum execution and planning time will be reset (i.e.

Nitpicking: I would say planning and execution time, as the fields are in this
order in the view/function.



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi Julien,

On Sun, 2022-04-03 at 15:07 +0800, Julien Rouhaud wrote:
> +static TimestampTz
> +entry_reset(Oid userid, Oid dbid, uint64 queryid, bool minmax_only)
>  {
>     HASH_SEQ_STATUS hash_seq;
>     pgssEntry  *entry;
> +   Counters   *entry_counters;
> 
> Do we really need an extra variable?  Why not simply using
> entry->counters.xxx_time[kind]?
> 
> Also, I think it's better to make the macro more like function
> looking, so
> SINGLE_ENTRY_RESET().

Agreed with the both, I'll fix it.

> 
> index f2e822acd3..c2af29866b 100644
> --- a/contrib/pg_stat_statements/sql/oldextversions.sql
> +++ b/contrib/pg_stat_statements/sql/oldextversions.sql
> @@ -36,4 +36,12 @@ AlTER EXTENSION pg_stat_statements UPDATE TO
> '1.8';
>  \d pg_stat_statements
>  SELECT pg_get_functiondef('pg_stat_statements_reset'::regproc);
> 
> +ALTER EXTENSION pg_stat_statements UPDATE TO '1.9';
> +\d pg_stat_statements
> +\d pg_stat_statements_info
> +SELECT pg_get_functiondef('pg_stat_statements_reset'::regproc);
> 
> I don't think this bring any useful coverage.

I feel the same, but I've done it like previous tests (versions 1.7 and
1.8). Am I miss something here? Do you think we should remove these
tests completly?

> 
> I think this need some rewording (and s/fist/first).  Maybe:
> 
> Minimum time spent planning the statement, in milliseconds.
> 
> This field will be zero if
> <varname>pg_stat_statements.track_planning</varname>
> is disabled, or if the counter has been reset using the the
> <function>pg_stat_statements_reset</function> function with the
> <structfield>minmax_only</structfield> parameter set to
> <literal>true</literal>
> and never been planned since.

Thanks a lot!

> 
>        <primary>pg_stat_statements_reset</primary>
>       </indexterm>
> @@ -589,6 +623,20 @@
>        If all statistics in the
> <filename>pg_stat_statements</filename>
>        view are discarded, it will also reset the statistics in the
>        <structname>pg_stat_statements_info</structname> view.
> +      When <structfield>minmax_only</structfield> is
> <literal>true</literal> only the
> +      values of minimun and maximum execution and planning time will
> be reset (i.e.
> 
> Nitpicking: I would say planning and execution time, as the fields
> are in this
> order in the view/function.

Agreed.
--
regards, Andrei




Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
I've attached v12 of a patch. The only unsolved issue now is the
following:

On Sun, 2022-04-03 at 15:07 +0800, Julien Rouhaud wrote:
> +ALTER EXTENSION pg_stat_statements UPDATE TO '1.9';
> +\d pg_stat_statements
> +\d pg_stat_statements_info
> +SELECT pg_get_functiondef('pg_stat_statements_reset'::regproc);
> 
> I don't think this bring any useful coverage.

It is a little bit unclear to me what is the best solution here.
--
regards, Andrei

Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Julien Rouhaud
Date:
Hi,

On Sun, Apr 03, 2022 at 12:29:43PM +0300, Andrei Zubkov wrote:
> I've attached v12 of a patch. The only unsolved issue now is the
> following:
> 
> On Sun, 2022-04-03 at 15:07 +0800, Julien Rouhaud wrote:
> > +ALTER EXTENSION pg_stat_statements UPDATE TO '1.9';
> > +\d pg_stat_statements
> > +\d pg_stat_statements_info
> > +SELECT pg_get_functiondef('pg_stat_statements_reset'::regproc);
> > 
> > I don't think this bring any useful coverage.
> 
> It is a little bit unclear to me what is the best solution here.

Sorry, I missed that there were some similar tests already for previous
versions.  This was probably discussed and agreed before, so +1 to be
consistent with the new versions.

The patch looks good to me, although I will do a full review to make sure I
didn't miss anything.

Just another minor nitpicking after a quick look:

+ This field will be zero if ...
[...]
+ this field will contain zero until this statement ...

The wording should be consistent, so either "will be zero" or "will contain
zero" everywhere.  I'm personally fine with any, but maybe a native English
will think one is better.



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Julien,

On Sun, 2022-04-03 at 17:56 +0800, Julien Rouhaud wrote:
> Just another minor nitpicking after a quick look:
> 
> + This field will be zero if ...
> [...]
> + this field will contain zero until this statement ...
> 
> The wording should be consistent, so either "will be zero" or "will
> contain
> zero" everywhere.  I'm personally fine with any, but maybe a native
> English
> will think one is better.
Agreed.

Searching the docs I've fond out that "will contain" usually used with
the description of contained structure rather then a simple value. So
I'll use a "will be zero" in the next version after your review.
--
regards, Andrei




Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Julien Rouhaud
Date:
Hi,

On Sun, Apr 03, 2022 at 01:24:40PM +0300, Andrei Zubkov wrote:
>
> On Sun, 2022-04-03 at 17:56 +0800, Julien Rouhaud wrote:
> > Just another minor nitpicking after a quick look:
> >
> > + This field will be zero if ...
> > [...]
> > + this field will contain zero until this statement ...
> >
> > The wording should be consistent, so either "will be zero" or "will
> > contain
> > zero" everywhere.  I'm personally fine with any, but maybe a native
> > English
> > will think one is better.
> Agreed.
>
> Searching the docs I've fond out that "will contain" usually used with
> the description of contained structure rather then a simple value. So
> I'll use a "will be zero" in the next version after your review.

Ok!

So last round of review.

- the commit message:

It should probably mention the mimnax_stats_since at the beginning.  Also, both
the view and the function contain those new field.

Minor rephrasing:

s/evicted and returned back/evicted and stored again/?
s/with except of all/with the exception of all/
s/is now returns/now returns/

- code:

+#define SINGLE_ENTRY_RESET() \
+if (entry) { \
[...]

It's not great to rely on caller context too much.  I think it would be better
to pass at least the entry as a parameter (maybe e?) to the macro for more
clarity.  I'm fine with keeping minmax_only, stats_reset and num_remove as is.

Apart from that I think this is ready!



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi Julien,

Thank you very much for your work on this patch!

On Mon, 2022-04-04 at 10:31 +0800, Julien Rouhaud wrote:
> - the commit message:
> 
> It should probably mention the mimnax_stats_since at the beginning. 
> Also, both
> the view and the function contain those new field.
> 
> Minor rephrasing:
> 
> s/evicted and returned back/evicted and stored again/?
> s/with except of all/with the exception of all/
> s/is now returns/now returns/

Agreed, commit message updated.

> - code:
> 
> +#define SINGLE_ENTRY_RESET() \
> +if (entry) { \
> [...]
> 
> It's not great to rely on caller context too much.

Yes, I was thinking about it. But using 4 parameters seemed strange to
me.

>   I think it would be better
> to pass at least the entry as a parameter (maybe e?) to the macro for
> more
> clarity.  I'm fine with keeping minmax_only, stats_reset and
> num_remove as is.

Using an entry as a macro parameter looks good, I'm fine with "e". 

> Apart from that I think this is ready!

v13 attached
--
regards, Andrei

Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Julien Rouhaud
Date:
Hi,

On Mon, Apr 04, 2022 at 09:59:04AM +0300, Andrei Zubkov wrote:
> > Minor rephrasing:
> >
> > s/evicted and returned back/evicted and stored again/?
> > s/with except of all/with the exception of all/
> > s/is now returns/now returns/
>
> Agreed, commit message updated.
>
> > - code:
> >
> > +#define SINGLE_ENTRY_RESET() \
> > +if (entry) { \
> > [...]
> >
> > It's not great to rely on caller context too much.
>
> Yes, I was thinking about it. But using 4 parameters seemed strange to
> me.
>
> >   I think it would be better
> > to pass at least the entry as a parameter (maybe e?) to the macro for
> > more
> > clarity.  I'm fine with keeping minmax_only, stats_reset and
> > num_remove as is.
>
> Using an entry as a macro parameter looks good, I'm fine with "e".
>
> > Apart from that I think this is ready!
>
> v13 attached

Thanks a lot!  I'm happy with this version, so I'm marking it as Ready for
Committer.



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi,

I've rebased this patch so that it can be applied after 57d6aea00fc.

v14 attached
--
regards, Andrei

Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Tomas Vondra
Date:
Hi,

I took a quick look at this patch, to see if there's something we
want/can get into v16. The last version was submitted about 9 months
ago, and it doesn't apply cleanly anymore, but the bitrot is fairly
minor. Not sure there's still interest, though.

As for the patch, I wonder if it's unnecessarily complex. It adds *two*
timestamps for each pg_stat_statements entry - one for reset of the
whole entry, one for reset of "min/max" times only.

I can see why the first timestamp (essentially tracking creating of the
entry) is useful. I'd probably call it "created_at" or something like
that, but that's a minor detail. Or maybe stats_reset, which is what we
use in pgstat?

But is the second timestamp for the min/max fields really useful? AFAIK
to perform analysis, people take regular pg_stat_statements snapshots,
which works fine for counters (calculating deltas) but not for gauges
(which need a reset, to track fresh values). But people analyzing this
are already resetting the whole entry, and so the snapshots already are
tracking deltas.

So I'm not convinced actually need the second timestamp.

A couple more comments:

1) I'm not sure why the patch is adding tests of permissions on the
pg_stat_statements_reset function?

2) If we want the second timestamp, shouldn't it also cover resets of
the mean values, not just min/max?

3) I don't understand why the patch is adding "IS NOT NULL AS t" to
various places in the regression tests.

4) I rather dislike the "minmax" naming, because that's often used in
other contexts (for BRIN indexes), and as I mentioned maybe it should
also cover the "mean" fields.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi Tomas,

On Wed, 2023-01-18 at 17:29 +0100, Tomas Vondra wrote:
> I took a quick look at this patch, to see if there's something we
> want/can get into v16. The last version was submitted about 9 months
> ago, and it doesn't apply cleanly anymore, but the bitrot is fairly
> minor. Not sure there's still interest, though.

Thank you for your attention to this patch!

I'm still very interest in this patch. And I think I'll try to rebase
this patch during a week or two if it seems possible to get it in 16..
> 
> I'd probably call it "created_at" or something like
> that, but that's a minor detail. Or maybe stats_reset, which is what
> we
> use in pgstat?

Yes there is some naming issue. My thought was the following:
 - "stats_reset" is not quite correct here, because the statement entry
moment if definitely not a reset. The field named just as it means -
this is time of the moment from which statistics is collected for this
particular entry.
 - "created_at" perfectly matches the purpose of the field, but seems
not such self-explaining to me.

> 
> But is the second timestamp for the min/max fields really useful?
> AFAIK
> to perform analysis, people take regular pg_stat_statements
> snapshots,
> which works fine for counters (calculating deltas) but not for gauges
> (which need a reset, to track fresh values). But people analyzing
> this
> are already resetting the whole entry, and so the snapshots already
> are
> tracking deltas.
>
> So I'm not convinced actually need the second timestamp.

The main purpose of the patch is to provide means to collecting
solutions to avoid the reset of pgss at all. Just like it happens for
the pg_stat_ views. The only really need of reset is that we can't be
sure that observing statement was not evicted and come back since last
sample. Right now we only can do a whole reset on each sample and see
how many entries will be in pgss hashtable on the next sample - how
close this value to the max. If there is a plenty space in hashtable we
can hope that there was not evictions since last sample. However there
could be reset performed by someone else and we are know nothing about
this.
Having a timestamp in stats_since field we are sure about how long this
statement statistics is tracked. That said sampling solution can
totally avoid pgss resets. Avoiding such resets means avoiding
interference between monitoring solutions.
But if no more resets is done we can't track min/max values, because
they still needs a reset and we can do nothing with such resets - they
are necessary. However I still want to know when min/max reset was
performed. This will help to detect possible interference on such
resets.
> 
> 
> A couple more comments:
> 
> 1) I'm not sure why the patch is adding tests of permissions on the
> pg_stat_statements_reset function?
> 
> 2) If we want the second timestamp, shouldn't it also cover resets of
> the mean values, not just min/max?

I think that mean values shouldn't be target for a partial reset
because the value for mean values can be easily reconstructed by the
sampling solution without a reset.

> 
> 3) I don't understand why the patch is adding "IS NOT NULL AS t" to
> various places in the regression tests.

The most of tests was copied from the previous version. I'll recheck
them.

> 
> 4) I rather dislike the "minmax" naming, because that's often used in
> other contexts (for BRIN indexes), and as I mentioned maybe it should
> also cover the "mean" fields.

Agreed, but I couldn't make it better. Other versions seemed worse to
me...
> 
> 
Regards, Andrei Zubkov




Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi,

I've updated this patch for the current master. Also I have some
additional explanations..

On Wed, 2023-01-18 at 17:29 +0100, Tomas Vondra wrote:
> 1) I'm not sure why the patch is adding tests of permissions on the
> pg_stat_statements_reset function?

I've fixed that

> 
> 2) If we want the second timestamp, shouldn't it also cover resets of
> the mean values, not just min/max?

I think that mean values are not a targets for auxiliary resets because
any sampling solution can easily calculate the mean values between
samples without a reset.

> 
> 3) I don't understand why the patch is adding "IS NOT NULL AS t" to
> various places in the regression tests.

This change is necessary in the current version because the
pg_stat_statements_reset() function will return a timestamp of a reset,
needed for sampling solutions to detect resets, perfermed by someone
else.


Regards
-- 
Andrei Zubkov


Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi,

The final version of this patch should fix meson build and tests.
-- 
Andrei Zubkov


Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi!

I've attached a new version of a patch - rebase to the current master.
-- 
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
"Gregory Stark (as CFM)"
Date:
On Wed, 1 Mar 2023 at 04:04, Andrei Zubkov <zubkov@moonset.ru> wrote:
>
> Hi!
>
> I've attached a new version of a patch - rebase to the current master.

The CFBot (http://cfbot.cputube.org/) doesn't seem to like this. It
looks like all the Meson builds are failing, perhaps there's something
particular about the test environment that is either not set up right
or is exposing a bug?

Please check if this is a real failure or a cfbot failure.


-- 
Gregory Stark
As Commitfest Manager



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi Gregory,

On Wed, 2023-03-01 at 14:24 -0500, Gregory Stark (as CFM) wrote:
> The CFBot (http://cfbot.cputube.org/) doesn't seem to like this. It
> looks like all the Meson builds are failing, perhaps there's
> something
> particular about the test environment that is either not set up right
> or is exposing a bug?

Thank you, I've missed it.
> 
> Please check if this is a real failure or a cfbot failure.
> 
It is my failure. Just forgot to update meson.build
I think CFBot should be happy now.

Regards,
-- 
Andrei Zubkov


Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
"Gregory Stark (as CFM)"
Date:
I'm sorry, It seems this is failing again? It's Makefile and
meson.build again :(

Though there are also

patching file contrib/pg_stat_statements/sql/oldextversions.sql
can't find file to patch at input line 1833


and

|--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
|+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
--------------------------
No file to patch.  Skipping patch.




--
Gregory Stark
As Commitfest Manager



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi Gregory,

> patching file contrib/pg_stat_statements/sql/oldextversions.sql
> can't find file to patch at input line 1833
> 
> 
> and
> 
> > --- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
> > +++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
> --------------------------
> No file to patch.  Skipping patch.
> 
Thank you for your attention.

Yes, it is due to parallel work on "Normalization of utility queries in
pg_stat_statements" patch
(https://postgr.es/m/Y/7Y9U/y/keAW3qH@paquier.xyz)

It seems I've found something strange in new test files - I've
mentioned this in a thread of a patch. Once there will be any solution
I'll do a rebase again as soon as possible.

-- 
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi,

I've done a rebase of a patch to the current master.

-- 
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Michael Paquier
Date:
On Sat, Mar 11, 2023 at 02:49:50PM +0300, Andrei Zubkov wrote:
> Hi,
>
> I've done a rebase of a patch to the current master.

+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean);
+ALTER EXTENSION pg_stat_statements DROP FUNCTION
+  pg_stat_statements_reset(Oid, Oid, bigint);

The upgrade script of an extension is launched by the backend in the
context of an extension, so these three queries should not be needed,
as far as I recall.

-SELECT pg_stat_statements_reset();
- pg_stat_statements_reset
---------------------------
-
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
 (1 row)

Wouldn't it be better to do this kind of refactoring in its own patch
to make the follow-up changes more readable?  This function is changed
to return a timestamp rather than void, but IS NOT NULL applied on the
existing queries would also return true.  This would clean up quite a
few diffs in the main patch..
--
Michael

Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi Michael,

Thank you for your attention.

On Thu, 2023-03-16 at 16:13 +0900, Michael Paquier wrote:
> +/* First we have to remove them from the extension */
> +ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
> +ALTER EXTENSION pg_stat_statements DROP FUNCTION
> pg_stat_statements(boolean);
> +ALTER EXTENSION pg_stat_statements DROP FUNCTION
> +  pg_stat_statements_reset(Oid, Oid, bigint);
> 
> The upgrade script of an extension is launched by the backend in the
> context of an extension, so these three queries should not be needed,
> as far as I recall.

Agreed. I've done it as it was in previous versions. But I'm sure those
are unnecessary.

> Wouldn't it be better to do this kind of refactoring in its own patch
> to make the follow-up changes more readable?  This function is
> changed
> to return a timestamp rather than void, but IS NOT NULL applied on
> the
> existing queries would also return true.  This would clean up quite a
> few diffs in the main patch..
Splitting this commit seems reasonable to me.

New version is attached.

Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Sergei Kornilov
Date:
Hello!

The documentation still describes the function pg_stat_statements_reset like this

>       By default, this function can only be executed by superusers.

But unfortunately, this part was lost somewhere.

-- Don't want this to be available to non-superusers.
REVOKE ALL ON FUNCTION pg_stat_statements_reset(Oid, Oid, bigint, boolean) FROM PUBLIC;

should be added to the upgrade script

Also, shouldn't we first do:

/* First we have to remove them from the extension */
ALTER EXTENSION pg_stat_statements DROP VIEW ..
ALTER EXTENSION pg_stat_statements DROP FUNCTION ..

like in previous upgrade scripts?

> +       Time at which min/max statistics gathering started for this
> +       statement

I think it would be better to explicitly mention in the documentation all 4 fields for which minmax_stats_since
displaysthe time.
 

regards, Sergei



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi Sergei!

Thank you for your review.

On Tue, 2023-03-21 at 23:18 +0300, Sergei Kornilov wrote:
> -- Don't want this to be available to non-superusers.
> REVOKE ALL ON FUNCTION pg_stat_statements_reset(Oid, Oid, bigint,
> boolean) FROM PUBLIC;
> 
> should be added to the upgrade script

Indeed.

> Also, shouldn't we first do:
> 
> /* First we have to remove them from the extension */
> ALTER EXTENSION pg_stat_statements DROP VIEW ..
> ALTER EXTENSION pg_stat_statements DROP FUNCTION ..
> 
> like in previous upgrade scripts?

It was discussed few messages earlier in this thread. We've decided
that those are unnecessary in upgrade script.

> > +       Time at which min/max statistics gathering started for this
> > +       statement
> 
> I think it would be better to explicitly mention in the documentation
> all 4 fields for which minmax_stats_since displays the time.

Agreed.

New version is attached.

regards, Andrei

Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Daniel Gustafsson
Date:
> On 22 Mar 2023, at 09:17, Andrei Zubkov <zubkov@moonset.ru> wrote:

> New version is attached.

This patch is marked RfC but didn't get reviewed/committed during this CF so
I'm moving it to the next, the patch no longer applies though so please submit
an updated version.

--
Daniel Gustafsson




Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi hackers,

New version 23 attached. It contains rebase to the current master.
Noted that v1.11 adds new fields to the pg_stat_sstatements view, but
leaves the PGSS_FILE_HEADER constant unchanged. It this correct?

--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi,

During last moving to the current commitfest this patch have lost its
reviewers list. With respect to reviewers contribution in this patch, I
think reviewers list should be fixed.

regards,

Andrei Zubkov
Postgres Professional
The Russian Postgres Company




Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Peter Eisentraut
Date:
On 24.10.23 09:58, Andrei Zubkov wrote:
> During last moving to the current commitfest this patch have lost its
> reviewers list. With respect to reviewers contribution in this patch, I
> think reviewers list should be fixed.

I don't think it's the purpose of the commitfest app to track how *has* 
reviewed a patch.  The purpose is to plan and allocate *current* work. 
If we keep a bunch of reviewers listed on a patch who are not actually 
reviewing the patch, then that effectively blocks new reviewers from 
signing up and the patch will not make progress.

Past reviewers should of course be listed in the commit message, the 
release notes, etc. as appropriate.




Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Julien Rouhaud
Date:
On Tue, Oct 24, 2023 at 6:57 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 24.10.23 09:58, Andrei Zubkov wrote:
> > During last moving to the current commitfest this patch have lost its
> > reviewers list. With respect to reviewers contribution in this patch, I
> > think reviewers list should be fixed.
>
> I don't think it's the purpose of the commitfest app to track how *has*
> reviewed a patch.  The purpose is to plan and allocate *current* work.
> If we keep a bunch of reviewers listed on a patch who are not actually
> reviewing the patch, then that effectively blocks new reviewers from
> signing up and the patch will not make progress.
>
> Past reviewers should of course be listed in the commit message, the
> release notes, etc. as appropriate.

Really?  Last time this topic showed up at least one committer said
that they tend to believe the CF app more than digging the thread [1],
and some other hackers mentioned other usage for being kept in the
reviewer list.  Since no progress has been made on the CF app since
I'm not sure it's the best idea to drop reviewer names from patch
entries generally.

[1] https://www.postgresql.org/message-id/552155.1648737431@sss.pgh.pa.us



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Peter Eisentraut
Date:
On 24.10.23 14:40, Julien Rouhaud wrote:
> On Tue, Oct 24, 2023 at 6:57 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>>
>> On 24.10.23 09:58, Andrei Zubkov wrote:
>>> During last moving to the current commitfest this patch have lost its
>>> reviewers list. With respect to reviewers contribution in this patch, I
>>> think reviewers list should be fixed.
>>
>> I don't think it's the purpose of the commitfest app to track how *has*
>> reviewed a patch.  The purpose is to plan and allocate *current* work.
>> If we keep a bunch of reviewers listed on a patch who are not actually
>> reviewing the patch, then that effectively blocks new reviewers from
>> signing up and the patch will not make progress.
>>
>> Past reviewers should of course be listed in the commit message, the
>> release notes, etc. as appropriate.
> 
> Really?  Last time this topic showed up at least one committer said
> that they tend to believe the CF app more than digging the thread [1],
> and some other hackers mentioned other usage for being kept in the
> reviewer list.  Since no progress has been made on the CF app since
> I'm not sure it's the best idea to drop reviewer names from patch
> entries generally.

There is a conflict between the two purposes.  But it is clearly the 
case that reviewers will more likely pick up patches that have no 
reviewers assigned.  So if you keep stale reviewer entries around, then 
a patch that stays around for a while will never get reviewed again.  I 
think this is a significant problem at the moment, and I made it part of 
my mission during the last commitfest to clean it up.  If people want to 
put the stale reviewer entries back in, that is possible, but I would 
caution against that, because that would just self-sabotage those patches.




Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Lepikhov
Date:
On 19/10/2023 19:40, Andrei Zubkov wrote:
> Hi hackers,
> 
> New version 23 attached. It contains rebase to the current master.

I discovered the patch and parameters you've proposed. In my opinion, 
the stats_since parameter adds valuable information and should 
definitely be included in the stats data because the statement can be 
noteless removed from the list and inserted again.
But minmax_stats_since and changes in the UI of the reset routine look 
like syntactic sugar here.
I can't convince myself that it is really needed. Do you have some set 
of cases that can enforce the changes proposed? Maybe we should 
intensively work on adding the 'stats_since' parameter and continue the 
discussion of the necessity of another one?

-- 
regards,
Andrei Lepikhov
Postgres Professional




Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi Andrei,

On Wed, 2023-10-25 at 13:59 +0700, Andrei Lepikhov wrote:
> But minmax_stats_since and changes in the UI of the reset routine
> look like syntactic sugar here.
> I can't convince myself that it is really needed. Do you have some
> set of cases that can enforce the changes proposed?

Yes, it looks strange, but it is needed in some way.
The main purpose of this patch is to provide means for sampling
solutions for collecting statistics data in series of samples. The
first goal here - is to be sure that the statement was not evicted and
come back between samples (especially between rare samples). This is
the target of the stats_since field. The second goal - is the ability
of getting all statistic increments for the interval between samples.
All statistics provided by pg_stat_statements with except of min/max
values can be calculated for the interval since the last sample knowing
the values in the last and current samples. We need a way to get
min/max values too. This is achieved by min/max reset performed by the
sampling solution. The minmax_stats_since field is here for the same
purpose - the sampling solution is need to be sure that no one have
done a min/max reset between samples. And if such reset was performed
at least we know when it was performed.

regards,
Andrei Zubkov
Postgres Professional



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Alena Rybakina
Date:
On 19.10.2023 15:40, Andrei Zubkov wrote:
Hi hackers,

New version 23 attached. It contains rebase to the current master.
Noted that v1.11 adds new fields to the pg_stat_sstatements view, but
leaves the PGSS_FILE_HEADER constant unchanged. It this correct?
Hi! Thank you for your work on the subject.

1. I didn't understand why we first try to find pgssEntry with a false top_level value, and later find it with a true top_level value./*
 * Remove the entry if it exists, starting with the non-top-level entry.
 */
key.toplevel = false;
entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_FIND, NULL);

SINGLE_ENTRY_RESET(entry);

/* Also reset the top-level entry if it exists. */
key.toplevel = true;

entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_FIND, NULL);

SINGLE_ENTRY_RESET(entry);

I looked through this topic and found some explanation in this email [0], but I didn't understand it. Can you explain it to me?

2. And honestly, I think you should change
"Remove the entry if it exists, starting with the non-top-level entry." on
"Remove the entry or reset min/max time statistic information and the timestamp if it exists, starting with the non-top-level entry."

And the same with the comment bellow:

"Also reset the top-level entry if it exists."
"Also remove the entry or reset min/max time statistic information and the timestamp if it exists." In my opinion, this is necessary because the minmax_only parameter is set by the user, so both ways are possible.


0 - https://www.postgresql.org/message-id/62d16845-e74e-a6f9-9661-022e44f48922%40inbox.ru

-- 
Regards,
Alena Rybakina

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi Alena,

On Wed, 2023-10-25 at 16:25 +0300, Alena Rybakina wrote:
>  Hi! Thank you for your work on the subject.
> 1. I didn't understand why we first try to find pgssEntry with a
> false top_level value, and later find it with a true top_level value.

In case of pg_stat_statements the top_level field is the bool field of
the pgssHashKey struct used as the key for pgss_hash hashtable. When we
are performing a reset only userid, dbid and queryid values are
provided. We need to reset both top-level and non-top level entries. We
have only one way to get them all from a hashtable - search for entries
having top_level=true and with top_level=false in their keys.

> 2. And honestly, I think you should change
>  "Remove the entry if it exists, starting with the non-top-level
> entry." on
>  "Remove the entry or reset min/max time statistic information and
> the timestamp if it exists, starting with the non-top-level entry."
> And the same with the comment bellow:
> "Also reset the top-level entry if it exists."
>  "Also remove the entry or reset min/max time statistic information
> and the timestamp if it exists."

There are four such places actually - every time when the
SINGLE_ENTRY_RESET macro is used. The logic of reset performed is
described a bit in this macro definition. It seems quite redundant to
repeat this description four times. But I've noted that "remove" terms
should be replaced by "reset".

I've attached v24 with commits updated.

regards, Andrei Zubkov
Postgres Professional


Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Lepikhov
Date:
On 25/10/2023 20:00, Andrei Zubkov wrote:
> Hi Andrei,
> 
> On Wed, 2023-10-25 at 13:59 +0700, Andrei Lepikhov wrote:
>> But minmax_stats_since and changes in the UI of the reset routine
>> look like syntactic sugar here.
>> I can't convince myself that it is really needed. Do you have some
>> set of cases that can enforce the changes proposed?
> 
> Yes, it looks strange, but it is needed in some way.
> The main purpose of this patch is to provide means for sampling
> solutions for collecting statistics data in series of samples. The
> first goal here - is to be sure that the statement was not evicted and
> come back between samples (especially between rare samples). This is
> the target of the stats_since field. The second goal - is the ability
> of getting all statistic increments for the interval between samples.
> All statistics provided by pg_stat_statements with except of min/max
> values can be calculated for the interval since the last sample knowing
> the values in the last and current samples. We need a way to get
> min/max values too. This is achieved by min/max reset performed by the
> sampling solution. The minmax_stats_since field is here for the same
> purpose - the sampling solution is need to be sure that no one have
> done a min/max reset between samples. And if such reset was performed
> at least we know when it was performed.

It is the gist of my question. If needed, You can remove the record by 
(userid, dbOid, queryId). As I understand, this extension is usually 
used by an administrator. Who can reset these parameters except you and 
the DBMS?

-- 
regards,
Andrei Lepikhov
Postgres Professional




Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
On Thu, 2023-10-26 at 15:49 +0700, Andrei Lepikhov wrote:
> It is the gist of my question. If needed, You can remove the record
> by
> (userid, dbOid, queryId). As I understand, this extension is usually
> used by an administrator. Who can reset these parameters except you
> and
> the DBMS?
This extension is used by administrator but indirectly through some
kind of sampling solution providing information about statistics change
over time. The only kind of statistics unavailable to sampling
solutions without a periodic reset is a min/max statistics. This patch
provides a way for resetting those statistics without entry eviction.
Suppose the DBA will use several sampling solutions. Every such
solution can perform its own resets of min/max statistics. Other
sampling solutions need a way to detect such resets to avoid undetected
interference. Timestamping of min/max reset can be used for that
purpose.

--
regards, Andrei Zubkov
Postgres Professional



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Alena Rybakina
Date:
On 25.10.2023 18:35, Andrei Zubkov wrote:
Hi Alena,

On Wed, 2023-10-25 at 16:25 +0300, Alena Rybakina wrote:
 Hi! Thank you for your work on the subject.
1. I didn't understand why we first try to find pgssEntry with a
false top_level value, and later find it with a true top_level value.
In case of pg_stat_statements the top_level field is the bool field of
the pgssHashKey struct used as the key for pgss_hash hashtable. When we
are performing a reset only userid, dbid and queryid values are
provided. We need to reset both top-level and non-top level entries. We
have only one way to get them all from a hashtable - search for entries
having top_level=true and with top_level=false in their keys.
Thank you for explanation, I got it.
2. And honestly, I think you should change 
 "Remove the entry if it exists, starting with the non-top-level
entry." on 
 "Remove the entry or reset min/max time statistic information and
the timestamp if it exists, starting with the non-top-level entry."
And the same with the comment bellow:
"Also reset the top-level entry if it exists."
 "Also remove the entry or reset min/max time statistic information
and the timestamp if it exists."
There are four such places actually - every time when the
SINGLE_ENTRY_RESET macro is used. The logic of reset performed is
described a bit in this macro definition. It seems quite redundant to
repeat this description four times. But I've noted that "remove" terms
should be replaced by "reset".

I've attached v24 with commits updated.
Yes, I agree with the changes.
-- 
Regards,
Alena Rybakina

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi hackers,

Patch rebased to the current master
--
regards, Andrei Zubkov
Postgres Professional

Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
A little fix in "level_tracking" tests after merge.

--
regards, Andrei Zubkov
Postgres Professional



Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Alexander Korotkov
Date:
Hi!

On Fri, Nov 17, 2023 at 10:40 AM Andrei Zubkov <zubkov@moonset.ru> wrote:
>
> A little fix in "level_tracking" tests after merge.

I've reviewed this patch. I think this is the feature of high demand.
New columns (stats_since and minmax_stats_since) to the
pg_stat_statements view, enhancing the granularity and precision of
performance monitoring. This addition allows database administrators
to have a clearer understanding of the time intervals for statistics
collection on each statement. Such detailed tracking is invaluable for
performance tuning and identifying bottlenecks in database operations.

I think the design was well-discussed in this thread.  Implementation
also looks good to me.  I've just slightly revised the commit
messages.

I'd going to push this patchset if no objections.

------
Regards,
Alexander Korotkov

Attachment

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Julien Rouhaud
Date:
Hi,

On Sat, Nov 25, 2023 at 02:45:07AM +0200, Alexander Korotkov wrote:
>
> I've reviewed this patch. I think this is the feature of high demand.
> New columns (stats_since and minmax_stats_since) to the
> pg_stat_statements view, enhancing the granularity and precision of
> performance monitoring. This addition allows database administrators
> to have a clearer understanding of the time intervals for statistics
> collection on each statement. Such detailed tracking is invaluable for
> performance tuning and identifying bottlenecks in database operations.

Yes, it will greatly improve performance analysis tools, and as the maintainer
of one of them I've been waiting for this feature for a very long time!
>
> I think the design was well-discussed in this thread.  Implementation
> also looks good to me.  I've just slightly revised the commit
> messages.
>
> I'd going to push this patchset if no objections.

Thanks!  No objection from me, it all looks good.



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Andrei Zubkov
Date:
Hi Alexander!

On Sat, 2023-11-25 at 02:45 +0200, Alexander Korotkov wrote:

> I've reviewed this patch.

Thank you very much for your review.

> I think the design was well-discussed in this thread.  Implementation
> also looks good to me.  I've just slightly revised the commit
> messages.

I've noted a strange space in a commit message of 0001 patch:
"I t is needed for upcoming patch..."
It looks like a typo.

--
regards, Andrei Zubkov
Postgres Professional




Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Alexander Korotkov
Date:
On Sat, Nov 25, 2023 at 10:45 PM Andrei Zubkov <zubkov@moonset.ru> wrote:
> On Sat, 2023-11-25 at 02:45 +0200, Alexander Korotkov wrote:
>
> > I've reviewed this patch.
>
> Thank you very much for your review.
>
> > I think the design was well-discussed in this thread.  Implementation
> > also looks good to me.  I've just slightly revised the commit
> > messages.
>
> I've noted a strange space in a commit message of 0001 patch:
> "I t is needed for upcoming patch..."
> It looks like a typo.

Thank you for catching it.  I'll fix this before commit.

------
Regards,
Alexander Korotkov



Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

From
Alexander Lakhin
Date:
Hello Alexander,

25.11.2023 23:46, Alexander Korotkov wrote:
> On Sat, Nov 25, 2023 at 10:45 PM Andrei Zubkov<zubkov@moonset.ru>  wrote:
>>
>> I've noted a strange space in a commit message of 0001 patch:
>> "I t is needed for upcoming patch..."
>> It looks like a typo.
> Thank you for catching it.  I'll fix this before commit.

I've found one more typo in that commit: minimun.

Best regards,
Alexander