Thread: [PATCH] Tracking statements entry timestamp in pg_stat_statements
This is a proposal for a new feature in pg_stat_statements extension. As a statistical extension providing counters pg_stat_statements extension is a target for various sampling solutions. All of them interested in calculation of statement statistics increments between two samples. But we face a problem here - observing one statement with its statistics right now we can't be sure that statistics increment for this statement is continuous from previous sample. This statement could be deallocated after previous sample and come back soon. Also it could happen that statement executions after that increased statistics to above the values we observed in previous sample making it impossible to detect deallocation on statement level. My proposition here is to store statement entry timestamp. In this case any sampling solution in case of returning statement will easily detect it by changed timestamp value. And for every statement we will know exact time interval for its statistics.
Attachment
RE: [PATCH] Tracking statements entry timestamp in pg_stat_statements
Dear Andrei, I think the idea is good because the pg_stat_statements_info view cannot distinguish whether the specific statement is deallocated or not. But multiple calling of GetCurrentTimestamp() may cause some performance issues. How about adding a configuration parameter for controlling this feature? Or we don't not have to worry about that? > + if (api_version >= PGSS_V1_9) > + { > + values[i++] = TimestampTzGetDatum(first_seen); > + } I think {} is not needed here. Best Regards, Hayato Kuroda FUJITSU LIMITED
Hi Kuroda, Thank you for your attention! On Tue, 2021-03-23 at 02:13 +0000, kuroda.hayato@fujitsu.com wrote: > But multiple calling of GetCurrentTimestamp() may cause some > performance issues. > How about adding a configuration parameter for controlling this > feature? > Or we don't not have to worry about that? Certaily I was thinking about this. And I've taken an advice of Teodor Sigaev - a much more expirienced developer than me. It seems that GetCurrentTimestamp() is fast enough for our purpose and we won't call it too often - only on new statement entry allocation. By the way right now in my workload tracing tool pg_profile I have to reset pg_stat_statements on every sample (wich is about 30-60 minutes) to make sure that all workload between samples is captured. This causes much more overhead. Introduced first_seen column can eliminate the need of resets. However, there is another way - we can store the curent value of pg_stat_statements_info.dealloc field when allocating a new statement entry instead of timstamping it. Probably, it would be little faster, but timestamp seems much more valuable here. > > > + if (api_version >= PGSS_V1_9) > > + { > > + values[i++] = TimestampTzGetDatum(first_seen); > > + } > > I think {} is not needed here. Of course, thank you! -- Andrei Zubkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Tue, Mar 23, 2021 at 09:50:16AM +0300, Andrei Zubkov wrote: > > By the way right now in my workload tracing tool pg_profile I have to > reset pg_stat_statements on every sample (wich is about 30-60 minutes) > to make sure that all workload between samples is captured. This causes > much more overhead. Introduced first_seen column can eliminate the need > of resets. Note that you could also detect entries for which some counters decreased (e.g. the execution count), and in that case only use the current values. It should give the exact same results as what you will get with the first_seen column, except of course if some entry is almost never used and is suddenly used a lot after an explicit reset or an eviction and only until you perform your snapshot. I'm not sure that it's a very likely scenario though. FTR that's how powa currently deals with reset/eviction.
Hi Julien, On Tue, 2021-03-23 at 15:03 +0800, Julien Rouhaud wrote: > Note that you could also detect entries for which some counters > decreased (e.g. > the execution count), and in that case only use the current values. Yes, but checking condition for several counters seems complicated compared to check only one field. > It should > give the exact same results as what you will get with the first_seen > column, > except of course if some entry is almost never used and is suddenly > used a lot > after an explicit reset or an eviction and only until you perform > your > snapshot. I'm not sure that it's a very likely scenario though. But it is possible, and we are guessing here. Storing a timestamp does not seems too expensive to me, but it totally eliminates guessing, and provides a clear view about the time interval we watching for this specific statement. > FTR that's how powa currently deals with reset/eviction. PoWA sampling is much more frequent than pg_profile. For PoWA it is, of cource, very unlikely scenario, but still possible. -- Andrei Zubkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
RE: [PATCH] Tracking statements entry timestamp in pg_stat_statements
Dear Andrei, > Certaily I was thinking about this. And I've taken an advice of Teodor > Sigaev - a much more expirienced developer than me. It seems that > GetCurrentTimestamp() is fast enough for our purpose and we won't call > it too often - only on new statement entry allocation. OK. > However, there is another way - we can store the curent value > of pg_stat_statements_info.dealloc field when allocating a new > statement entry instead of timstamping it. Probably, it would be little > faster, but timestamp seems much more valuable here. I don't like the idea because such a column has no meaning for the specific row. I prefer storing timestamp if GetCurrentTimestamp() is cheap. Best Regards, Hayato Kuroda FUJITSU LIMITED
Dear Kuroda, > I don't like the idea because such a column has no meaning for the > specific row. > I prefer storing timestamp if GetCurrentTimestamp() is cheap. I agree. New version attached. -- Andrei Zubkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
2021-03-23 23:08 に Andrei Zubkov さんは書きました: > Dear Kuroda, > >> I don't like the idea because such a column has no meaning for the >> specific row. >> I prefer storing timestamp if GetCurrentTimestamp() is cheap. > I agree. New version attached. Thanks for posting the patch. I agree with this content. Is it necessary to update the version of pg_stat_statements now that the release is targeted for PG15? We take into account the risk that users will misunderstand. Regards, -- Yuki Seino Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Wed, 2021-04-07 at 17:26 +0900, Seino Yuki wrote: > Is it necessary to update the version of pg_stat_statements now that > the > release is targeted for PG15? I think, yes, version of pg_stat_statements is need to be updated. Is it will be 1.10 in PG15? Regards, -- Andrei Zubkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
RE: [PATCH] Tracking statements entry timestamp in pg_stat_statements
Dear Andrei, > I think, yes, version of pg_stat_statements is need to be updated. Is > it will be 1.10 in PG15? I think you are right. According to [1] we can bump up the version per one PG major version, and any features are not committed yet for 15. [1]: https://www.postgresql.org/message-id/20201202040516.GA43757@nol Best Regards, Hayato Kuroda FUJITSU LIMITED
Hi, Kuroda! I've intended to change the pg_stat_statements version with rebasing this patch to the current master branch state. Now this is commit 07e5e66. But I'm unable to test the patch - it seems that pg_stat_statements is receiving queryId = 0 for every statements in every hook now and statements are not tracked at all. Am I mistaken somewhere? Maybe you know why this is happening? Thank you! -- Andrei Zubkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company On Fri, 2021-04-09 at 00:23 +0000, kuroda.hayato@fujitsu.com wrote: > Dear Andrei, > > > I think, yes, version of pg_stat_statements is need to be updated. > > Is > > it will be 1.10 in PG15? > > I think you are right. > According to [1] we can bump up the version per one PG major version, > and any features are not committed yet for 15. > > [1]: https://www.postgresql.org/message-id/20201202040516.GA43757@nol > > > Best Regards, > Hayato Kuroda > FUJITSU LIMITED >
But I'm unable to test the patch - it seems that pg_stat_statements is
receiving queryId = 0 for every statements in every hook now and
statements are not tracked at all.
Am I mistaken somewhere? Maybe you know why this is happening?
did you enable compute_query_id new parameter?
Hello, Kuroda! On Fri, 2021-04-09 at 00:23 +0000, kuroda.hayato@fujitsu.com wrote: > I think you are right. > According to [1] we can bump up the version per one PG major version, > and any features are not committed yet for 15. > > [1]: https://www.postgresql.org/message-id/20201202040516.GA43757@nol Version 2 of patch attached. pg_stat_statements version is now 1.10 and patch is based on 0f61727. -- Andrei Zubkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: not tested Documentation: not tested Hi, Andrei I tested your patch, and it works well. I also prefer timestamp inseatead of dealloc num. I think it can provide more useful details about query statements. Regards, Martin Sun
Hi, Martin On Mon, 2021-04-19 at 11:39 +0000, Chengxi Sun wrote: > I tested your patch, and it works well. I also prefer timestamp > inseatead of dealloc num. > I think it can provide more useful details about query statements. > Thank you for your review. Certainly, timestamp is valuable here. Deallocation number is only a workaround in unlikely case when timestamping will cost a much. It seems, that it can happen only when significant amount of statements causes a new entry in pg_stat_statements hashtable. However, in such case using of pg_stat_statements extension might be qute difficult. -- Andrei Zubkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re[2]: [PATCH] Tracking statements entry timestamp in pg_stat_statements
Hi, Anton! I've corrected the patch and attached a new version. -- Andrei Zubkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company On Wed, 2021-10-06 at 18:13 +0300, Мельников Антон Андреевич wrote: > Hi, Andrey! > > I’ve tried to apply your patch v2-0001 on current master, but i > failed. > There were git apply errors at: > pg_stat_statements.out:941 > pg_stat_statements.sql:385 > > Best Regards, > > Anton Melnikov > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company >
Attachment
There is an issue with this patch. It's main purpose is the ability to calculate values of pg_stat_statements view for a time period between two samples without resetting pg_stat_statements being absolutely sure that the statement was not evicted. Such approach solves the problem for metrics with except of min and max time values. It seems that partial reset is needed here resetting min/max values during a sample. But overall min/max values will be lost in this case. Does addition of resettable min/max metrics to the pg_stat_statemets view seems reasonable here? -- Andrei Zubkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements
> There is an issue with this patch. It's main purpose is the ability to
> calculate values of pg_stat_statements view
> [...]
> Does addition of resettable min/max metrics to the
> pg_stat_statemets view seems reasonable here?
Hello, Andrey! I think it depends on what the slow top level sampler wants. Let define the current values in pg_stat_statements for some query as gmin and gmax. It seems to be a two main variants: 1) If top level sampler wants to know for some query the min and max values for the entire watch time, then existing gmin and gmax in pg_stat_statements are sufficient. The top level sampler can clarify top_min and top_max at every slow sample as follows: top_max = gmax > top_max ? gmax : top_max; top_min = gmin < top_min ? gmin : top_min; This should work regardless of whether there was a reset between samples or not. 2) The second case happens when the top level sampler wants to know the min and max values for sampling period. In that case i think one shouldn't not use gmin and gmax and especially reset them asynchronously from outside because its lifetime and sampling period are independent values and moreover someone else might need gmin and gmax as is. So i suppose that additional vars loc_min and loc_max is a right way to do it. If that, at every top sample one need to replace not clarify the new top values as follows: top_max = loc_max; loc_max = 0; top_min = loc_min; loc_min = INT_MAX; And one more thing, if there was a reset of stats between two samples, then i think it is the best to ignore the new values, since they were obtained for an incomplete period. This patch, thanks to the saved time stamp, makes possible to determine the presence of reset between samples and there should not be a problem to realize such algorithm. The patch is now applied normally, all my tests were successful. The only thing I could suggest to your notice is a small cosmetic edit to replace the numeric value in #define on line 1429 of pg_stat_statements.c by one of the constants defined above. Best regards, Anton Melnikov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Hello, On Sun, 2022-01-02 at 13:28 -0800, Andres Freund wrote: > Hi, > > This fails with an assertion failure: > https://cirrus-ci.com/task/5567540742062080?logs=cores#L55 > > Andres, thank you for your test! I've missed it. Fixed in attached patch v5. On Wed, 2021-12-22 at 04:25 +0300, Anton A. Melnikov wrote: > > > I completely agree that creating a separate view for these new fields > is > the most correct thing to do. Anton, I've created a new view named pg_stat_statements_aux. But for now both views are using the same function pg_stat_statements which returns all fields. It seems reasonable to me - if sampling solution will need all values it can query the function. > Also it might be better to use the term 'auxiliary' and use the same > approach as for existent similar vars. Agreed, renamed to auxiliary term. > So internally it might look something like this: > > double aux_min_time[PGSS_NUMKIND]; > double aux_max_time[PGSS_NUMKIND]; > TimestampTz aux_stats_reset; > > And at the view level: > aux_min_plan_time float8, > aux_max_plan_time float8, > aux_min_exec_time float8, > aux_max_exec_time float8, > aux_stats_reset timestamp with time zone > > Functions names might be pg_stat_statements_aux() and > pg_stat_statements_aux_reset(). > But it seems "stats_reset" term is not quite correct in this case. The "stats_since" field holds the timestamp of hashtable entry, but not the reset time. The same applies to aux_stats_since - for new statement it holds its entry time, but in case of reset it will hold the reset time. "stats_reset" name seems a little bit confusing to me. Attached patch v5 -- Andrei Zubkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements
Hi, Andrey! I've checked the 5th version of the patch and there are some remarks. >I've created a new view named pg_stat_statements_aux. But for now both >views are using the same function pg_stat_statements which returns all >fields. It seems reasonable to me - if sampling solution will need all >values it can query the function. Agreed, it might be useful in some cases. >But it seems "stats_reset" term is not quite correct in this case. The >"stats_since" field holds the timestamp of hashtable entry, but not the >reset time. The same applies to aux_stats_since - for new statement it >holds its entry time, but in case of reset it will hold the reset time. Thanks for the clarification. Indeed if we mean the word 'reset' as the removal of all the hashtable entries during pg_stat_statements_reset() then we shouldn't use it for the first occurrence timestamp in the struct pgssEntry. So with the stats_since field everything is clear. On the other hand aux_stats_since field can be updated for two reasons: 1) The same as for stats_since due to first occurrence of entry in the hashtable. And it will be equal to stats_since timestamp in that case. 2) Due to an external reset from an upper level sampler. I think it's not very important how to name this field, but it would be better to mention both these reasons in the comment. As for more important things, if the aux_min_time initial value is zero like now, then if condition on line 1385 of pg_stat_statements.c will never be true and aux_min_time will remain zero. Init aux_min_time with INT_MAX can solve this problem. It is possible to reduce size of entry_reset_aux() function via removing if condition on line 2606 and entire third branch from line 2626. Thanks to check in 2612 this will work in all cases. Also it would be nice to move the repeating several times lines 2582-2588 into separate function. I think this can make entry_reset_aux() more shorter and clearer. In general, the 5th patch applies with no problems, make check-world and CI gives no error and patch seems to be closely to become RFC. With best regards, -- Anton A. Melnikov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Hi Julien, Tue, 2022-01-25 at 18:08 +0800, Julien Rouhaud wrote: > > Are those 4 new counters really worth it? > > The min/max were added to make pg_stat_statements a bit more useful > if you > have nothing else using that extension. But once you setup a tool > that > snapshots the metrics regularly, do you really need to know e.g. > "what was the > maximum execution time in the last 3 years", which will typically be > what > people will retrieve from the non-aux min/max counters, rather than > simply > using your additional tool for better and more precise information? Of course we can replace old min/max metrics with the new "aux" min/max metrics. It seems reasonable to me because they will have the same behavior until we touch reset_aux. I think we can assume that min/max data is saved somewhere if reset_aux was performed, but how about the backward compatibility? There may be some monitoring solutions that doesn't expect min/max stats reset independently of other statement statistics. It seems highly unlikely to me, because the min/max stats for "the last 3 years" is obvious unusable but maybe someone uses them as a sign of something? Are we need to worry about that? Also, there is one more dramatic consequence of such decision... What min/max values should be returned after the auxiliary reset but before the next statement execution? The NULL values seems reasonable but there was not any NULLs before and backward compatibility seems broken. Another approach is to return the old values of min/max stats and the old aux_stats_since value until the next statement execution but it seems strange when the reset was performed and it doesn't affected any stats instantly. regards, Andrei Zubkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Hi Andrei, On Tue, Jan 25, 2022 at 02:58:17PM +0300, Andrei Zubkov wrote: > > Of course we can replace old min/max metrics with the new "aux" min/max > metrics. It seems reasonable to me because they will have the same > behavior until we touch reset_aux. I think we can assume that min/max > data is saved somewhere if reset_aux was performed, but how about the > backward compatibility? > There may be some monitoring solutions that doesn't expect min/max > stats reset independently of other statement statistics. > It seems highly unlikely to me, because the min/max stats for "the last > 3 years" is obvious unusable but maybe someone uses them as a sign of > something? To be honest I don't see how any monitoring solution could make any use of those fields as-is. For that reason in PoWA we unfortunately have to entirely ignore them. There was a previous attempt to provide a way to reset those counters only (see [1]), but it was returned with feedback due to lack of TLC from the author. > Are we need to worry about that? I don't think it's a problem, as once you have a solution on top of pg_stat_statements, you get information order of magnitude better from that solution instead of pg_stat_statements. And if that's a problem, well either don't reset those counters, or don't use the external solution if it does it automatically and you're not ok with it. > Also, there is one more dramatic consequence of such decision... > What min/max values should be returned after the auxiliary reset but > before the next statement execution? > The NULL values seems reasonable but there was not any NULLs before and > backward compatibility seems broken. Another approach is to return the > old values of min/max stats and the old aux_stats_since value until the > next statement execution but it seems strange when the reset was > performed and it doesn't affected any stats instantly. If you're worried about some external table having a NOT NULL constraint for those fields, how about returning NaN instead? That's a valid value for a double precision data type. [1] https://www.postgresql.org/message-id/1762890.8ARNpCrDLI@peanuts2
Hi Julien, On Tue, 2022-01-25 at 20:22 +0800, Julien Rouhaud wrote: > To be honest I don't see how any monitoring solution could make any > use of > those fields as-is. For that reason in PoWA we unfortunately have to > entirely > ignore them. There was a previous attempt to provide a way to reset > those > counters only (see [1]), but it was returned with feedback due to > lack of TLC > from the author. Thank you, I've just seen a thread in [1], it was of course a weak attempt and I think it could be done better. > > > I don't think it's a problem, as once you have a solution on top of > pg_stat_statements, you get information order of magnitude better > from that solution instead of pg_stat_statements. Agreed. I'm worry about having different solutions running simultaneously. All of them may want to get accurate min/max values, but if they all will reset min/max statistics, they will interfere each other. It seems that min/max resetting should be configurable in each sampler as a partial problem solution. At least, every sampling solution can make a decision based on reset timestamps provided by pg_stat_statements now. > > > If you're worried about some external table having a NOT NULL > constraint for > those fields, how about returning NaN instead? That's a valid value > for a > double precision data type. I don't know for sure what we can expect to be wrong here. My opinion is to use NULLs, as they seems more suitable here. But this can be done only if we are not expecting significant side effects. -- Andrei Zubkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements
Hello! On 26.01.2022 16:43, Andrei Zubkov wrote: >> >> If you're worried about some external table having a NOT NULL >> constraint for >> those fields, how about returning NaN instead? That's a valid value >> for a >> double precision data type. > > I don't know for sure what we can expect to be wrong here. My opinion > is to use NULLs, as they seems more suitable here. But this can be done > only if we are not expecting significant side effects. Let me suggest for your consideration an additional reset request flag that can be used to synchronize reset in a way similar to interrupt handling. External reset can set this flag immediately. Then pg_stat_statements will wait for the moment when the required query hits into the statistics and only at this moment really reset the aux statistics, write a new timestamp and clear the flag. At the time of real reset, total_time will be determined, and pg_stat_statements can immediately initialize min and max correctly. From reset to the next query execution the aux view will give old correct values so neither NaNs nor NULLs will be required. Also we can put the value of reset request flag into the aux view to give feedback to the external application that reset was requested. With best regards, -- Anton A. Melnikov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
This patch seems to have gotten some feedback but development has stalled. It's marked "waiting on author" but I'm not clear exactly what is expected from the authors here. It seems there isn't really consensus on the design at the moment. There's been no emails in over a month. Fwiw I find the idea of having a separate "aux" table kind of awkward. It'll seem strange to users not familiar with the history and without any clear idea why the fields are split.
Hi On Fri, 2022-03-25 at 00:37 -0400, Greg Stark wrote: > Fwiw I find the idea of having a separate "aux" table kind of > awkward. > It'll seem strange to users not familiar with the history and without > any clear idea why the fields are split. Greg, thank you for your attention and for your thought. I've just completed the 6th version of a patch implementing idea proposed by Julien Rouhaud, i.e. without auxiliary statistics. 6th version will reset current min/max fields to zeros until the first plan or execute. I've decided to use zeros here because planning statistics is zero in case of disabled tracking. I think sampling solution could easily handle this. -- Regards, Andrei Zubkov
Attachment
Hi, On Fri, Mar 25, 2022 at 01:25:23PM +0300, Andrei Zubkov wrote: > Greg, thank you for your attention and for your thought. > > I've just completed the 6th version of a patch implementing idea > proposed by Julien Rouhaud, i.e. without auxiliary statistics. 6th > version will reset current min/max fields to zeros until the first plan > or execute. Thanks! I've decided to use zeros here because planning statistics > is zero in case of disabled tracking. I think sampling solution could > easily handle this. I'm fine with it. It's also consistent with the planning counters when track_planning is disabled. And even if the sampling solution doesn't handle it, you will simply get consistent values, like "0 calls with minmax timing of 0 msec", so it's not really a problem. Feature wise, I'm happy with the patch. I just have a few comments. Tests: - it's missing some test in sql/oldextversions.sql to validate that the code works with the extension in version 1.9 - the last test removed the minmax_plan_zero field, why? Code: + TimestampTz stats_since; /* timestamp of entry allocation moment */ I think "timestamp of entry allocation" is enough? + * Calculate min and max time. min = 0 and max = 0 + * means that min/max statistics reset was happen maybe "means that the min/max statistics were reset" +/* + * Reset min/max values of specified entries + */ +static void +entry_minmax_reset(Oid userid, Oid dbid, uint64 queryid) +{ [...] There's a lot of duplicated logic with entry_reset(). Would it be possible to merge at least the C reset function to handle either all-metrics or minmax-only? Also, maybe it would be better to have a single SQL reset function, something like: pg_stat_statements_reset(IN userid Oid DEFAULT 0, IN dbid Oid DEFAULT 0, IN queryid bigint DEFAULT 0, IN minmax_only DEFAULT false ) Doc: + <structfield>stats_since</structfield> <type>timestamp with time zone</type> + </para> + <para> + Timestamp of statistics gathering start for the statement The description is a bit weird. Maybe like "Time at which statistics gathering started for this statement"? Same for the minmax version.
Hi Julien! Thank you for such detailed review! On Wed, 2022-03-30 at 17:31 +0800, Julien Rouhaud wrote: > Feature wise, I'm happy with the patch. I just have a few comments. > > Tests: > > - it's missing some test in sql/oldextversions.sql to validate that the > code > works with the extension in version 1.9 Yes, I've just added some tests there, but it seems they are not quite suficient. Maybe we should try to do some queries to views and functions in old versions? A least when new C function version appears... During tests developing I've noted that current test of pg_stat_statements_info view actually tests only view access. However we can test at least functionality of stats_reset field like this: SELECT now() AS ref_ts \gset SELECT dealloc, stats_reset >= :'ref_ts' FROM pg_stat_statements_info; SELECT pg_stat_statements_reset(); SELECT dealloc, stats_reset >= :'ref_ts' FROM pg_stat_statements_info; Does it seems reasonable? > - the last test removed the minmax_plan_zero field, why? My thaught was as follows... Reexecution of the same query will definitely cause execution. However, most likely it wouldn't be planned, but if it would (maybe this is possible, or maybe it will be possible in the future in some cases) the test shouldn't fail. Checking of only execution stats seems enough to me - in most cases we can't check planning stats with such test anyway. What do you think about it? > > Code: > > + TimestampTz stats_since; /* timestamp of entry > allocation moment */ > > I think "timestamp of entry allocation" is enough? Yes > > + * Calculate min and max time. min = 0 and max > = 0 > + * means that min/max statistics reset was > happen > > maybe "means that the min/max statistics were reset" Agreed > > +/* > + * Reset min/max values of specified entries > + */ > +static void > +entry_minmax_reset(Oid userid, Oid dbid, uint64 queryid) > +{ > [...] > > There's a lot of duplicated logic with entry_reset(). > Would it be possible to merge at least the C reset function to handle > either > all-metrics or minmax-only? Great point! I've merged minmax reset functionality in the entry_reset function. > Also, maybe it would be better to have a single SQL > reset function, something like: > > pg_stat_statements_reset(IN userid Oid DEFAULT 0, > IN dbid Oid DEFAULT 0, > IN queryid bigint DEFAULT 0, > IN minmax_only DEFAULT false > ) Of course! > > Doc: > > + <structfield>stats_since</structfield> <type>timestamp with > time zone</type> > + </para> > + <para> > + Timestamp of statistics gathering start for the statement > > The description is a bit weird. Maybe like "Time at which statistics > gathering > started for this statement"? Same for the minmax version. Agreed. I've attached 7th patch version with fixes mentioned above. -- Best regards, Andrei Zubkov
Attachment
FYI this has a compiler warning showing up on the cfbot: [13:19:51.544] pg_stat_statements.c: In function ‘entry_reset’: [13:19:51.544] pg_stat_statements.c:2598:32: error: ‘minmax_stats_reset’ may be used uninitialized in this function [-Werror=maybe-uninitialized] [13:19:51.544] 2598 | entry->minmax_stats_since = minmax_stats_reset; [13:19:51.544] | ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~ If the patch is otherwise ready to commit then this is an issue that should be fixed before marking it ready to commit. Given that this is the last week before feature freeze it'll probably get moved to a future commitfest unless it's ready to commit.
Hi, On Fri, Apr 01, 2022 at 11:38:52AM -0400, Greg Stark wrote: > FYI this has a compiler warning showing up on the cfbot: > > [13:19:51.544] pg_stat_statements.c: In function ‘entry_reset’: > [13:19:51.544] pg_stat_statements.c:2598:32: error: > ‘minmax_stats_reset’ may be used uninitialized in this function > [-Werror=maybe-uninitialized] > [13:19:51.544] 2598 | entry->minmax_stats_since = minmax_stats_reset; > [13:19:51.544] | ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~ > > If the patch is otherwise ready to commit then this is an issue that > should be fixed before marking it ready to commit. > > Given that this is the last week before feature freeze it'll probably > get moved to a future commitfest unless it's ready to commit. As I mentioned in my last review I think feature wise the patch is ok, it just needed a few minor changes. It's a small patch but can help *a lot* tools on top of pg_stat_statements and give users a better overview of their workload so it would be nice to commit it in v15. I was busy looking at the prefetch patch today (not done yet), but I plan to review the last version over the weekend. After a quick look at the patch it seems like a compiler bug. I'm not sure which clang version is used, but can't reproduce it locally using clang 13. I already saw similar false positive, when a variable is initialized in a branch (here minmax_only == true), and only then used in similar branches. I guess that pg_stat_statement_reset() is so expensive that an extra gettimeofday() wouldn't change much. Otherwise initializing to NULL should be enough.
Hi, Thank you, Greg On Fri, 2022-04-01 at 11:38 -0400, Greg Stark wrote: > [13:19:51.544] pg_stat_statements.c: In function ‘entry_reset’: > [13:19:51.544] pg_stat_statements.c:2598:32: error: > ‘minmax_stats_reset’ may be used uninitialized in this function > [-Werror=maybe-uninitialized] > [13:19:51.544] 2598 | entry->minmax_stats_since = minmax_stats_reset; > [13:19:51.544] | ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~ > I was afraid of such warning can appear.. On Sat, 2022-04-02 at 00:13 +0800, Julien Rouhaud wrote: > I guess that pg_stat_statement_reset() is so > expensive that an extra gettimeofday() wouldn't change much. Agreed > Otherwise > initializing to NULL should be enough. Julien, I would prefer an extra GetCurrentTimestamp(). So, I've opted to use the common unconditional stats_reset = GetCurrentTimestamp(); for an entire entry_reset function due to the following: 1. It will be uniform for stats_reset and minmax_stats_reset 2. As you mentioned, it wouldn't change a much 3. The most common way to use this function is to reset all statements meaning that GetCurrentTimestamp() will be called anyway to update the value of stats_reset field in pg_stat_statements_info view 4. Actually I would like that pg_stat_statements_reset function was able to return the value of stats_reset as its result. This could give to the sampling solutions the ability to check if the last reset (of any type) was performed by this solution or any other reset was performed by someone else. It seems valuable to me, but it changes the result type of the pg_stat_statements_reset() function, so I don't know if we can do that right now. v8 attached -- regards, Andrei
Attachment
Hi, On 2022-04-01 22:47:02 +0300, Andrei Zubkov wrote: > + entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_FIND, NULL); > + > + if (entry) { > + /* Found */ > + if (minmax_only) { > + /* When requested reset only min/max statistics of an entry */ > + entry_counters = &entry->counters; > + for (int kind = 0; kind < PGSS_NUMKIND; kind++) > + { > + entry_counters->max_time[kind] = 0; > + entry_counters->min_time[kind] = 0; > + } > + entry->minmax_stats_since = stats_reset; > + } > + else > + { > + /* Remove the key otherwise */ > + hash_search(pgss_hash, &entry->key, HASH_REMOVE, NULL); > + num_remove++; > + } > + } It seems decidedly not great to have four copies of this code. It was already not great before, but this patch makes the duplicated section go from four lines to 20 or so. Greetings, Andres Freund
On Thu, Mar 31, 2022 at 01:06:10PM +0300, Andrei Zubkov wrote: > > On Wed, 2022-03-30 at 17:31 +0800, Julien Rouhaud wrote: > > Feature wise, I'm happy with the patch. I just have a few comments. > > > > Tests: > > > > - it's missing some test in sql/oldextversions.sql to validate that the > > code > > works with the extension in version 1.9 > > Yes, I've just added some tests there, but it seems they are not quite > suficient. Maybe we should try to do some queries to views and > functions in old versions? A least when new C function version > appears... I'm not sure if that's really helpful. If you have new C functions and old SQL-version, you won't be able to reach them anyway. Similarly, if you have the new SQL but the old .so (which we can't test), it will just fail as the symbol doesn't exist. The real problem that has to be explicitly handled by the C code is different signatures for C functions. > > During tests developing I've noted that current test of > pg_stat_statements_info view actually tests only view access. However > we can test at least functionality of stats_reset field like this: > > SELECT now() AS ref_ts \gset > SELECT dealloc, stats_reset >= :'ref_ts' FROM pg_stat_statements_info; > SELECT pg_stat_statements_reset(); > SELECT dealloc, stats_reset >= :'ref_ts' FROM pg_stat_statements_info; > > Does it seems reasonable? It looks reasonable, especially if the patch adds a new mode for the reset function. > > - the last test removed the minmax_plan_zero field, why? > > My thaught was as follows... Reexecution of the same query will > definitely cause execution. However, most likely it wouldn't be > planned, but if it would (maybe this is possible, or maybe it will be > possible in the future in some cases) the test shouldn't fail. Checking > of only execution stats seems enough to me - in most cases we can't > check planning stats with such test anyway. > What do you think about it? Ah I see. I guess we could set plan_cache_mode to force_generic_plan to make sure we go though planning. But otherwise just adding a comment saying that the test has to be compatible with different plan caching approach would be fine with me. Thanks for the work on merging the functions! I will reply on the other parts of the thread where some discussion started.
On Fri, Apr 01, 2022 at 10:47:02PM +0300, Andrei Zubkov wrote: > > On Fri, 2022-04-01 at 11:38 -0400, Greg Stark wrote: > > [13:19:51.544] pg_stat_statements.c: In function ‘entry_reset’: > > [13:19:51.544] pg_stat_statements.c:2598:32: error: > > ‘minmax_stats_reset’ may be used uninitialized in this function > > [-Werror=maybe-uninitialized] > > [13:19:51.544] 2598 | entry->minmax_stats_since = minmax_stats_reset; > > [13:19:51.544] | ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~ > > > > I was afraid of such warning can appear.. > > On Sat, 2022-04-02 at 00:13 +0800, Julien Rouhaud wrote: > > I guess that pg_stat_statement_reset() is so > > expensive that an extra gettimeofday() wouldn't change much. > > Agreed > > > Otherwise > > initializing to NULL should be enough. > > Julien, I would prefer an extra GetCurrentTimestamp(). So, I've opted > to use the common unconditional > > stats_reset = GetCurrentTimestamp(); > > for an entire entry_reset function due to the following: > > 1. It will be uniform for stats_reset and minmax_stats_reset > 2. As you mentioned, it wouldn't change a much > 3. The most common way to use this function is to reset all statements > meaning that GetCurrentTimestamp() will be called anyway to update the > value of stats_reset field in pg_stat_statements_info view > 4. Actually I would like that pg_stat_statements_reset function was > able to return the value of stats_reset as its result. This could give > to the sampling solutions the ability to check if the last reset (of > any type) was performed by this solution or any other reset was > performed by someone else. It seems valuable to me, but it changes the > result type of the pg_stat_statements_reset() function, so I don't know > if we can do that right now. I'm fine with always getting the current timestamp when calling the function. I'm not sure about returning the ts. If you need it you could call SELECT now() FROM pg_stat_statements_reset() (or clock_timestamp()). It won't be entirely accurate but since the function will have an exclusive lock during the whole execution that shouldn't be a problem. Now you're already adding a new version of the C function so I guess that it wouldn't require any additional effort so why not.
On Fri, Apr 01, 2022 at 01:01:53PM -0700, Andres Freund wrote: > Hi, > > On 2022-04-01 22:47:02 +0300, Andrei Zubkov wrote: > > + entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_FIND, NULL); > > + > > + if (entry) { > > + /* Found */ > > + if (minmax_only) { > > + /* When requested reset only min/max statistics of an entry */ > > + entry_counters = &entry->counters; > > + for (int kind = 0; kind < PGSS_NUMKIND; kind++) > > + { > > + entry_counters->max_time[kind] = 0; > > + entry_counters->min_time[kind] = 0; > > + } > > + entry->minmax_stats_since = stats_reset; > > + } > > + else > > + { > > + /* Remove the key otherwise */ > > + hash_search(pgss_hash, &entry->key, HASH_REMOVE, NULL); > > + num_remove++; > > + } > > + } > > It seems decidedly not great to have four copies of this code. It was already > not great before, but this patch makes the duplicated section go from four > lines to 20 or so. +1
Hi, On Fri, 2022-04-01 at 13:01 -0700, Andres Freund wrote: > It seems decidedly not great to have four copies of this code. It was > already > not great before, but this patch makes the duplicated section go from > four > lines to 20 or so. Agreed. I've created the single_entry_reset() function wrapping this code. I wonder if it should be declared as inline to speedup a little. On Sat, 2022-04-02 at 15:10 +0800, Julien Rouhaud wrote: > > However > > we can test at least functionality of stats_reset field like this: > > > > SELECT now() AS ref_ts \gset > > SELECT dealloc, stats_reset >= :'ref_ts' FROM > > pg_stat_statements_info; > > SELECT pg_stat_statements_reset(); > > SELECT dealloc, stats_reset >= :'ref_ts' FROM > > pg_stat_statements_info; > > > > Does it seems reasonable? > > It looks reasonable, especially if the patch adds a new mode for the > reset > function. I've implemented this test. > > Checking > > of only execution stats seems enough to me - in most cases we can't > > check planning stats with such test anyway. > > What do you think about it? > > Ah I see. I guess we could set plan_cache_mode to force_generic_plan > to make > sure we go though planning. But otherwise just adding a comment > saying that > the test has to be compatible with different plan caching approach > would be > fine with me. Set plan_cache_mode seems a little bit excess to me. And maybe in the future some another plan caching strategies will be implementd with coresponding settings.. So I've just left a comment there. On Sat, 2022-04-02 at 15:21 +0800, Julien Rouhaud wrote: > I'm not sure about returning the ts. If you need it you could call > SELECT > now() FROM pg_stat_statements_reset() (or clock_timestamp()). It > won't be > entirely accurate but since the function will have an exclusive lock > during the > whole execution that shouldn't be a problem. Now you're already > adding a new > version of the C function so I guess that it wouldn't require any > additional > effort so why not. I think that if we can do it in accurate way and there is no obvious side effects, why not to try it... Changing of pg_stat_statements_reset function result caused a confiderable tests update. Also, I'm not sure that my description of this feature in the docs is blameless.. After all, I'm a little bit in doubt about this feature, so I'm ready to rollback it. v9 attached -- regards, Andrei
Attachment
On Sat, Apr 02, 2022 at 01:12:54PM +0300, Andrei Zubkov wrote: > On Fri, 2022-04-01 at 13:01 -0700, Andres Freund wrote: > > It seems decidedly not great to have four copies of this code. It was > > already > > not great before, but this patch makes the duplicated section go from > > four > > lines to 20 or so. > > Agreed. I've created the single_entry_reset() function wrapping this > code. I wonder if it should be declared as inline to speedup a little. Maybe a macro would be better here? I don't know if that's generally ok or just old and not-that-great code, but there are other places relying on macros when a plain function call isn't that convenient (like here returning 0 or 1 as a hack for incrementing num_remove), for instance in hba.c. > On Sat, 2022-04-02 at 15:21 +0800, Julien Rouhaud wrote: > > I'm not sure about returning the ts. If you need it you could call > > SELECT > > now() FROM pg_stat_statements_reset() (or clock_timestamp()). It > > won't be > > entirely accurate but since the function will have an exclusive lock > > during the > > whole execution that shouldn't be a problem. Now you're already > > adding a new > > version of the C function so I guess that it wouldn't require any > > additional > > effort so why not. > > I think that if we can do it in accurate way and there is no obvious > side effects, why not to try it... > Changing of pg_stat_statements_reset function result caused a > confiderable tests update. Also, I'm not sure that my description of > this feature in the docs is blameless.. > > After all, I'm a little bit in doubt about this feature, so I'm ready > to rollback it. Well, I personally don't think that I would need it for powa as it's designed to have very frequent snapshot. I know you have a different approach in pg_profile, but I'm not sure it will be that useful for you either?
On Sat, 2022-04-02 at 18:56 +0800, Julien Rouhaud wrote: > Maybe a macro would be better here? I don't know if that's generally > ok or > just old and not-that-great code, but there are other places relying > on macros > when a plain function call isn't that convenient (like here returning > 0 or 1 as > a hack for incrementing num_remove), for instance in hba.c. Yes, it is not very convenient and not looks pretty, so I'll try a macro here soon. > > I think that if we can do it in accurate way and there is no > > obvious > > side effects, why not to try it... > > Changing of pg_stat_statements_reset function result caused a > > confiderable tests update. Also, I'm not sure that my description > > of > > this feature in the docs is blameless.. > > > > After all, I'm a little bit in doubt about this feature, so I'm > > ready > > to rollback it. > > Well, I personally don't think that I would need it for powa as it's > designed > to have very frequent snapshot. I know you have a different approach > in > pg_profile, but I'm not sure it will be that useful for you either? Of course I can do some workaround if the accurate reset time will be unavailable. I just want to do the whole thing if it doesn't hurt. If we have a plenty of timestamps saved now, I think it is a good idea to have then bound to some milestones. At least it is a pretty equal join condition between samples. But if you think we should avoid returning ts here I won't insist on that.
On Sat, 2022-04-02 at 14:11 +0300, Andrei Zubkov wrote: > On Sat, 2022-04-02 at 18:56 +0800, Julien Rouhaud wrote: > > Maybe a macro would be better here? I don't know if that's > > generally > > ok or > > just old and not-that-great code, but there are other places > > relying > > on macros > > when a plain function call isn't that convenient (like here > > returning > > 0 or 1 as > > a hack for incrementing num_remove), for instance in hba.c. > > Yes, it is not very convenient and not looks pretty, so I'll try a > macro here soon. Implemented SINGLE_ENTRY_RESET as a macro. v10 attached -- regards, Andrei
Attachment
The tests for this seem to need adjustments. [12:41:09.403] test pg_stat_statements ... FAILED 180 ms diff -U3 /tmp/cirrus-ci-build/contrib/pg_stat_statements/expected/pg_stat_statements.out /tmp/cirrus-ci-build/contrib/pg_stat_statements/results/pg_stat_statements.out --- /tmp/cirrus-ci-build/contrib/pg_stat_statements/expected/pg_stat_statements.out 2022-04-02 12:37:42.201823383 +0000 +++ /tmp/cirrus-ci-build/contrib/pg_stat_statements/results/pg_stat_statements.out 2022-04-02 12:41:09.219563204 +0000 @@ -1174,8 +1174,8 @@ ORDER BY query; query | reset_ts_match ---------------------------+---------------- - SELECT $1,$2 AS "STMTTS2" | f SELECT $1 AS "STMTTS1" | t + SELECT $1,$2 AS "STMTTS2" | f (2 rows) -- check that minmax reset does not set stats_reset Hm. Is this a collation problem?
Greg Stark <stark@mit.edu> writes: > The tests for this seem to need adjustments. > [12:41:09.403] test pg_stat_statements ... FAILED 180 ms > diff -U3 /tmp/cirrus-ci-build/contrib/pg_stat_statements/expected/pg_stat_statements.out > /tmp/cirrus-ci-build/contrib/pg_stat_statements/results/pg_stat_statements.out > --- /tmp/cirrus-ci-build/contrib/pg_stat_statements/expected/pg_stat_statements.out > 2022-04-02 12:37:42.201823383 +0000 > +++ /tmp/cirrus-ci-build/contrib/pg_stat_statements/results/pg_stat_statements.out > 2022-04-02 12:41:09.219563204 +0000 > @@ -1174,8 +1174,8 @@ > ORDER BY query; > query | reset_ts_match > ---------------------------+---------------- > - SELECT $1,$2 AS "STMTTS2" | f > SELECT $1 AS "STMTTS1" | t > + SELECT $1,$2 AS "STMTTS2" | f > (2 rows) > -- check that minmax reset does not set stats_reset > Hm. Is this a collation problem? Yeah, looks like it. ORDER BY query COLLATE "C" might work better. regards, tom lane
Hi Greg, On Sat, 2022-04-02 at 17:38 -0400, Greg Stark wrote: > The tests for this seem to need adjustments. > > [12:41:09.403] test pg_stat_statements ... FAILED 180 ms > query | reset_ts_match > ---------------------------+---------------- > - SELECT $1,$2 AS "STMTTS2" | f > SELECT $1 AS "STMTTS1" | t > + SELECT $1,$2 AS "STMTTS2" | f > (2 rows) > > -- check that minmax reset does not set stats_reset > > > Hm. Is this a collation problem? Of course, thank you! I've forgot to set collation here. v11 attached -- regards, Andrei
Attachment
On Sun, Apr 03, 2022 at 07:32:47AM +0300, Andrei Zubkov wrote: > v11 attached + /* When requested reset only min/max statistics of an entry */ \ + entry_counters = &entry->counters; \ + for (int kind = 0; kind < PGSS_NUMKIND; kind++) \ + { \ + entry_counters->max_time[kind] = 0; \ + entry_counters->min_time[kind] = 0; \ + } \ [...] +static TimestampTz +entry_reset(Oid userid, Oid dbid, uint64 queryid, bool minmax_only) { HASH_SEQ_STATUS hash_seq; pgssEntry *entry; + Counters *entry_counters; Do we really need an extra variable? Why not simply using entry->counters.xxx_time[kind]? Also, I think it's better to make the macro more like function looking, so SINGLE_ENTRY_RESET(). index f2e822acd3..c2af29866b 100644 --- a/contrib/pg_stat_statements/sql/oldextversions.sql +++ b/contrib/pg_stat_statements/sql/oldextversions.sql @@ -36,4 +36,12 @@ AlTER EXTENSION pg_stat_statements UPDATE TO '1.8'; \d pg_stat_statements SELECT pg_get_functiondef('pg_stat_statements_reset'::regproc); +ALTER EXTENSION pg_stat_statements UPDATE TO '1.9'; +\d pg_stat_statements +\d pg_stat_statements_info +SELECT pg_get_functiondef('pg_stat_statements_reset'::regproc); I don't think this bring any useful coverage. Minimum time spent planning the statement, in milliseconds (if <varname>pg_stat_statements.track_planning</varname> is enabled, - otherwise zero) + otherwise zero), this field will contain zero until this statement + is planned fist time after reset performed by the + <function>pg_stat_statements_reset</function> function with the + <structfield>minmax_only</structfield> parameter set to <literal>true</literal> I think this need some rewording (and s/fist/first). Maybe: Minimum time spent planning the statement, in milliseconds. This field will be zero if <varname>pg_stat_statements.track_planning</varname> is disabled, or if the counter has been reset using the the <function>pg_stat_statements_reset</function> function with the <structfield>minmax_only</structfield> parameter set to <literal>true</literal> and never been planned since. <primary>pg_stat_statements_reset</primary> </indexterm> @@ -589,6 +623,20 @@ If all statistics in the <filename>pg_stat_statements</filename> view are discarded, it will also reset the statistics in the <structname>pg_stat_statements_info</structname> view. + When <structfield>minmax_only</structfield> is <literal>true</literal> only the + values of minimun and maximum execution and planning time will be reset (i.e. Nitpicking: I would say planning and execution time, as the fields are in this order in the view/function.
Hi Julien, On Sun, 2022-04-03 at 15:07 +0800, Julien Rouhaud wrote: > +static TimestampTz > +entry_reset(Oid userid, Oid dbid, uint64 queryid, bool minmax_only) > { > HASH_SEQ_STATUS hash_seq; > pgssEntry *entry; > + Counters *entry_counters; > > Do we really need an extra variable? Why not simply using > entry->counters.xxx_time[kind]? > > Also, I think it's better to make the macro more like function > looking, so > SINGLE_ENTRY_RESET(). Agreed with the both, I'll fix it. > > index f2e822acd3..c2af29866b 100644 > --- a/contrib/pg_stat_statements/sql/oldextversions.sql > +++ b/contrib/pg_stat_statements/sql/oldextversions.sql > @@ -36,4 +36,12 @@ AlTER EXTENSION pg_stat_statements UPDATE TO > '1.8'; > \d pg_stat_statements > SELECT pg_get_functiondef('pg_stat_statements_reset'::regproc); > > +ALTER EXTENSION pg_stat_statements UPDATE TO '1.9'; > +\d pg_stat_statements > +\d pg_stat_statements_info > +SELECT pg_get_functiondef('pg_stat_statements_reset'::regproc); > > I don't think this bring any useful coverage. I feel the same, but I've done it like previous tests (versions 1.7 and 1.8). Am I miss something here? Do you think we should remove these tests completly? > > I think this need some rewording (and s/fist/first). Maybe: > > Minimum time spent planning the statement, in milliseconds. > > This field will be zero if > <varname>pg_stat_statements.track_planning</varname> > is disabled, or if the counter has been reset using the the > <function>pg_stat_statements_reset</function> function with the > <structfield>minmax_only</structfield> parameter set to > <literal>true</literal> > and never been planned since. Thanks a lot! > > <primary>pg_stat_statements_reset</primary> > </indexterm> > @@ -589,6 +623,20 @@ > If all statistics in the > <filename>pg_stat_statements</filename> > view are discarded, it will also reset the statistics in the > <structname>pg_stat_statements_info</structname> view. > + When <structfield>minmax_only</structfield> is > <literal>true</literal> only the > + values of minimun and maximum execution and planning time will > be reset (i.e. > > Nitpicking: I would say planning and execution time, as the fields > are in this > order in the view/function. Agreed. -- regards, Andrei
I've attached v12 of a patch. The only unsolved issue now is the following: On Sun, 2022-04-03 at 15:07 +0800, Julien Rouhaud wrote: > +ALTER EXTENSION pg_stat_statements UPDATE TO '1.9'; > +\d pg_stat_statements > +\d pg_stat_statements_info > +SELECT pg_get_functiondef('pg_stat_statements_reset'::regproc); > > I don't think this bring any useful coverage. It is a little bit unclear to me what is the best solution here. -- regards, Andrei
Attachment
Hi, On Sun, Apr 03, 2022 at 12:29:43PM +0300, Andrei Zubkov wrote: > I've attached v12 of a patch. The only unsolved issue now is the > following: > > On Sun, 2022-04-03 at 15:07 +0800, Julien Rouhaud wrote: > > +ALTER EXTENSION pg_stat_statements UPDATE TO '1.9'; > > +\d pg_stat_statements > > +\d pg_stat_statements_info > > +SELECT pg_get_functiondef('pg_stat_statements_reset'::regproc); > > > > I don't think this bring any useful coverage. > > It is a little bit unclear to me what is the best solution here. Sorry, I missed that there were some similar tests already for previous versions. This was probably discussed and agreed before, so +1 to be consistent with the new versions. The patch looks good to me, although I will do a full review to make sure I didn't miss anything. Just another minor nitpicking after a quick look: + This field will be zero if ... [...] + this field will contain zero until this statement ... The wording should be consistent, so either "will be zero" or "will contain zero" everywhere. I'm personally fine with any, but maybe a native English will think one is better.
Julien, On Sun, 2022-04-03 at 17:56 +0800, Julien Rouhaud wrote: > Just another minor nitpicking after a quick look: > > + This field will be zero if ... > [...] > + this field will contain zero until this statement ... > > The wording should be consistent, so either "will be zero" or "will > contain > zero" everywhere. I'm personally fine with any, but maybe a native > English > will think one is better. Agreed. Searching the docs I've fond out that "will contain" usually used with the description of contained structure rather then a simple value. So I'll use a "will be zero" in the next version after your review. -- regards, Andrei
Hi, On Sun, Apr 03, 2022 at 01:24:40PM +0300, Andrei Zubkov wrote: > > On Sun, 2022-04-03 at 17:56 +0800, Julien Rouhaud wrote: > > Just another minor nitpicking after a quick look: > > > > + This field will be zero if ... > > [...] > > + this field will contain zero until this statement ... > > > > The wording should be consistent, so either "will be zero" or "will > > contain > > zero" everywhere. I'm personally fine with any, but maybe a native > > English > > will think one is better. > Agreed. > > Searching the docs I've fond out that "will contain" usually used with > the description of contained structure rather then a simple value. So > I'll use a "will be zero" in the next version after your review. Ok! So last round of review. - the commit message: It should probably mention the mimnax_stats_since at the beginning. Also, both the view and the function contain those new field. Minor rephrasing: s/evicted and returned back/evicted and stored again/? s/with except of all/with the exception of all/ s/is now returns/now returns/ - code: +#define SINGLE_ENTRY_RESET() \ +if (entry) { \ [...] It's not great to rely on caller context too much. I think it would be better to pass at least the entry as a parameter (maybe e?) to the macro for more clarity. I'm fine with keeping minmax_only, stats_reset and num_remove as is. Apart from that I think this is ready!
Hi Julien, Thank you very much for your work on this patch! On Mon, 2022-04-04 at 10:31 +0800, Julien Rouhaud wrote: > - the commit message: > > It should probably mention the mimnax_stats_since at the beginning. > Also, both > the view and the function contain those new field. > > Minor rephrasing: > > s/evicted and returned back/evicted and stored again/? > s/with except of all/with the exception of all/ > s/is now returns/now returns/ Agreed, commit message updated. > - code: > > +#define SINGLE_ENTRY_RESET() \ > +if (entry) { \ > [...] > > It's not great to rely on caller context too much. Yes, I was thinking about it. But using 4 parameters seemed strange to me. > I think it would be better > to pass at least the entry as a parameter (maybe e?) to the macro for > more > clarity. I'm fine with keeping minmax_only, stats_reset and > num_remove as is. Using an entry as a macro parameter looks good, I'm fine with "e". > Apart from that I think this is ready! v13 attached -- regards, Andrei
Attachment
Hi, On Mon, Apr 04, 2022 at 09:59:04AM +0300, Andrei Zubkov wrote: > > Minor rephrasing: > > > > s/evicted and returned back/evicted and stored again/? > > s/with except of all/with the exception of all/ > > s/is now returns/now returns/ > > Agreed, commit message updated. > > > - code: > > > > +#define SINGLE_ENTRY_RESET() \ > > +if (entry) { \ > > [...] > > > > It's not great to rely on caller context too much. > > Yes, I was thinking about it. But using 4 parameters seemed strange to > me. > > > I think it would be better > > to pass at least the entry as a parameter (maybe e?) to the macro for > > more > > clarity. I'm fine with keeping minmax_only, stats_reset and > > num_remove as is. > > Using an entry as a macro parameter looks good, I'm fine with "e". > > > Apart from that I think this is ready! > > v13 attached Thanks a lot! I'm happy with this version, so I'm marking it as Ready for Committer.
Hi, I've rebased this patch so that it can be applied after 57d6aea00fc. v14 attached -- regards, Andrei
Attachment
Hi, I took a quick look at this patch, to see if there's something we want/can get into v16. The last version was submitted about 9 months ago, and it doesn't apply cleanly anymore, but the bitrot is fairly minor. Not sure there's still interest, though. As for the patch, I wonder if it's unnecessarily complex. It adds *two* timestamps for each pg_stat_statements entry - one for reset of the whole entry, one for reset of "min/max" times only. I can see why the first timestamp (essentially tracking creating of the entry) is useful. I'd probably call it "created_at" or something like that, but that's a minor detail. Or maybe stats_reset, which is what we use in pgstat? But is the second timestamp for the min/max fields really useful? AFAIK to perform analysis, people take regular pg_stat_statements snapshots, which works fine for counters (calculating deltas) but not for gauges (which need a reset, to track fresh values). But people analyzing this are already resetting the whole entry, and so the snapshots already are tracking deltas. So I'm not convinced actually need the second timestamp. A couple more comments: 1) I'm not sure why the patch is adding tests of permissions on the pg_stat_statements_reset function? 2) If we want the second timestamp, shouldn't it also cover resets of the mean values, not just min/max? 3) I don't understand why the patch is adding "IS NOT NULL AS t" to various places in the regression tests. 4) I rather dislike the "minmax" naming, because that's often used in other contexts (for BRIN indexes), and as I mentioned maybe it should also cover the "mean" fields. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi Tomas, On Wed, 2023-01-18 at 17:29 +0100, Tomas Vondra wrote: > I took a quick look at this patch, to see if there's something we > want/can get into v16. The last version was submitted about 9 months > ago, and it doesn't apply cleanly anymore, but the bitrot is fairly > minor. Not sure there's still interest, though. Thank you for your attention to this patch! I'm still very interest in this patch. And I think I'll try to rebase this patch during a week or two if it seems possible to get it in 16.. > > I'd probably call it "created_at" or something like > that, but that's a minor detail. Or maybe stats_reset, which is what > we > use in pgstat? Yes there is some naming issue. My thought was the following: - "stats_reset" is not quite correct here, because the statement entry moment if definitely not a reset. The field named just as it means - this is time of the moment from which statistics is collected for this particular entry. - "created_at" perfectly matches the purpose of the field, but seems not such self-explaining to me. > > But is the second timestamp for the min/max fields really useful? > AFAIK > to perform analysis, people take regular pg_stat_statements > snapshots, > which works fine for counters (calculating deltas) but not for gauges > (which need a reset, to track fresh values). But people analyzing > this > are already resetting the whole entry, and so the snapshots already > are > tracking deltas. > > So I'm not convinced actually need the second timestamp. The main purpose of the patch is to provide means to collecting solutions to avoid the reset of pgss at all. Just like it happens for the pg_stat_ views. The only really need of reset is that we can't be sure that observing statement was not evicted and come back since last sample. Right now we only can do a whole reset on each sample and see how many entries will be in pgss hashtable on the next sample - how close this value to the max. If there is a plenty space in hashtable we can hope that there was not evictions since last sample. However there could be reset performed by someone else and we are know nothing about this. Having a timestamp in stats_since field we are sure about how long this statement statistics is tracked. That said sampling solution can totally avoid pgss resets. Avoiding such resets means avoiding interference between monitoring solutions. But if no more resets is done we can't track min/max values, because they still needs a reset and we can do nothing with such resets - they are necessary. However I still want to know when min/max reset was performed. This will help to detect possible interference on such resets. > > > A couple more comments: > > 1) I'm not sure why the patch is adding tests of permissions on the > pg_stat_statements_reset function? > > 2) If we want the second timestamp, shouldn't it also cover resets of > the mean values, not just min/max? I think that mean values shouldn't be target for a partial reset because the value for mean values can be easily reconstructed by the sampling solution without a reset. > > 3) I don't understand why the patch is adding "IS NOT NULL AS t" to > various places in the regression tests. The most of tests was copied from the previous version. I'll recheck them. > > 4) I rather dislike the "minmax" naming, because that's often used in > other contexts (for BRIN indexes), and as I mentioned maybe it should > also cover the "mean" fields. Agreed, but I couldn't make it better. Other versions seemed worse to me... > > Regards, Andrei Zubkov
Hi, I've updated this patch for the current master. Also I have some additional explanations.. On Wed, 2023-01-18 at 17:29 +0100, Tomas Vondra wrote: > 1) I'm not sure why the patch is adding tests of permissions on the > pg_stat_statements_reset function? I've fixed that > > 2) If we want the second timestamp, shouldn't it also cover resets of > the mean values, not just min/max? I think that mean values are not a targets for auxiliary resets because any sampling solution can easily calculate the mean values between samples without a reset. > > 3) I don't understand why the patch is adding "IS NOT NULL AS t" to > various places in the regression tests. This change is necessary in the current version because the pg_stat_statements_reset() function will return a timestamp of a reset, needed for sampling solutions to detect resets, perfermed by someone else. Regards -- Andrei Zubkov
Attachment
Hi, The final version of this patch should fix meson build and tests. -- Andrei Zubkov
Attachment
Hi! I've attached a new version of a patch - rebase to the current master. -- Andrei Zubkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements
On Wed, 1 Mar 2023 at 04:04, Andrei Zubkov <zubkov@moonset.ru> wrote: > > Hi! > > I've attached a new version of a patch - rebase to the current master. The CFBot (http://cfbot.cputube.org/) doesn't seem to like this. It looks like all the Meson builds are failing, perhaps there's something particular about the test environment that is either not set up right or is exposing a bug? Please check if this is a real failure or a cfbot failure. -- Gregory Stark As Commitfest Manager
Hi Gregory, On Wed, 2023-03-01 at 14:24 -0500, Gregory Stark (as CFM) wrote: > The CFBot (http://cfbot.cputube.org/) doesn't seem to like this. It > looks like all the Meson builds are failing, perhaps there's > something > particular about the test environment that is either not set up right > or is exposing a bug? Thank you, I've missed it. > > Please check if this is a real failure or a cfbot failure. > It is my failure. Just forgot to update meson.build I think CFBot should be happy now. Regards, -- Andrei Zubkov
Attachment
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements
I'm sorry, It seems this is failing again? It's Makefile and meson.build again :( Though there are also patching file contrib/pg_stat_statements/sql/oldextversions.sql can't find file to patch at input line 1833 and |--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql |+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql -------------------------- No file to patch. Skipping patch. -- Gregory Stark As Commitfest Manager
Hi Gregory, > patching file contrib/pg_stat_statements/sql/oldextversions.sql > can't find file to patch at input line 1833 > > > and > > > --- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql > > +++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql > -------------------------- > No file to patch. Skipping patch. > Thank you for your attention. Yes, it is due to parallel work on "Normalization of utility queries in pg_stat_statements" patch (https://postgr.es/m/Y/7Y9U/y/keAW3qH@paquier.xyz) It seems I've found something strange in new test files - I've mentioned this in a thread of a patch. Once there will be any solution I'll do a rebase again as soon as possible. -- Andrei Zubkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Hi, I've done a rebase of a patch to the current master. -- Andrei Zubkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Sat, Mar 11, 2023 at 02:49:50PM +0300, Andrei Zubkov wrote: > Hi, > > I've done a rebase of a patch to the current master. +/* First we have to remove them from the extension */ +ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements; +ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean); +ALTER EXTENSION pg_stat_statements DROP FUNCTION + pg_stat_statements_reset(Oid, Oid, bigint); The upgrade script of an extension is launched by the backend in the context of an extension, so these three queries should not be needed, as far as I recall. -SELECT pg_stat_statements_reset(); - pg_stat_statements_reset --------------------------- - +SELECT pg_stat_statements_reset() IS NOT NULL AS t; + t +--- + t (1 row) Wouldn't it be better to do this kind of refactoring in its own patch to make the follow-up changes more readable? This function is changed to return a timestamp rather than void, but IS NOT NULL applied on the existing queries would also return true. This would clean up quite a few diffs in the main patch.. -- Michael
Attachment
Hi Michael, Thank you for your attention. On Thu, 2023-03-16 at 16:13 +0900, Michael Paquier wrote: > +/* First we have to remove them from the extension */ > +ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements; > +ALTER EXTENSION pg_stat_statements DROP FUNCTION > pg_stat_statements(boolean); > +ALTER EXTENSION pg_stat_statements DROP FUNCTION > + pg_stat_statements_reset(Oid, Oid, bigint); > > The upgrade script of an extension is launched by the backend in the > context of an extension, so these three queries should not be needed, > as far as I recall. Agreed. I've done it as it was in previous versions. But I'm sure those are unnecessary. > Wouldn't it be better to do this kind of refactoring in its own patch > to make the follow-up changes more readable? This function is > changed > to return a timestamp rather than void, but IS NOT NULL applied on > the > existing queries would also return true. This would clean up quite a > few diffs in the main patch.. Splitting this commit seems reasonable to me. New version is attached.
Attachment
A little comment fix in update script of a patch -- Andrei Zubkov
Attachment
Hello! The documentation still describes the function pg_stat_statements_reset like this > By default, this function can only be executed by superusers. But unfortunately, this part was lost somewhere. -- Don't want this to be available to non-superusers. REVOKE ALL ON FUNCTION pg_stat_statements_reset(Oid, Oid, bigint, boolean) FROM PUBLIC; should be added to the upgrade script Also, shouldn't we first do: /* First we have to remove them from the extension */ ALTER EXTENSION pg_stat_statements DROP VIEW .. ALTER EXTENSION pg_stat_statements DROP FUNCTION .. like in previous upgrade scripts? > + Time at which min/max statistics gathering started for this > + statement I think it would be better to explicitly mention in the documentation all 4 fields for which minmax_stats_since displaysthe time. regards, Sergei
Hi Sergei! Thank you for your review. On Tue, 2023-03-21 at 23:18 +0300, Sergei Kornilov wrote: > -- Don't want this to be available to non-superusers. > REVOKE ALL ON FUNCTION pg_stat_statements_reset(Oid, Oid, bigint, > boolean) FROM PUBLIC; > > should be added to the upgrade script Indeed. > Also, shouldn't we first do: > > /* First we have to remove them from the extension */ > ALTER EXTENSION pg_stat_statements DROP VIEW .. > ALTER EXTENSION pg_stat_statements DROP FUNCTION .. > > like in previous upgrade scripts? It was discussed few messages earlier in this thread. We've decided that those are unnecessary in upgrade script. > > + Time at which min/max statistics gathering started for this > > + statement > > I think it would be better to explicitly mention in the documentation > all 4 fields for which minmax_stats_since displays the time. Agreed. New version is attached. regards, Andrei
Attachment
> On 22 Mar 2023, at 09:17, Andrei Zubkov <zubkov@moonset.ru> wrote: > New version is attached. This patch is marked RfC but didn't get reviewed/committed during this CF so I'm moving it to the next, the patch no longer applies though so please submit an updated version. -- Daniel Gustafsson
Hi hackers, New version 23 attached. It contains rebase to the current master. Noted that v1.11 adds new fields to the pg_stat_sstatements view, but leaves the PGSS_FILE_HEADER constant unchanged. It this correct? -- Andrei Zubkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Hi, During last moving to the current commitfest this patch have lost its reviewers list. With respect to reviewers contribution in this patch, I think reviewers list should be fixed. regards, Andrei Zubkov Postgres Professional The Russian Postgres Company
On 24.10.23 09:58, Andrei Zubkov wrote: > During last moving to the current commitfest this patch have lost its > reviewers list. With respect to reviewers contribution in this patch, I > think reviewers list should be fixed. I don't think it's the purpose of the commitfest app to track how *has* reviewed a patch. The purpose is to plan and allocate *current* work. If we keep a bunch of reviewers listed on a patch who are not actually reviewing the patch, then that effectively blocks new reviewers from signing up and the patch will not make progress. Past reviewers should of course be listed in the commit message, the release notes, etc. as appropriate.
On Tue, Oct 24, 2023 at 6:57 PM Peter Eisentraut <peter@eisentraut.org> wrote: > > On 24.10.23 09:58, Andrei Zubkov wrote: > > During last moving to the current commitfest this patch have lost its > > reviewers list. With respect to reviewers contribution in this patch, I > > think reviewers list should be fixed. > > I don't think it's the purpose of the commitfest app to track how *has* > reviewed a patch. The purpose is to plan and allocate *current* work. > If we keep a bunch of reviewers listed on a patch who are not actually > reviewing the patch, then that effectively blocks new reviewers from > signing up and the patch will not make progress. > > Past reviewers should of course be listed in the commit message, the > release notes, etc. as appropriate. Really? Last time this topic showed up at least one committer said that they tend to believe the CF app more than digging the thread [1], and some other hackers mentioned other usage for being kept in the reviewer list. Since no progress has been made on the CF app since I'm not sure it's the best idea to drop reviewer names from patch entries generally. [1] https://www.postgresql.org/message-id/552155.1648737431@sss.pgh.pa.us
On 24.10.23 14:40, Julien Rouhaud wrote: > On Tue, Oct 24, 2023 at 6:57 PM Peter Eisentraut <peter@eisentraut.org> wrote: >> >> On 24.10.23 09:58, Andrei Zubkov wrote: >>> During last moving to the current commitfest this patch have lost its >>> reviewers list. With respect to reviewers contribution in this patch, I >>> think reviewers list should be fixed. >> >> I don't think it's the purpose of the commitfest app to track how *has* >> reviewed a patch. The purpose is to plan and allocate *current* work. >> If we keep a bunch of reviewers listed on a patch who are not actually >> reviewing the patch, then that effectively blocks new reviewers from >> signing up and the patch will not make progress. >> >> Past reviewers should of course be listed in the commit message, the >> release notes, etc. as appropriate. > > Really? Last time this topic showed up at least one committer said > that they tend to believe the CF app more than digging the thread [1], > and some other hackers mentioned other usage for being kept in the > reviewer list. Since no progress has been made on the CF app since > I'm not sure it's the best idea to drop reviewer names from patch > entries generally. There is a conflict between the two purposes. But it is clearly the case that reviewers will more likely pick up patches that have no reviewers assigned. So if you keep stale reviewer entries around, then a patch that stays around for a while will never get reviewed again. I think this is a significant problem at the moment, and I made it part of my mission during the last commitfest to clean it up. If people want to put the stale reviewer entries back in, that is possible, but I would caution against that, because that would just self-sabotage those patches.
On 19/10/2023 19:40, Andrei Zubkov wrote: > Hi hackers, > > New version 23 attached. It contains rebase to the current master. I discovered the patch and parameters you've proposed. In my opinion, the stats_since parameter adds valuable information and should definitely be included in the stats data because the statement can be noteless removed from the list and inserted again. But minmax_stats_since and changes in the UI of the reset routine look like syntactic sugar here. I can't convince myself that it is really needed. Do you have some set of cases that can enforce the changes proposed? Maybe we should intensively work on adding the 'stats_since' parameter and continue the discussion of the necessity of another one? -- regards, Andrei Lepikhov Postgres Professional
Hi Andrei, On Wed, 2023-10-25 at 13:59 +0700, Andrei Lepikhov wrote: > But minmax_stats_since and changes in the UI of the reset routine > look like syntactic sugar here. > I can't convince myself that it is really needed. Do you have some > set of cases that can enforce the changes proposed? Yes, it looks strange, but it is needed in some way. The main purpose of this patch is to provide means for sampling solutions for collecting statistics data in series of samples. The first goal here - is to be sure that the statement was not evicted and come back between samples (especially between rare samples). This is the target of the stats_since field. The second goal - is the ability of getting all statistic increments for the interval between samples. All statistics provided by pg_stat_statements with except of min/max values can be calculated for the interval since the last sample knowing the values in the last and current samples. We need a way to get min/max values too. This is achieved by min/max reset performed by the sampling solution. The minmax_stats_since field is here for the same purpose - the sampling solution is need to be sure that no one have done a min/max reset between samples. And if such reset was performed at least we know when it was performed. regards, Andrei Zubkov Postgres Professional
Hi! Thank you for your work on the subject.Hi hackers, New version 23 attached. It contains rebase to the current master. Noted that v1.11 adds new fields to the pg_stat_sstatements view, but leaves the PGSS_FILE_HEADER constant unchanged. It this correct?
1. I didn't understand why we first try to find pgssEntry with a false top_level value, and later find it with a true top_level value./*
* Remove the entry if it exists, starting with the non-top-level entry.
*/
key.toplevel = false;
entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_FIND, NULL);
SINGLE_ENTRY_RESET(entry);
/* Also reset the top-level entry if it exists. */
key.toplevel = true;
entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_FIND, NULL);
SINGLE_ENTRY_RESET(entry);
I looked through this topic and found some explanation in this email [0], but I didn't understand it. Can you explain it to me?
2. And honestly, I think you should change
"Remove the entry if it exists, starting with the non-top-level entry." on
"Remove the entry or reset min/max time statistic information and the timestamp if it exists, starting with the non-top-level entry."
And the same with the comment bellow:
"Also reset the top-level entry if it exists."
"Also remove the entry or reset min/max time statistic information and the timestamp if it exists." In my opinion, this is necessary because the minmax_only parameter is set by the user, so both ways are possible.
0 - https://www.postgresql.org/message-id/62d16845-e74e-a6f9-9661-022e44f48922%40inbox.ru
-- Regards, Alena Rybakina
Hi Alena, On Wed, 2023-10-25 at 16:25 +0300, Alena Rybakina wrote: > Hi! Thank you for your work on the subject. > 1. I didn't understand why we first try to find pgssEntry with a > false top_level value, and later find it with a true top_level value. In case of pg_stat_statements the top_level field is the bool field of the pgssHashKey struct used as the key for pgss_hash hashtable. When we are performing a reset only userid, dbid and queryid values are provided. We need to reset both top-level and non-top level entries. We have only one way to get them all from a hashtable - search for entries having top_level=true and with top_level=false in their keys. > 2. And honestly, I think you should change > "Remove the entry if it exists, starting with the non-top-level > entry." on > "Remove the entry or reset min/max time statistic information and > the timestamp if it exists, starting with the non-top-level entry." > And the same with the comment bellow: > "Also reset the top-level entry if it exists." > "Also remove the entry or reset min/max time statistic information > and the timestamp if it exists." There are four such places actually - every time when the SINGLE_ENTRY_RESET macro is used. The logic of reset performed is described a bit in this macro definition. It seems quite redundant to repeat this description four times. But I've noted that "remove" terms should be replaced by "reset". I've attached v24 with commits updated. regards, Andrei Zubkov Postgres Professional
Attachment
On 25/10/2023 20:00, Andrei Zubkov wrote: > Hi Andrei, > > On Wed, 2023-10-25 at 13:59 +0700, Andrei Lepikhov wrote: >> But minmax_stats_since and changes in the UI of the reset routine >> look like syntactic sugar here. >> I can't convince myself that it is really needed. Do you have some >> set of cases that can enforce the changes proposed? > > Yes, it looks strange, but it is needed in some way. > The main purpose of this patch is to provide means for sampling > solutions for collecting statistics data in series of samples. The > first goal here - is to be sure that the statement was not evicted and > come back between samples (especially between rare samples). This is > the target of the stats_since field. The second goal - is the ability > of getting all statistic increments for the interval between samples. > All statistics provided by pg_stat_statements with except of min/max > values can be calculated for the interval since the last sample knowing > the values in the last and current samples. We need a way to get > min/max values too. This is achieved by min/max reset performed by the > sampling solution. The minmax_stats_since field is here for the same > purpose - the sampling solution is need to be sure that no one have > done a min/max reset between samples. And if such reset was performed > at least we know when it was performed. It is the gist of my question. If needed, You can remove the record by (userid, dbOid, queryId). As I understand, this extension is usually used by an administrator. Who can reset these parameters except you and the DBMS? -- regards, Andrei Lepikhov Postgres Professional
On Thu, 2023-10-26 at 15:49 +0700, Andrei Lepikhov wrote: > It is the gist of my question. If needed, You can remove the record > by > (userid, dbOid, queryId). As I understand, this extension is usually > used by an administrator. Who can reset these parameters except you > and > the DBMS? This extension is used by administrator but indirectly through some kind of sampling solution providing information about statistics change over time. The only kind of statistics unavailable to sampling solutions without a periodic reset is a min/max statistics. This patch provides a way for resetting those statistics without entry eviction. Suppose the DBA will use several sampling solutions. Every such solution can perform its own resets of min/max statistics. Other sampling solutions need a way to detect such resets to avoid undetected interference. Timestamping of min/max reset can be used for that purpose. -- regards, Andrei Zubkov Postgres Professional
Thank you for explanation, I got it.Hi Alena, On Wed, 2023-10-25 at 16:25 +0300, Alena Rybakina wrote:Hi! Thank you for your work on the subject. 1. I didn't understand why we first try to find pgssEntry with a false top_level value, and later find it with a true top_level value.In case of pg_stat_statements the top_level field is the bool field of the pgssHashKey struct used as the key for pgss_hash hashtable. When we are performing a reset only userid, dbid and queryid values are provided. We need to reset both top-level and non-top level entries. We have only one way to get them all from a hashtable - search for entries having top_level=true and with top_level=false in their keys.
Yes, I agree with the changes.2. And honestly, I think you should change "Remove the entry if it exists, starting with the non-top-level entry." on "Remove the entry or reset min/max time statistic information and the timestamp if it exists, starting with the non-top-level entry." And the same with the comment bellow: "Also reset the top-level entry if it exists." "Also remove the entry or reset min/max time statistic information and the timestamp if it exists."There are four such places actually - every time when the SINGLE_ENTRY_RESET macro is used. The logic of reset performed is described a bit in this macro definition. It seems quite redundant to repeat this description four times. But I've noted that "remove" terms should be replaced by "reset". I've attached v24 with commits updated.
-- Regards, Alena Rybakina
Hi hackers, Patch rebased to the current master -- regards, Andrei Zubkov Postgres Professional
Attachment
A little fix in "level_tracking" tests after merge. -- regards, Andrei Zubkov Postgres Professional
Attachment
Hi! On Fri, Nov 17, 2023 at 10:40 AM Andrei Zubkov <zubkov@moonset.ru> wrote: > > A little fix in "level_tracking" tests after merge. I've reviewed this patch. I think this is the feature of high demand. New columns (stats_since and minmax_stats_since) to the pg_stat_statements view, enhancing the granularity and precision of performance monitoring. This addition allows database administrators to have a clearer understanding of the time intervals for statistics collection on each statement. Such detailed tracking is invaluable for performance tuning and identifying bottlenecks in database operations. I think the design was well-discussed in this thread. Implementation also looks good to me. I've just slightly revised the commit messages. I'd going to push this patchset if no objections. ------ Regards, Alexander Korotkov
Attachment
Hi, On Sat, Nov 25, 2023 at 02:45:07AM +0200, Alexander Korotkov wrote: > > I've reviewed this patch. I think this is the feature of high demand. > New columns (stats_since and minmax_stats_since) to the > pg_stat_statements view, enhancing the granularity and precision of > performance monitoring. This addition allows database administrators > to have a clearer understanding of the time intervals for statistics > collection on each statement. Such detailed tracking is invaluable for > performance tuning and identifying bottlenecks in database operations. Yes, it will greatly improve performance analysis tools, and as the maintainer of one of them I've been waiting for this feature for a very long time! > > I think the design was well-discussed in this thread. Implementation > also looks good to me. I've just slightly revised the commit > messages. > > I'd going to push this patchset if no objections. Thanks! No objection from me, it all looks good.
Hi Alexander! On Sat, 2023-11-25 at 02:45 +0200, Alexander Korotkov wrote: > I've reviewed this patch. Thank you very much for your review. > I think the design was well-discussed in this thread. Implementation > also looks good to me. I've just slightly revised the commit > messages. I've noted a strange space in a commit message of 0001 patch: "I t is needed for upcoming patch..." It looks like a typo. -- regards, Andrei Zubkov Postgres Professional
On Sat, Nov 25, 2023 at 10:45 PM Andrei Zubkov <zubkov@moonset.ru> wrote: > On Sat, 2023-11-25 at 02:45 +0200, Alexander Korotkov wrote: > > > I've reviewed this patch. > > Thank you very much for your review. > > > I think the design was well-discussed in this thread. Implementation > > also looks good to me. I've just slightly revised the commit > > messages. > > I've noted a strange space in a commit message of 0001 patch: > "I t is needed for upcoming patch..." > It looks like a typo. Thank you for catching it. I'll fix this before commit. ------ Regards, Alexander Korotkov
Hello Alexander, 25.11.2023 23:46, Alexander Korotkov wrote: > On Sat, Nov 25, 2023 at 10:45 PM Andrei Zubkov<zubkov@moonset.ru> wrote: >> >> I've noted a strange space in a commit message of 0001 patch: >> "I t is needed for upcoming patch..." >> It looks like a typo. > Thank you for catching it. I'll fix this before commit. I've found one more typo in that commit: minimun. Best regards, Alexander