Thread: pgsql: pgstat: add/extend tests for resetting various kinds of stats.
pgstat: add/extend tests for resetting various kinds of stats. - subscriber stats reset path was untested - slot stat sreset path for all slots was untested - pg_stat_database.sessions etc was untested - pg_stat_reset_shared() was untested, for any kind of shared stats - pg_stat_reset() was untested Author: Melanie Plageman <melanieplageman@gmail.com> Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/5264add7847871d61d36a5770dac2139d6a7bc80 Modified Files -------------- contrib/test_decoding/expected/stats.out | 117 +++++++--- contrib/test_decoding/sql/stats.sql | 32 ++- src/test/recovery/t/006_logical_decoding.pl | 63 ++++++ src/test/regress/expected/stats.out | 164 ++++++++++++++ src/test/regress/parallel_schedule | 3 + src/test/regress/sql/stats.sql | 78 +++++++ src/test/subscription/t/026_stats.pl | 318 +++++++++++++++++++++------- 7 files changed, 664 insertions(+), 111 deletions(-)
Andres Freund <andres@anarazel.de> writes: > pgstat: add/extend tests for resetting various kinds of stats. crake just failed [1] with what appears to be a race condition in a test case added by this commit: diff -U3 /home/andrew/bf/root/HEAD/pgsql/src/test/regress/expected/stats.out /home/andrew/bf/root/HEAD/pgsql.build/src/test/recovery/tmp_check/results/stats.out --- /home/andrew/bf/root/HEAD/pgsql/src/test/regress/expected/stats.out 2022-07-18 13:32:25.155847949 -0400 +++ /home/andrew/bf/root/HEAD/pgsql.build/src/test/recovery/tmp_check/results/stats.out 2022-09-15 12:44:10.022975164-0400 @@ -563,7 +563,7 @@ SELECT sessions > :db_stat_sessions FROM pg_stat_database WHERE datname = (SELECT current_database()); ?column? ---------- - t + f (1 row) -- Test pg_stat_bgwriter checkpointer-related stats, together with pg_stat_wal It seems likely to me that the n_sessions increment hasn't made it to shared memory because some background process happened to have a lock on the stats entry when we tried. Don't we need a pg_stat_force_next_flush() call before trying to inspect the sessions count? BTW, the header comment for pgstat_report_stat is badly in need of copy-editing. regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2022-09-15%2016%3A18%3A30
Hi, On 2022-09-15 15:59:52 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > pgstat: add/extend tests for resetting various kinds of stats. > > crake just failed [1] with what appears to be a race condition in > a test case added by this commit: > > diff -U3 /home/andrew/bf/root/HEAD/pgsql/src/test/regress/expected/stats.out /home/andrew/bf/root/HEAD/pgsql.build/src/test/recovery/tmp_check/results/stats.out > --- /home/andrew/bf/root/HEAD/pgsql/src/test/regress/expected/stats.out 2022-07-18 13:32:25.155847949 -0400 > +++ /home/andrew/bf/root/HEAD/pgsql.build/src/test/recovery/tmp_check/results/stats.out 2022-09-15 12:44:10.022975164-0400 > @@ -563,7 +563,7 @@ > SELECT sessions > :db_stat_sessions FROM pg_stat_database WHERE datname = (SELECT current_database()); > ?column? > ---------- > - t > + f > (1 row) > > -- Test pg_stat_bgwriter checkpointer-related stats, together with pg_stat_wal > > It seems likely to me that the n_sessions increment hasn't made it to shared > memory because some background process happened to have a lock on the stats > entry when we tried. Don't we need a pg_stat_force_next_flush() call before > trying to inspect the sessions count? Indeed. Pushed a fix. To see if there are other omissions, I made pgstat_report_stat() never flush stats unless force is passed in. No other failures. > BTW, the header comment for pgstat_report_stat is badly in need of > copy-editing. Hm - you're talking about this sentence? ... Callers can use the returned time to set up * a timeout after which to call pgstat_report_stat(true), but are not * required to to do so. Reads awkward. I'm struggling with the english language trying to improve it :(. "Callers can, but are not required to, use the returned time to set up a timer, whose expiration should trigger a call to pgstat_report_stat(true)."? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2022-09-15 15:59:52 -0400, Tom Lane wrote: >> It seems likely to me that the n_sessions increment hasn't made it to shared >> memory because some background process happened to have a lock on the stats >> entry when we tried. Don't we need a pg_stat_force_next_flush() call before >> trying to inspect the sessions count? > Indeed. Pushed a fix. To see if there are other omissions, I made > pgstat_report_stat() never flush stats unless force is passed in. No other > failures. Cool, thanks for dealing with that. >> BTW, the header comment for pgstat_report_stat is badly in need of >> copy-editing. > Hm - you're talking about this sentence? /* * Must be called by processes that performs DML: tcop/postgres.c, logical ^^^^^^^^^^^^^^^^^^^^^^^ Plural agreement problem. "all processes that perform" or "each process that performs" would be OK. * receiver processes, SPI worker, etc. to flush pending statistics updates to * shared memory. * * Unless called with 'force', pending stats updates are flushed happen once ^^^^^^^^^^^^^^^^^^ One verb too many. * per PGSTAT_MIN_INTERVAL (1000ms). When not forced, stats flushes do not * block on lock acquisition, except if stats updates have been pending for * longer than PGSTAT_MAX_INTERVAL (60000ms). * * Whenever pending stats updates remain at the end of pgstat_report_stat() a * suggested idle timeout is returned. Currently this is always * PGSTAT_IDLE_INTERVAL (10000ms). Callers can use the returned time to set up * a timeout after which to call pgstat_report_stat(true), but are not * required to to do so. This isn't grammatically bad, but it fails to explain how callers are to know whether they need to do that. It looks like the convention is to return 0 if no updates remain, but you have to read the code and guess. * * Note that this is called only when not within a transaction, so it is fair * to use transaction stop time as an approximation of current time. */ This should be inside the function, it's not part of the API spec nor of any concern to callers. Or maybe the API spec should say "This must be called immediately after finishing a transaction", and then inside the function say "We assume GetCurrentTransactionStopTimestamp gives a close-enough approximation of current time." regards, tom lane
I wrote: >> * required to to do so. > This isn't grammatically bad, ... um, other than the two "to"s. But the semantic omission is a bigger deal. regards, tom lane