Re: stats.sql might fail due to shared buffers also used by parallel tests - Mailing list pgsql-hackers
From | Alexander Lakhin |
---|---|
Subject | Re: stats.sql might fail due to shared buffers also used by parallel tests |
Date | |
Msg-id | e05868e2-19b2-4cf1-8299-6ac406035eee@gmail.com |
In response to | RE: stats.sql might fail due to shared buffers also used by parallel tests ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>) |
List | pgsql-hackers |
Hello Kuroda-san,
Thank you for your attention to this!
15.07.2025 10:33, Hayato Kuroda (Fujitsu) wrote:
> GetSystemTimePreciseAsFileTime() returns FILETIME structure, which represents the time UTC with 100-nanosecond intervals [1]. The stack overflow seemed to refer to it. However, the document for GetSystemTimePreciseAsFileTime() says that the resolution is < 1 us [2].
>
> Also, MS doc [3] does not say that GetSystemTimePreciseAsFileTime() returns value monotonically. Another API QueryPerformanceCounter() seems to have the monotony. A bit old document [4] also raised the possibility:
>
> ```
> Consecutive calls may return the same result. The call time is less than the smallest increment of the system time. The granularity is in the sub-microsecond regime. The function may be used for time measurements but some care has to be taken: Time differences may be ZERO.
> ```
>
> Also, what if the system clock is modified during the test via NTP?
Yeah, I made a simple test for GetSystemTimePreciseAsFileTime() and
confirmed that in my VM it provides sub-microsecond precision. Regarding
NTP, I think the second failure of this ilk [1] makes this cause close to
impossible. (Can't wait for the third one to gather more information.)
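
A check of this kind can be sketched as below. This is not the exact test that was run, only a minimal illustration: it reads the clock in a tight loop and counts how often two consecutive readings are equal or go backwards, and records the smallest positive step. The names `now_ns`, `ClockProbe`, `probe_clock` and `print_probe` are made up for the example, and the non-Windows branch (clock_gettime) is only a stand-in so that the file builds elsewhere.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>

/* Current system time in nanoseconds, derived from the FILETIME 100 ns ticks. */
uint64_t now_ns(void)
{
    FILETIME ft;
    ULARGE_INTEGER t;

    GetSystemTimePreciseAsFileTime(&ft);
    t.LowPart = ft.dwLowDateTime;
    t.HighPart = ft.dwHighDateTime;
    return (uint64_t) t.QuadPart * 100;
}
#else
#include <time.h>

/* Stand-in for non-Windows builds: wall clock via clock_gettime(). */
uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_REALTIME, &ts);
    return (uint64_t) ts.tv_sec * UINT64_C(1000000000) + (uint64_t) ts.tv_nsec;
}
#endif

/* Hypothetical result record for the probe. */
typedef struct ClockProbe
{
    long     samples;        /* consecutive pairs compared */
    long     zero_deltas;    /* pairs that returned the same value */
    long     backward_steps; /* pairs where the later reading was smaller */
    uint64_t min_step_ns;    /* smallest positive difference, 0 if none seen */
} ClockProbe;

/* Read the clock samples+1 times and classify each consecutive pair. */
ClockProbe probe_clock(long samples)
{
    ClockProbe r = {samples, 0, 0, 0};
    uint64_t   prev;

    assert(samples > 0);
    prev = now_ns();
    for (long i = 0; i < samples; i++)
    {
        uint64_t cur = now_ns();

        if (cur == prev)
            r.zero_deltas++;
        else if (cur < prev)
            r.backward_steps++;
        else if (r.min_step_ns == 0 || cur - prev < r.min_step_ns)
            r.min_step_ns = cur - prev;
        prev = cur;
    }
    return r;
}

/* Print a one-line summary of a probe result. */
void print_probe(ClockProbe r)
{
    printf("samples=%ld zero=%ld backward=%ld min_step=%llu ns\n",
           r.samples, r.zero_deltas, r.backward_steps,
           (unsigned long long) r.min_step_ns);
}
```

Calling `print_probe(probe_clock(1000000))` from `main()` gives the summary. A non-zero `zero` count would match the "Time differences may be ZERO" remark quoted above, and a non-zero `backward` count would point at clock adjustments.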
>>> 2) pg_stat_reset_subscription_stats(oid) function did not reset the stats. We have a shared hash 'pgStatLocal.shared_hash'. If the entry reference (for the subscription) is not found while executing 'pg_stat_reset_subscription_stats(oid)', it may not be able to reset the stats. Maybe somehow this shared hash is getting dropped.. Also, it could be failing due to the same reason as Alexander has
>>
>> I don't think 2) is relevant here, because shared buffers shouldn't affect subscription's statistics.
>
> To confirm; we do not consider the possibility that pgstat_get_entry_ref() returns NULL right?
I ran a simple experiment with a modification like this:
@@ -1078,6 +1078,7 @@ pgstat_reset_entry(PgStat_Kind kind, Oid dboid, uint64 objid, TimestampTz ts)
     Assert(!pgstat_get_kind_info(kind)->fixed_amount);
     entry_ref = pgstat_get_entry_ref(kind, dboid, objid, false, NULL);
+if (rand() % 3 == 0) entry_ref = NULL;
     if (!entry_ref || entry_ref->shared_entry->dropped)
and got several failures like:
--- .../postgresql/src/test/regress/expected/subscription.out 2025-04-25 10:27:32.851554400 -0700
+++ .../postgresql/build/testrun/regress/regress/results/subscription.out 2025-07-20 00:05:05.667903300 -0700
@@ -56,7 +56,7 @@
SELECT subname, stats_reset IS NULL stats_reset_is_null FROM pg_stat_subscription_stats WHERE subname = 'regress_testsub';
subname | stats_reset_is_null
-----------------+---------------------
- regress_testsub | f
+ regress_testsub | t
(1 row)
-- Reset the stats again and check if the new reset_stats is updated.
@@ -68,11 +68,9 @@
(1 row)
SELECT :'prev_stats_reset' < stats_reset FROM pg_stat_subscription_stats WHERE subname = 'regress_testsub';
- ?column?
-----------
- t
-(1 row)
-
+ERROR: syntax error at or near ":"
+LINE 1: SELECT :'prev_stats_reset' < stats_reset FROM pg_stat_subscr...
+
--- .../postgresql/src/test/regress/expected/stats.out 2025-04-25 10:27:36.930322500 -0700
+++ .../postgresql/build/testrun/regress/regress/results/stats.out 2025-07-20 00:05:19.579864900 -0700
@@ -1720,7 +1720,7 @@
SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
?column?
----------
- t
+ f
(1 row)
...
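
The shape of these failures can be reproduced with a toy model. This is not PostgreSQL code; `ToyStatsEntry`, `toy_get_entry` and `toy_reset_entry` are hypothetical names, and the reset function returns a bool only so that the outcome can be observed. When the lookup yields NULL, the reset returns silently and the reset timestamp is never set, which corresponds to stats_reset staying NULL in subscription.out. The later syntax error at ":" is consistent with psql's \gset leaving a variable unset when the column value is NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a per-subscription stats entry; not a PostgreSQL type. */
typedef struct ToyStatsEntry
{
    bool    dropped;
    bool    has_reset_ts;       /* false models "stats_reset IS NULL" */
    int64_t reset_ts;
    int64_t apply_error_count;
} ToyStatsEntry;

/* Hypothetical lookup: yields NULL when "found" is false, like a missed entry ref. */
ToyStatsEntry *toy_get_entry(ToyStatsEntry *entry, bool found)
{
    assert(entry != NULL);
    return found ? entry : NULL;
}

/*
 * Same early-return shape as the check in the modified function:
 * a NULL or dropped entry is skipped without any error being raised.
 */
bool toy_reset_entry(ToyStatsEntry *entry_ref, int64_t ts)
{
    if (entry_ref == NULL || entry_ref->dropped)
        return false;           /* nothing reset, nothing reported */

    entry_ref->apply_error_count = 0;
    entry_ref->has_reset_ts = true;
    entry_ref->reset_ts = ts;
    return true;
}
```

With `found` set to false, `toy_reset_entry(toy_get_entry(&e, false), ts)` leaves both the counter and the reset timestamp untouched, so a caller that only inspects the entry afterwards cannot tell a skipped reset from one that never happened.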
Thus, if there is some issue with pgstat_get_entry_ref(), then it should
be specific to subscriptions and come out in that place only (given the
information we have now).
So I still suspect some peculiarity of Windows or of this particular animal.
Nagata-san, could you please share the configuration of hamerkop? If it's
running inside a VM, what virtualization software is used?
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamerkop&dt=2025-07-09%2011%3A02%3A23
Best regards.
Alexander