RE: stats.sql might fail due to shared buffers also used by parallel tests - Mailing list pgsql-hackers
From | Hayato Kuroda (Fujitsu) |
---|---|
Subject | RE: stats.sql might fail due to shared buffers also used by parallel tests |
Date | |
Msg-id | OSCPR01MB14966BDD12141F158687AB1BCF557A@OSCPR01MB14966.jpnprd01.prod.outlook.com |
In response to | Re: stats.sql might fail due to shared buffers also used by parallel tests (Alexander Lakhin <exclusion@gmail.com>) |
List | pgsql-hackers |
Dear Alexander,

> > So according to me, I suspect the following causes
> > 1) The time difference between 'prev_stats_reset' and current
> > 'stats_reset' value is less than 1 microseconds.
> > 'stats_reset' is of type 'timestamp with time zone' and the content of
> > it is like: '2025-06-30 21:01:07.925253+05:30'. So if the time
> > difference between 'prev_stats_reset' and current 'stats_reset' is
> > less than 1 microseconds. The query 'SELECT :'prev_stats_reset' <
> > stats_reset FROM pg_stat_subscription_stats WHERE subname =
> > 'regress_testsub'' might return 'false' instead of 'true'.
> > But I was not able to reproduce such a scenario after multiple
> > testing. Even in high end machines, it takes at least a few
> > microseconds. Also there are multiple cases where we did similar
> > timestamp comparison in 'stats.sql' as well. And, I didn't find any
> > other failure related to such case. So, I feel this is not possible.
>
> Did you try that on Windows (hamerkop is a Windows animal)? IIUC,
> GetCurrentTimestamp() -> gettimeofday() implemented on Windows via
> GetSystemTimePreciseAsFileTime(), and it has 100ns resolution,

Hmm. I'm not familiar with the Windows environment, but I have some doubts
about that. GetSystemTimePreciseAsFileTime() returns a FILETIME structure,
which represents UTC time in 100-nanosecond intervals [1]. The Stack Overflow
post seems to refer to that. However, the documentation for
GetSystemTimePreciseAsFileTime() says that the resolution is < 1 us [2].

Also, the MS doc [3] does not say that GetSystemTimePreciseAsFileTime()
returns monotonically increasing values. Another API,
QueryPerformanceCounter(), seems to be monotonic. A somewhat old document [4]
also raises the possibility:

```
Consecutive calls may return the same result. The call time is less than the
smallest increment of the system time. The granularity is in the
sub-microsecond regime. The function may be used for time measurements but
some care has to be taken: Time differences may be ZERO.
```

Also, what if the system clock is modified during the test via NTP?

> > 2) pg_stat_reset_subscription_stats(oid) function did not reset the stats.
> > We have a shared hash 'pgStatLocal.shared_hash'. If the entry
> > reference (for the subscription) is not found while executing
> > 'pg_stat_reset_subscription_stats(oid)'. It may not be able to reset
> > the stats. Maybe somehow this shared hash is getting dropped..
> > Also, it could be failing due to the same reason as Alexander has
>
> I don't think 2) is relevant here, because shared buffers shouldn't affect
> subscription's statistics.

To confirm: we do not consider the possibility that pgstat_get_entry_ref()
returns NULL, right?

[1]: https://learn.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-filetime
[2]: https://learn.microsoft.com/en-us/windows/win32/api/sysinfoapi/nf-sysinfoapi-getsystemtimepreciseasfiletime
[3]: https://learn.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps
[4]: http://www.windowstimestamp.com/description#:~:text=2.1.4.2.%C2%A0%C2%A0Desktop%20Applications%3A%20GetSystemTimePreciseAsFileTime()

Best regards,
Hayato Kuroda
FUJITSU LIMITED
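
To illustrate hypothesis 1) discussed above, here is a minimal sketch of how two
distinct readings from a 100 ns clock could collapse to the same microsecond
value, so that a strict `<` comparison returns false. This is not PostgreSQL
code: the function names and tick values are hypothetical, and it assumes the
sub-microsecond part is truncated when the reading is converted to
microseconds.

```python
# Hypothetical illustration: FILETIME-style readings count 100 ns ticks,
# while a timestamptz value is stored in whole microseconds.
TICKS_PER_MICROSECOND = 10  # 10 ticks of 100 ns = 1 us


def ticks_to_microseconds(ticks: int) -> int:
    """Truncate a 100 ns tick count to whole microseconds (assumed behaviour)."""
    return ticks // TICKS_PER_MICROSECOND


def reset_looks_newer(prev_ticks: int, curr_ticks: int) -> bool:
    """Mimic the test's strict comparison: prev_stats_reset < stats_reset."""
    return ticks_to_microseconds(prev_ticks) < ticks_to_microseconds(curr_ticks)


# Two distinct readings 300 ns apart that fall into the same microsecond.
prev_ticks = 1_000_000_002
curr_ticks = 1_000_000_005
print(reset_looks_newer(prev_ticks, curr_ticks))       # False

# Readings a full microsecond apart compare as the test expects.
print(reset_looks_newer(prev_ticks, prev_ticks + 10))  # True
```

The same comparison would also return false if two consecutive calls returned
an identical reading, or if the clock stepped backwards between them.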