Hi Lukas,
Thanks for taking care of incorporating the latest patch feedback.
On 13.02.2026 05:11, Lukas Fittl wrote:
> On Thu, Feb 12, 2026 at 4:41 PM Andres Freund <andres@anarazel.de> wrote:
>> On 2026-02-12 08:05:27 -0800, Lukas Fittl wrote:
> (1) changing the pg_ticks_to_ns logic to have an explicit
> "ticks_per_ns_scaled == 0" early check and return at the start, and
> setting ticks_per_ns_scaled to 0 when clock_gettime() gets used. This
> is similar to what David already suggested in an earlier email.
> (2) using uint64 for the ticks_per_ns_scaled/max_ticks_no_overflow
> variables - this appears to help GCC generate a bit shift reliably,
> instead of an idiv instruction.
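A minimal sketch of what (1) and (2) describe, not the patch's actual code: the variable names ticks_per_ns_scaled and max_ticks_no_overflow come from the text above, while TICKS_TO_NS_SHIFT, set_tick_frequency_khz() and ticks_to_ns() are hypothetical names chosen for illustration. The sketch assumes the scale factor is a power of two, so that the division can be expressed as a shift on unsigned 64-bit values.

```c
#include <stdint.h>

/* Hypothetical fixed-point precision: scale factors are multiplied by 2^14. */
#define TICKS_TO_NS_SHIFT 14

/* 0 means "ticks are already nanoseconds" (e.g. the clock_gettime() path). */
static uint64_t ticks_per_ns_scaled = 0;
static uint64_t max_ticks_no_overflow = 0;

/* Hypothetical setup helper; freq_khz == 0 selects the passthrough mode. */
static void
set_tick_frequency_khz(uint64_t freq_khz)
{
    if (freq_khz == 0)
    {
        ticks_per_ns_scaled = 0;
        max_ticks_no_overflow = 0;
        return;
    }
    /* Nanoseconds per tick, scaled by 2^TICKS_TO_NS_SHIFT.  Assumes a */
    /* realistic frequency, so that the result is never 0.             */
    ticks_per_ns_scaled = (UINT64_C(1000000) << TICKS_TO_NS_SHIFT) / freq_khz;
    /* Largest tick count whose product with the scale fits in 64 bits. */
    max_ticks_no_overflow = UINT64_MAX / ticks_per_ns_scaled;
}

static uint64_t
ticks_to_ns(uint64_t ticks)
{
    uint64_t hi;
    uint64_t lo;

    /* Early return: no conversion needed when the source reports ns. */
    if (ticks_per_ns_scaled == 0)
        return ticks;

    /* Fast path: one multiply and one shift, all in unsigned 64-bit. */
    if (ticks <= max_ticks_no_overflow)
        return (ticks * ticks_per_ns_scaled) >> TICKS_TO_NS_SHIFT;

    /* Slow path: split the tick count so the partial products stay small. */
    hi = ticks >> TICKS_TO_NS_SHIFT;
    lo = ticks & ((UINT64_C(1) << TICKS_TO_NS_SHIFT) - 1);
    return hi * ticks_per_ns_scaled +
        ((lo * ticks_per_ns_scaled) >> TICKS_TO_NS_SHIFT);
}
```

With unsigned operands and a power-of-two scale, a compiler can lower the division to a plain shift; with signed operands it would need extra instructions to round negative values toward zero.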
>
> That appears to eliminate the regression in my testing. Attached an
> updated v7, which also has some slightly improved commit messages.
>
> Additional comparisons with the test case you had back at the start of
> this thread, with system clock source on my test VM:
>
> master:
>
> EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
> Time: 1888.891 ms (best of 3)
> pg_test_timing / Average loop time including overhead: 23.53 ns
>
> v6 (0002 + pg_test_timing prev/cur change):
>
> EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
> Time: 1897.095 ms (best of 3)
> pg_test_timing / Average loop time including overhead: 25.52 ns
>
> v7 (0002):
>
> EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
> Time: 1897.148 ms (best of 3)
> Average loop time including overhead: 23.14 ns
Shouldn't that result be better than master because you optimized the
loop overhead in v7-0002? That's at least what I've measured, see test
results below.
> And when looking at the TSC time source with the full patch set on the same VM:
>
> v6:
>
> EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
> Time: 1477.672 ms (best of 3)
> pg_test_timing / Average loop time including overhead: 11.79 ns
>
> v7:
>
> EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
> Time: 1476.326 ms (best of 3)
> pg_test_timing / Average loop time including overhead: 11.78 ns
>
> Thanks,
> Lukas
>
> [0]: https://godbolt.org/z/EvK1M66n5
>
> --
> Lukas Fittl
The code didn't compile on Windows because Visual C++ doesn't define
__x86_64__; it defines _M_X64 instead. I've changed the check to

    #if defined(__x86_64__) || defined(_M_X64)

I've also changed #include <x86intrin.h> to #include <immintrin.h>.
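As an illustration of the portable guard, here is a minimal sketch, not the code from the patch. SKETCH_HAVE_TSC, sketch_have_tsc() and sketch_read_tsc() are hypothetical names. To keep the sketch independent of which header a given compiler version uses to expose __rdtsc(), it includes <intrin.h> on Visual C++ and uses the GCC/Clang builtin elsewhere; that choice is an assumption made for this example and differs from the header used in the patch.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(_M_X64)
/* x86-64 target: GCC and Clang define __x86_64__, Visual C++ defines _M_X64. */
#define SKETCH_HAVE_TSC 1
#if defined(_MSC_VER)
#include <intrin.h>             /* declares __rdtsc() for Visual C++ */
#endif
#else
#define SKETCH_HAVE_TSC 0
#endif

/* Returns 1 if this build can read the TSC directly, 0 otherwise. */
static int
sketch_have_tsc(void)
{
    return SKETCH_HAVE_TSC;
}

/* Reads the time stamp counter; returns 0 on targets without TSC support. */
static uint64_t
sketch_read_tsc(void)
{
#if SKETCH_HAVE_TSC && defined(_MSC_VER)
    return (uint64_t) __rdtsc();
#elif SKETCH_HAVE_TSC
    return (uint64_t) __builtin_ia32_rdtsc();   /* GCC/Clang builtin */
#else
    return 0;
#endif
}
```

A caller would check sketch_have_tsc() once and fall back to the system clock source when it returns 0.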
I've tested v8 of the patch (= v7 plus aforementioned changes) on
Windows. I'm reporting the best of 3 runs.
lotsarows test with parallelism disabled:
master: 2781 ms
v7: 2776 ms (timing_clock_source = 'system')
v7: 2091 ms (timing_clock_source = 'tsc')
pg_test_timing:
master: 27.04 ns
v7: 16.59 ns (QueryPerformanceCounter)
v7: 13.69 ns (RDTSCP)
v7: 9.42 ns (RDTSC)
v8 of the patch is attached to this mail.
--
David Geier