Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc? - Mailing list pgsql-hackers

From David Geier
Subject Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Date
Msg-id 41528b05-62be-4a5a-abd8-2af2dd84a1be@gmail.com
In response to Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?  (Lukas Fittl <lukas@fittl.com>)
Responses Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
List pgsql-hackers
Hi Lukas,

Thanks for taking care of incorporating the latest patch feedback.

On 13.02.2026 05:11, Lukas Fittl wrote:
> On Thu, Feb 12, 2026 at 4:41 PM Andres Freund <andres@anarazel.de> wrote:
>> On 2026-02-12 08:05:27 -0800, Lukas Fittl wrote:
> (1) changing the pg_ticks_to_ns logic to have an explicit
> "ticks_per_ns_scaled == 0" early check and return at the start, and
> setting ticks_per_ns_scaled to 0 when clock_gettime() gets used. This
> is similar to what David already suggested in an earlier email.
> (2) using uint64 for the ticks_per_ns_scaled/max_ticks_no_overflow
> variables - this appears to help GCC generate a bit shift reliably,
> instead of an idiv instruction.
> 
> That appears to eliminate the regression in my testing. Attached an
> updated v7, which also has some slightly improved commit messages.
> 
> Additional comparisons with the test case you had back at the start of
> this thread, with system clock source on my test VM:
> 
> master:
> 
> EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
> Time: 1888.891 ms (best of 3)
> pg_test_timing / Average loop time including overhead: 23.53 ns
> 
> v6 (0002 + pg_test_timing prev/cur change):
> 
> EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
> Time: 1897.095 ms (best of 3)
> pg_test_timing / Average loop time including overhead: 25.52 ns
> 
> v7 (0002):
> 
> EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
> Time: 1897.148 ms (best of 3)
> pg_test_timing / Average loop time including overhead: 23.14 ns

Shouldn't that result be better than master, given that you optimized
the loop overhead in v7-0002? At least that's what I measured; see the
test results below.
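To make the two pg_ticks_to_ns changes quoted above concrete, here is a minimal sketch of a fixed-point tick-to-nanosecond conversion with the early return for the clock_gettime() case and unsigned 64-bit arithmetic. It is not the patch's code. The variable names mirror ticks_per_ns_scaled and max_ticks_no_overflow from the thread, but their exact meaning here is an assumption: in this sketch the multiplier holds nanoseconds per tick scaled by 2^20. SKETCH_SHIFT and the sketch_* functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-point shift: multiplier = (ns per tick) * 2^SKETCH_SHIFT. */
#define SKETCH_SHIFT 20

/*
 * Named after the thread's variables; the semantics are assumed.
 * ticks_per_ns_scaled == 0 means "ticks are already nanoseconds"
 * (the clock_gettime() source).
 */
static uint64_t ticks_per_ns_scaled = 0;
static uint64_t max_ticks_no_overflow = 0;

/* Hypothetical setup for the clock_gettime() source. */
static void
sketch_use_system_clock(void)
{
	ticks_per_ns_scaled = 0;
	max_ticks_no_overflow = 0;
}

/* Hypothetical setup for a TSC ticking at tsc_khz kHz. */
static void
sketch_use_tsc(uint64_t tsc_khz)
{
	/* ns per tick = 1e6 / kHz, kept in 2^-SKETCH_SHIFT fixed point */
	uint64_t scaled = tsc_khz ? ((uint64_t) 1000000 << SKETCH_SHIFT) / tsc_khz : 0;

	if (scaled == 0)
	{
		/* unknown or absurd frequency: fall back to the system clock */
		sketch_use_system_clock();
		return;
	}
	ticks_per_ns_scaled = scaled;
	/* largest tick count whose product with the multiplier fits in 64 bits */
	max_ticks_no_overflow = UINT64_MAX / scaled;
}

static inline uint64_t
sketch_ticks_to_ns(uint64_t ticks)
{
	/* Early return: for clock_gettime() no arithmetic is needed at all. */
	if (ticks_per_ns_scaled == 0)
		return ticks;

	/* Common case: multiply, then shift (a logical shift on unsigned values). */
	if (ticks <= max_ticks_no_overflow)
		return (ticks * ticks_per_ns_scaled) >> SKETCH_SHIFT;

	/*
	 * Rare case: split ticks = hi * 2^SKETCH_SHIFT + lo. lo * multiplier
	 * always fits in 64 bits, and hi * multiplier fits whenever the final
	 * nanosecond value does. The split gives exactly the same result.
	 */
	uint64_t hi = ticks >> SKETCH_SHIFT;
	uint64_t lo = ticks & (((uint64_t) 1 << SKETCH_SHIFT) - 1);

	return hi * ticks_per_ns_scaled +
		((lo * ticks_per_ns_scaled) >> SKETCH_SHIFT);
}
```

For example, after sketch_use_tsc(3000000) (a 3 GHz TSC), converting 3,000,000,000 ticks yields just under 1,000,000,000 ns, with the shortfall coming from the truncated fixed-point multiplier. After sketch_use_system_clock(), values pass through unchanged.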

> And when looking at the TSC time source with the full patch set on the same VM:
> 
> v6:
> 
> EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
> Time: 1477.672 ms (best of 3)
> pg_test_timing / Average loop time including overhead: 11.79 ns
> 
> v7:
> 
> EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
> Time: 1476.326 ms (best of 3)
> pg_test_timing / Average loop time including overhead: 11.78 ns
> 
> Thanks,
> Lukas
> 
> [0]: https://godbolt.org/z/EvK1M66n5
> 
> --
> Lukas Fittl
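The pg_test_timing figures quoted above are "average loop time including overhead". As a rough illustration of what such a number captures, here is a sketch of one plausible reading of the "prev/cur" idea: each clock reading doubles as the start of the next interval, so every iteration costs exactly one clock call. This is an assumption, not the actual pg_test_timing code (which also builds a histogram of per-reading differences). It uses POSIX clock_gettime(), so it won't build with Visual C++ as written; the sketch_* names are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static uint64_t
sketch_now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t) ts.tv_sec * 1000000000u + (uint64_t) ts.tv_nsec;
}

/*
 * Read the clock `loops` times, reusing each reading as the start of the
 * next interval (prev/cur). Returns the average nanoseconds per iteration,
 * which includes the loop's own overhead, and stores the largest single
 * interval in *max_diff_ns.
 */
static double
sketch_average_loop_ns(uint64_t loops, uint64_t *max_diff_ns)
{
	uint64_t start = sketch_now_ns();
	uint64_t prev = start;
	uint64_t max_diff = 0;

	for (uint64_t i = 0; i < loops; i++)
	{
		uint64_t cur = sketch_now_ns();
		uint64_t diff = cur - prev;

		if (diff > max_diff)
			max_diff = diff;
		prev = cur;				/* no second clock call per iteration */
	}

	*max_diff_ns = max_diff;
	return loops > 0 ? (double) (prev - start) / (double) loops : 0.0;
}
```

Because the intervals tile the whole run, the average is simply total elapsed time divided by the iteration count, so any extra work in the loop body shows up directly in the reported figure.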

The code didn't compile on Windows because Visual C++ doesn't define
__x86_64__ (it defines _M_X64 for x64 targets instead). I've changed the
guard to

  #if defined(__x86_64__) || defined(_M_X64)

I've also replaced #include <x86intrin.h>, which Visual C++ doesn't
provide, with #include <immintrin.h>.
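For illustration, the portable guard looks like this in isolation. It is a minimal sketch, not the patch: SKETCH_HAVE_X86_64 and sketch_arch_macro() are hypothetical names, and the sketch deliberately only includes <immintrin.h> (a header GCC, Clang and Visual C++ all ship) without calling any intrinsic from it.

```c
#include <assert.h>
#include <string.h>

/* GCC and Clang predefine __x86_64__; Visual C++ predefines _M_X64 instead. */
#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>
#define SKETCH_HAVE_X86_64 1
#else
#define SKETCH_HAVE_X86_64 0
#endif

/* Report which predefined macro enabled the x86-64 code path, if any. */
static const char *
sketch_arch_macro(void)
{
#if defined(__x86_64__)
	return "__x86_64__";
#elif defined(_M_X64)
	return "_M_X64";
#else
	return "none";
#endif
}
```

On a non-x86-64 target (e.g. ARM64) both macros are undefined, so in this sketch the x86-64 path is compiled out entirely and only the fallback remains.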


I've tested v8 of the patch (v7 plus the changes above) on Windows.
All numbers are the best of 3 runs.

lotsarows test with parallelism disabled:

master: 2781 ms
v7:     2776 ms (timing_clock_source = 'system')
v7:     2091 ms (timing_clock_source = 'tsc')

pg_test_timing (average loop time including overhead):

master: 27.04 ns
v7:     16.59 ns (QueryPerformanceCounter)
v7:     13.69 ns (RDTSCP)
v7:      9.42 ns (RDTSC)
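For reference, the relative changes implied by the Windows numbers above are simple ratios of the reported best-of-3 figures and nothing more:

```latex
\begin{aligned}
\text{lotsarows, system:}\quad & \frac{2781 - 2776}{2781} \approx 0.2\%\ \text{less time} \\
\text{lotsarows, tsc:}\quad & \frac{2781 - 2091}{2781} \approx 24.8\%\ \text{less time} \\
\text{pg\_test\_timing, QueryPerformanceCounter:}\quad & \frac{27.04}{16.59} \approx 1.63\times\ \text{faster} \\
\text{pg\_test\_timing, RDTSCP:}\quad & \frac{27.04}{13.69} \approx 1.98\times\ \text{faster} \\
\text{pg\_test\_timing, RDTSC:}\quad & \frac{27.04}{9.42} \approx 2.87\times\ \text{faster}
\end{aligned}
```

Because the patched QueryPerformanceCounter row already improves on master, comparing against that row isolates the clock source itself: about 16.59 / 13.69 ≈ 1.21x for RDTSCP and 16.59 / 9.42 ≈ 1.76x for RDTSC.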


v8 of the patch is attached to this mail.

--
David Geier
