Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc? - Mailing list pgsql-hackers

From Lukas Fittl
Subject Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Date
Msg-id CAP53PkyooCeR8YV0BUD_xC7oTZESHz8OdA=tP7pBRHFVQ9xtKg@mail.gmail.com
In response to Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?  (David Geier <geidav.pg@gmail.com>)
Responses Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
List pgsql-hackers
On Sun, Jan 11, 2026 at 11:26 AM David Geier <geidav.pg@gmail.com> wrote:
>
> > Based on Robert's suggestion I wanted to add a "fast_clock_source" enum
> > GUC which can have the following values "auto", "rdtsc", "try_rdtsc" and
> > "off". With that, at least no additional checks are needed and
> > performance will remain as previously benchmarked in this thread.
>
> The attached patch set is rebased on latest master and contains a commit
> which adds a "fast_clock_source" GUC that can be "try", "off" and
> "rdtsc" on Linux.
>
> Alternatively, we could call the GUC "clock_source" with "auto",
> "clock_gettime" and "rdtsc". Opinions?

No strong opinion on the GUC name ("fast_clock_source" seems fine?),
but I think "try" is a bit confusing if our logic is more than just
checking whether the RDTSC(P) instruction is available, so I'd be in
favor of "auto" as the default value.

> I moved the call to INSTR_TIME_INITIALIZE() from InitPostgres() to
> PostmasterMain(). In InitPostgres() it kept the database in a recovery
> cycle.

I think we can actually avoid having anything in PostmasterMain (or
InitPostgres), and instead rely on the GUC assign mechanism.

I've reworked the patch a bit more (see attached v4), with a couple
of notable changes:

In regards to the GUC:
- Use the GUC check mechanism to complain if the RDTSC clock source
is requested but it's not available
- Use the GUC assign mechanism to set whether we're actually using
the RDTSC clock source (rough sketch of both hooks below, after this
list)
- "auto" now means that we use RDTSC clock source by default if we're
on Linux x86, and the system clocksource is "tsc"
- "rdtsc" now allows using RDTSC on any x86-based Unix-like systems (I
see no reason to restrict the BSDs from using RDTSC when setting it
explicitly)
- Allow changing the clock source GUC at any time, without requiring a
restart (it makes testing much easier, and I don't see a good reason
to require a restart, or even restrict this to superuser?)
- Have pg_test_timing emit whether a fast clock source will be used by
default (or whether one needs to change the GUC)
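
To make the check/assign split concrete, the logic is roughly along
these lines (simplified sketch -- pg_use_rdtsc, pg_rdtsc_available()
and system_clocksource_is_tsc() are illustrative names here, not
necessarily what the attached patch calls them):

#include "postgres.h"
#include "utils/guc.h"

typedef enum
{
    FAST_CLOCK_SOURCE_OFF,
    FAST_CLOCK_SOURCE_AUTO,
    FAST_CLOCK_SOURCE_RDTSC
} FastClockSourceType;

static const struct config_enum_entry fast_clock_source_options[] = {
    {"off", FAST_CLOCK_SOURCE_OFF, false},
    {"auto", FAST_CLOCK_SOURCE_AUTO, false},
    {"rdtsc", FAST_CLOCK_SOURCE_RDTSC, false},
    {NULL, 0, false}
};

static bool pg_use_rdtsc = false;

/* CPUID-based capability check */
extern bool pg_rdtsc_available(void);
/* on Linux, reads /sys/devices/system/clocksource/clocksource0/current_clocksource */
extern bool system_clocksource_is_tsc(void);

bool
check_fast_clock_source(int *newval, void **extra, GucSource source)
{
    /* Only complain when rdtsc is requested explicitly but unusable. */
    if (*newval == FAST_CLOCK_SOURCE_RDTSC && !pg_rdtsc_available())
    {
        GUC_check_errdetail("The RDTSC clock source is not available on this platform.");
        return false;
    }
    return true;
}

void
assign_fast_clock_source(int newval, void *extra)
{
    /* Decide once, at assign time, whether timing actually uses RDTSC. */
    pg_use_rdtsc =
        newval == FAST_CLOCK_SOURCE_RDTSC ||
        (newval == FAST_CLOCK_SOURCE_AUTO &&
         pg_rdtsc_available() &&
         system_clocksource_is_tsc());
}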

Additionally:
- If a client program wants to use the fast clock source (like
pg_test_timing does), it first needs to call
pg_initialize_fast_clock_source() -- this replaces the
INSTR_TIME_INITIALIZE calls (usage sketch after this list).
- I've re-introduced a patch (0001) to set HAVE__CPUIDEX on modern
GCC/clang. That's necessary to make this work on VM hypervisors (per
the patch's commit message)
- I've merged the GUC patch together with the patch that adds the
RDTSC implementation (0002), since I don't think those make sense to
review or commit separately.
- I've unified the RDTSC and RDTSCP handling, so we require both in
order to use TSC as a time source. Because we have the shared
pg_ticks_to_ns() function that gets used on an instr_time regardless
of fast vs "slow" timing, and the variables used in that function are
affected by the RDTSC availability, we must use TSC consistently - I
don't think we can mix RDTSC for fast and pg_clock_gettime() for slow,
as this patch series has done so far.
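
For a frontend program, usage then looks roughly like this
(simplified, not the exact pg_test_timing code):

#include "postgres_fe.h"
#include "portability/instr_time.h"

static void
time_something(void)
{
    instr_time  start;
    instr_time  duration;

    /* opt in to the fast clock source before taking any timings */
    pg_initialize_fast_clock_source();

    INSTR_TIME_SET_CURRENT(start);
    /* ... timed work ... */
    INSTR_TIME_SET_CURRENT(duration);
    INSTR_TIME_SUBTRACT(duration, start);

    printf("elapsed: %0.4f ms\n", INSTR_TIME_GET_MILLISEC(duration));
}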

Open questions for me:
- I'm seeing a CI test failure for "Linux - Debian Trixie - Meson"
(times out), but it's not clear if this is a fluke - I'll check
whether it recurs on the commitfest patch
- We're doing a lot of work in pg_ticks_to_ns, even when we're not
using RDTSC - and I think that shows in a slightly slower
pg_test_timing measurement compared to master when fast clock source
is off. Can we somehow only do that when we use RDTSC?
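
To illustrate that last point, the conversion is conceptually
something like this (simplified, not the exact code from the patch):

#include "postgres.h"
#include "portability/instr_time.h"

static double ns_per_tick = 1.0;    /* calibrated from the TSC frequency
                                     * when RDTSC is in use; 1.0 when
                                     * clock_gettime() already yields ns */

static inline int64
pg_ticks_to_ns(instr_time t)
{
    /*
     * Even in this simplified form, the scaling runs for every timestamp,
     * whether the ticks came from RDTSC or from clock_gettime() -- and
     * since ns_per_tick is also what ties the two representations
     * together, mixing RDTSC for the fast path with clock_gettime() for
     * the slow path would produce ticks on different scales.
     */
    return (int64) (t.ticks * ns_per_tick);
}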

Here is a fresh test run with this patch on an AWS c6i.xlarge, i.e.
Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz / "Ice Lake":

CREATE TABLE test (id int);
INSERT INTO test SELECT * FROM generate_series(0, 1000000);

postgres=# SET fast_clock_source = off;
SET
Time: 0.107 ms
postgres=# EXPLAIN ANALYZE SELECT COUNT(*) FROM test;
                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=10633.55..10633.56 rows=1 width=8) (actual time=44.117..44.811 rows=1.00 loops=1)
   Buffers: shared hit=846 read=3579
   ->  Gather  (cost=10633.34..10633.55 rows=2 width=8) (actual time=44.060..44.804 rows=3.00 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=846 read=3579
         ->  Partial Aggregate  (cost=9633.34..9633.35 rows=1 width=8) (actual time=42.129..42.130 rows=1.00 loops=3)
               Buffers: shared hit=846 read=3579
               ->  Parallel Seq Scan on test  (cost=0.00..8591.67 rows=416667 width=0) (actual time=0.086..21.595 rows=333333.67 loops=3)
                     Buffers: shared hit=846 read=3579
 Planning Time: 0.043 ms
 Execution Time: 44.836 ms
(12 rows)

Time: 45.076 ms

postgres=# SET fast_clock_source = rdtsc;
SET
Time: 0.123 ms
postgres=# EXPLAIN ANALYZE SELECT COUNT(*) FROM test;
                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=10633.55..10633.56 rows=1 width=8) (actual time=32.943..33.912 rows=1.00 loops=1)
   Buffers: shared hit=1128 read=3297
   ->  Gather  (cost=10633.34..10633.55 rows=2 width=8) (actual time=32.868..33.906 rows=3.00 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=1128 read=3297
         ->  Partial Aggregate  (cost=9633.34..9633.35 rows=1 width=8) (actual time=30.705..30.706 rows=1.00 loops=3)
               Buffers: shared hit=1128 read=3297
               ->  Parallel Seq Scan on test  (cost=0.00..8591.67 rows=416667 width=0) (actual time=0.080..15.223 rows=333333.67 loops=3)
                     Buffers: shared hit=1128 read=3297
 Planning Time: 0.042 ms
 Execution Time: 33.935 ms
(12 rows)

Time: 34.180 ms

postgres=# EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM test;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=10633.55..10633.56 rows=1 width=8) (actual rows=1.00 loops=1)
   Buffers: shared hit=1410 read=3015
   ->  Gather  (cost=10633.34..10633.55 rows=2 width=8) (actual rows=3.00 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=1410 read=3015
         ->  Partial Aggregate  (cost=9633.34..9633.35 rows=1 width=8) (actual rows=1.00 loops=3)
               Buffers: shared hit=1410 read=3015
               ->  Parallel Seq Scan on test  (cost=0.00..8591.67 rows=416667 width=0) (actual rows=333333.67 loops=3)
                     Buffers: shared hit=1410 read=3015
 Planning Time: 0.042 ms
 Execution Time: 27.876 ms
(12 rows)

Time: 28.135 ms

Thanks,
Lukas

--
Lukas Fittl
