Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc? - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Date
Msg-id 6wzsdvie5fvg427qti6icezycculhdr3vaykznbg27vtzzxa3p@hfk5m7cs5m4k
Whole thread
In response to Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?  (Lukas Fittl <lukas@fittl.com>)
Responses Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
List pgsql-hackers
Hi,

On 2026-04-07 00:04:51 -0700, Lukas Fittl wrote:
> Attached v22 with documentation updates. I've also marked the ARM
> patch "nocfbot" for now, so its clear we're doing that one later.

- changed comment in InitProcessGlobals to be consistent with other cases
- added a comment to the fallthrough to the generic logic in x86_tsc_frequency_khz
- started to wonder if we ought to report
     * The similar __get_cpuid_count function does not work as expected since
     * it contains a check for __get_cpuid_max, which has been observed to be
  as a bug.  That's obviously independent of committing this.

- a few typos (s/possibly/possible/, s/its/it's/, s/\. are/. Are/,
  INSTR_TIME_ADD_NANOSEC comment variable names) found with modern spell
  checking tools (i.e. an ai review)

- fixed include order in guc_tables.c

- slight rephrasing of pg_initialize_timing() comment about
  pg_set_timing_clock_source()

- Moved the indexterm to the relevant paragraphs. I don't think it really
  matters, but it seemed a tad clearer

- shortened some commit message titles ;)

- Added a comment about why always_inline is needed for pg_get_ticks, and as
  part of that swapped _fast() and plain versions around.

- I'm somewhat tired, because I was wondering whether TSC_CALIBRATION_SKIPS
  should be a power of 2, to make the check faster. But uh, IT DOESN'T MATTER
  :)

I've pushed 0001, 0002, 0003.

There's one minor documentation issue in 0004 that I wanted to look at before
pushing (and I need to switch to something else for a bit).  The rephrasing
gets rid of

-   [...] , with the worst case somewhere between 32768 and
-   65535 nanoseconds.  In the second block, we can see that typical loop
-   time is 16 nanoseconds, and the readings appear to have full nanosecond
-   precision.

I don't mind loosing the first sentence, but the second one might be useful to
somebody?


> > Now I wonder if we should rename 'tsc' to 'cpu'...
> 
> Yeah, I had proposed making it more generic in an earlier email, but I
> don't think there are great names available (and nobody jumped at the
> suggestion). I think "cpu" is a bit too unspecific, vs "tsc" is more
> clearly referencing the kind of instruction being used directly, and
> is unique enough that it makes people read on vs jumping to a
> conclusion. I was previously thinking of something like "hwtimer" or
> "hwclock", but I don't think those are great.
> 
> I think the worst case here is that we need to add a "Note this is
> named after the TSC instructions on x86-64, but utilizes the similarly
> functioning cntvct_el0 instruction on ARM." or something to the
> documentation if/when we expand support to ARM.

Yea, I think it's ok.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: EXPLAIN: showing ReadStream / prefetch stats
Next
From: Sami Imseih
Date:
Subject: Re: test_autovacuum/001_parallel_autovacuum is broken