Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc? - Mailing list pgsql-hackers
| From | Andres Freund |
|---|---|
| Subject | Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc? |
| Date | |
| Msg-id | znzsoci7b3crunqxf66xdylnllijbqij4vafos7yskmrbtqxhs@6hegy3qeu7e7 Whole thread |
| In response to | Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc? (Lukas Fittl <lukas@fittl.com>) |
| Responses |
Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
|
| List | pgsql-hackers |
Hi,
On 2026-04-06 20:41:46 -0700, Lukas Fittl wrote:
> On Mon, Apr 6, 2026 at 5:40 PM Andres Freund <andres@anarazel.de> wrote:
> > I wonder if the cpuid tests should be a bit further abstracted into
> > pg_cpu_x86.c.
> >
> > E.g. instead of tsc_detect_frequency() checking for PG_RDTSCP,
> > PG_TSC_INVARIANT, PG_TSC_ADJUST we could have
> >
> > PG_TSC_AVAILABLE /* RDTSCP & INVARIANT */
> > PG_TSC_KNOWN_RELIABLE /* PG_TSC_AVAILABLE && PG_TSC_ADJUST */
> > PG_TSC_FREQUENCY_KNOWN /* x86_tsc_frequency_khz works */
> >
> > and always run all of that during set_x86_features().
>
> I think that could work, but I kept the flags in features closer to
> being direct mappings to CPUID bits since that seemed to be intent of
> how John designed the facility originally.
>
> John, do you have thoughts on this? (I've not changed it for now)
I'm ok either way.
> FWIW, I don't think having PG_TSC_KNOWN_RELIABLE makes sense in any
> case, because that would tie together x86_tsc_frequency_khz and
> set_x86_features, i.e. you'd either have the frequency return function
> modify X86Features later, or always run x86_tsc_frequency_khz when
> setting features (and that'd then require you to put the frequency
> value somewhere, etc.)
I was thinking the latter.
> I've gone ahead and rewritten that whole paragraph for clarity, and
> also split it into two. Feedback welcome:
>
> <para>
> If enabled, the TSC clock source will use specialized CPU instructions
> when measuring time intervals. This lowers timing overhead compared to
> reading the OS system clock, and reduces the measurement error on top
> of the actual runtime, for example with EXPLAIN ANALYZE.
> </para>
> <para>
> On x86-64 CPUs the TSC clock source utilizes the Time-Stamp Counter (TSC)
It's a bit weird that the third use of TSC in these paragraphs introduces
Time-Stamp Counter. I can see how you get there, but ...
Now I wonder if we should rename 'tsc' to 'cpu'...
> of the CPU. The RDTSC instruction is used to read the TSC for EXPLAIN ANALYZE.
> For timings that require higher precision the RDTSCP instruction is used,
> which avoids inaccuracies due to CPU instruction re-ordering. Use of
> RDTSC/RDTSCP is not supported on older x86-64 CPUs or hypervisors that don't
> pass the TSC frequency to guest VMs, and is not advised on systems that
s/guest VMs/virtual machines/?
> utilize an emulated TSC. The TSC clock source is currently not supported on
> other architectures.
The not support bit about hypervisors isn't quite right though? We do even use
it automatically if TSC_ADJUST is set (and the calibration loop succeeds).
> </para>
> <para>
> To help decide which clock source to use you can run the
> <application>pg_test_timing</application>
> utility to check TSC availability, and perform timing measurements.
> </para>
How about a link to to the pg_test_timing page? Hm, I guess that should also
be updated with new output.
I'd also sprinkle a few <acronym> and <command>s around.
Wonder if it's worth adding something like
<indexterm><primary><acronym>RDTSC</acronym></primary></indexterm>
<indexterm>
<primary>Time-Stamp Counter</primary>
<see><acronym>TSC</acronym></see>
</indexterm>
<indexterm><primary><acronym>TSC</acronym></primary></indexterm>
otherwise somebody seeing one of these in logs, pg_test_timing output or
whatever has even less of a chance to figure it out within our docs. They're
not hard to search for terms exactly, so ...
> I've also marked pg_get_ticks(_fast) as pg_attribute_always_inline,
> per an off-list comment from Andres that he observed GCC not fully
> inlining that function in pg_test_timing, presumably due to the
> likely(..) in it.
It's not the likely, I reproduced it even without that. I mouthed off about
compilers on mastodon and was kindly asked to just open a bug report :)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124795
I think discussing which indexterms should be added signals that this is
pretty close.
Greetings,
Andres Freund
pgsql-hackers by date: