Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc? - Mailing list pgsql-hackers

From Jakub Wartak
Subject Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Date
Msg-id CAKZiRmzF50+drGgm6F-K1dQnuT=Khob0Q_dfZdv0-1iq4TVa4Q@mail.gmail.com
In response to Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?  (Lukas Fittl <lukas@fittl.com>)
Responses Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
List pgsql-hackers
Hi Lukas,

On Sun, Feb 1, 2026 at 4:16 AM Lukas Fittl <lukas@fittl.com> wrote:
>
> On Sat, Jan 31, 2026 at 12:11 PM Lukas Fittl <lukas@fittl.com> wrote:
> > I've reworked the patch a bit more, see attached v4
>
> And of course, I took the wrong branch when running "git format-patch"
> - apologies.
>
> See attached v5.

> +#define CPUID_HYPERVISOR_VMWARE(words) (words[1] == 0x61774d56 && words[2] == 0x4d566572 && words[3] == 0x65726177)
/*VMwareVMware */ 
> +#define CPUID_HYPERVISOR_KVM(words) (words[1] == 0x4b4d564b && words[2] == 0x564b4d56 && words[3] == 0x0000004d)
/*KVMKVMKVM */ 
> +
> +static bool
> +get_tsc_frequency_khz()
[..]
> +    /*
> +     * Check if we have a KVM or VMware Hypervisor passing down TSC frequency
> +     * to us in a guest VM
> +     *
> +     * Note that accessing the 0x40000000 leaf for Hypervisor info requires
> +     * use of __cpuidex to set ECX to 0. The similar __get_cpuid_count
> +     * function does not work as expected since it contains a check for
> +     * __get_cpuid_max, which has been observed to be lower than the special
> +     * Hypervisor leaf.
> +     */
> +#if defined(HAVE__CPUIDEX)
> +    __cpuidex((int32 *) r, 0x40000000, 0);
> +    if (r[0] >= 0x40000010 && (CPUID_HYPERVISOR_VMWARE(r) || CPUID_HYPERVISOR_KVM(r)))
> +    {
> +        __cpuidex((int32 *) r, 0x40000010, 0);
> +        if (r[0] > 0)
> +        {
> +            tsc_freq = r[0];
> +            return true;
> +        }
> +    }
> +#endif
> +
> +    return false;
> +}

When trying to understand this code, I was wondering how it could be
made smaller or less dependent on such low-level intrinsics. The only
thing that came to mind was launching systemd-detect-virt(1) via
fork+execve; after all, we already have USE_SYSTEMD (for sd_notify(3),
used in backend/postmaster/postmaster.c).

Sadly, this path for checking VM types looks like opening a can of
worms: they have accumulated a lot of code to cover various other
products (see e.g. detect_vm()), and that function is not exported.

Another way would probably be to query their D-Bus API; something like
the command below seems to work:
   busctl get-property org.freedesktop.systemd1 \
       /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager \
       Virtualization

(that seems to correspond to sd_bus_get_property_string(3)).

It's not that I'm recommending either of those (is that even linked in
most of the time?), nor am I a fan of D-Bus. I just thought it might
take less code to autodetect the VM type that way, but apparently not
(?). Still, their detect_vm_cpuid(), with its vm_table[] and memcmp(),
seems to be a more elegant way of writing this.
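
To illustrate what I mean by the table + memcmp() style, here is a
minimal sketch. It is not taken from the patch or from systemd: the
HypervisorKind enum, hv_table[] and classify_hypervisor() are names
made up for this example. It assumes the same layout as in the patch
(words[1..3] holding EBX, ECX, EDX of leaf 0x40000000) and a
little-endian host, which holds on x86.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical enum for this sketch; not part of the patch or systemd. */
typedef enum
{
    HV_UNKNOWN = 0,
    HV_KVM,
    HV_VMWARE
} HypervisorKind;

/*
 * Vendor signatures as 12-byte strings, i.e. the bytes of EBX, ECX and
 * EDX from CPUID leaf 0x40000000 laid out in memory order.
 */
static const struct
{
    const char      sig[13];    /* 12 signature bytes + terminating NUL */
    HypervisorKind  kind;
} hv_table[] = {
    {"KVMKVMKVM\0\0\0", HV_KVM},
    {"VMwareVMware", HV_VMWARE},
};

/*
 * Classify a hypervisor from the four CPUID result words (EAX, EBX,
 * ECX, EDX). Only words[1..3] are inspected. Assumes little-endian
 * byte order, so the register words map directly onto the ASCII text.
 */
static HypervisorKind
classify_hypervisor(const uint32_t words[4])
{
    char    sig[12];
    size_t  i;

    memcpy(sig, &words[1], sizeof(sig));

    for (i = 0; i < sizeof(hv_table) / sizeof(hv_table[0]); i++)
    {
        if (memcmp(sig, hv_table[i].sig, sizeof(sig)) == 0)
            return hv_table[i].kind;
    }

    return HV_UNKNOWN;
}
```

With this shape, supporting another hypervisor means adding one table
row with its readable signature rather than another macro with three
hex constants.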

BTW, -1 to fast_clock_source, +1 to clock_source or maybe
explain_clock_source(?)

Also, it would be cool if the patch provided some way of reporting
which clock source was actually used in the case of
FAST_CLOCK_SOURCE_AUTO, something like huge_pages_status or an
elog(DEBUG).
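
A rough sketch of the reporting idea, purely as an illustration: every
name below (ClockSourceSetting, resolve_clock_source(),
clock_source_status()) is hypothetical and not from the patch. The
assumption is that "auto" gets resolved once to a concrete source, and
that the resolved value is what a read-only status setting or a DEBUG
message would show.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical setting values for this sketch; not from the patch. */
typedef enum
{
    CLOCK_SOURCE_AUTO,
    CLOCK_SOURCE_SYSTEM,
    CLOCK_SOURCE_TSC
} ClockSourceSetting;

/*
 * Turn the requested setting into the source actually used. Only
 * "auto" is resolved here; explicit choices are returned unchanged.
 */
static ClockSourceSetting
resolve_clock_source(ClockSourceSetting requested, bool tsc_usable)
{
    if (requested == CLOCK_SOURCE_AUTO)
        return tsc_usable ? CLOCK_SOURCE_TSC : CLOCK_SOURCE_SYSTEM;

    return requested;
}

/*
 * Human-readable name of the effective source, suitable for a status
 * setting or a log line. An unresolved "auto" reports "unknown".
 */
static const char *
clock_source_status(ClockSourceSetting effective)
{
    switch (effective)
    {
        case CLOCK_SOURCE_SYSTEM:
            return "system";
        case CLOCK_SOURCE_TSC:
            return "tsc";
        case CLOCK_SOURCE_AUTO:
            break;
    }

    return "unknown";
}
```

The point is only that the user-visible status never says "auto": it
always names the source that was picked.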

-J.

[1] - https://github.com/systemd/systemd/blob/e831a44b07ebf48992967e366cfc1bcee2683f3d/src/detect-virt/detect-virt.c#L186
[2] - https://github.com/systemd/systemd/blob/e831a44b07ebf48992967e366cfc1bcee2683f3d/src/basic/virt.c#L450


