Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc? - Mailing list pgsql-hackers

From David Geier
Subject Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Date
Msg-id d7f94549-9179-b63c-878a-002a7cb586a6@gmail.com
Whole thread Raw
In response to Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
List pgsql-hackers
I think it would be great to get this patch committed. Beyond the 
reasons already mentioned, the significant overhead also tends to skew 
the reported runtimes in ways that makes it difficult to compare them. 
For example, if two nodes are executed equally often but one needs twice 
the time to process the rows: in such a case EXPLAIN ANALYZE should 
report timings that are 2x apart. However, currently, the high overhead 
of clock_gettime() tends to skew the relative runtimes.

On 10/12/22 10:33, Michael Paquier wrote:
> No rebased version has been sent since this update, so this patch has
> been marked as RwF.

I've rebased the patch set on latest master and fixed a few compiler 
warnings. Beyond that some findings and thoughts:

You're only using RDTSC if the clock source is 'tsc'. Great idea to not 
bother caring about a lot of hairy TSC details. Looking at the kernel 
code this seems to imply that the TSC is frequency invariant. I don't 
think though that this implies that Linux is not running under a 
hypervisor; which is good because I assume PostgreSQL is used a lot in 
VMs. However, when running under a hypervisor (at least with VMWare) 
CPUID leaf 0x16 is not available. In my tests __get_cpuid() indicated 
success but the returned values were garbage. Instead of using leaf 
0x16, we should then use the hypervisor interface to obtain the TSC 
frequency. Checking if a hypervisor is active can be done via:

bool IsHypervisorActive()
{
     uint32 cpuinfo[4] = {0};
     int res = __get_cpuid(0x1, &cpuinfo[0], &cpuinfo[1], &cpuinfo[2], 
&cpuinfo[3]);
     return res > 0 && (cpuinfo[2] & (1 << 30));
}

Obtaining the TSC frequency via the hypervisor interface can be done 
with the following code. See https://lwn.net/Articles/301888/ for more 
details.

// Under hypervisors (tested with VMWare) leaf 0x16 is not available, 
even though __get_cpuid() succeeds.
// Hence, if running under a hypervisor, use the hypervisor interface to 
obtain TSC frequency.
uint32 cpuinfo[4] = {0};
if (IsHypervisorActive() && __get_cpuid(0x40000001, &cpuinfo[0], 
&cpuinfo[1], &cpuinfo[2], &cpuinfo[3]) > 0)
     cycles_to_sec = 1.0 / ((double)cpuinfo[0] * 1000 * 1000);

Given that we anyways switch between RDTSC and clock_gettime() with a 
global variable, what about exposing the clock source as GUC? That way 
the user can switch back to a working clock source in case we miss a 
detail around activating or reading the TSC.

I'm happy to update the patches accordingly.

--
David Geier
(ServiceNow)




pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: How to *really* quit psql?
Next
From: David Geier
Date:
Subject: Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?