Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc? - Mailing list pgsql-hackers

From David Geier
Subject Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Date
Msg-id 3eaeaa3a-b78e-ef0d-7319-5d713bbc09a0@gmail.com
Whole thread Raw
In response to Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?  (David Geier <geidav.pg@gmail.com>)
Responses Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?  (Lukas Fittl <lukas@fittl.com>)
Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hi,

I re-based again on master and applied the following changes:

I removed the fallback for obtaining the TSC frequency from /proc/cpu as 
suggested by Andres. Worst-case we fall back to clock_gettime().

I added code to obtain the TSC frequency via CPUID when under a 
hypervisor. I had to use __cpuid() directly instead of __get_cpuid(), 
because __get_cpuid() returns an error if the leaf is > 0x80000000 
(probably the implementation pre-dates the hypervisor timing leafs). 
Unfortunately, while testing my implementation under VMWare, I found 
that RDTSC runs awfully slow there (like 30x slower). [1] indicates that 
we cannot generally rely on RDTSC being actually fast on VMs. However, 
the same applies to clock_gettime(). It runs as slow as RDTSC on my 
VMWare setup. Hence, using RDTSC is not at disadvantage. I'm not 
entirely sure if there aren't cases where e.g. clock_gettime() is 
actually faster than RDTSC and it would be advantageous to use 
clock_gettime(). We could add a GUC so that the user can decide which 
clock source to use. Any thoughts?

I also somewhat improved the accuracy of the cycles to milli- and 
microseconds conversion functions by having two more multipliers with 
higher precision. For microseconds we could also keep the computation 
integer-only. I'm wondering what to best do for seconds and 
milliseconds. I'm currently leaning towards just keeping it as is, 
because the durations measured and converted are usually long enough 
that precision shouldn't be a problem.

In vacuum_lazy.c we do if ((INSTR_TIME_GET_MICROSEC(elapsed) / 1000). I 
changed that to use INSTR_TIME_GET_MILLISECS() instead. Additionally, I 
initialized a few variables of type instr_time which otherwise resulted 
in warnings due to use of potentially uninitialized variables.

I also couldn't reproduce the reported timing discrepancy. For me the 
runtime reported by \timing is just slightly higher than the time 
reported by EXPLAIN ANALYZE, which is expected.

Beyond that:

What about renaming INSTR_TIME_GET_DOUBLE() to INSTR_TIME_GET_SECS() so 
that it's consistent with the _MILLISEC() and _MICROSEC() variants?

The INSTR_TIME_GET_MICROSEC() returns a uint64 while the other variants 
return double. This seems error prone. What about renaming the function 
or also have the function return a double and cast where necessary at 
the call site?

If no one objects I would also re-register this patch in the commit fest.

[1] 
https://vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/Timekeeping-In-VirtualMachines.pdf 
(page 11 "Virtual TSC")

-- 
David Geier
(ServiceNow)

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Data loss on logical replication, 12.12 to 14.5, ALTER SUBSCRIPTION
Next
From: Andrew Dunstan
Date:
Subject: Re: Add a test to ldapbindpasswd