Re: Timing overhead and Linux clock sources - Mailing list pgsql-hackers

From Greg Smith
Subject Re: Timing overhead and Linux clock sources
Msg-id 4EE2B534.40502@2ndQuadrant.com
In response to Re: Timing overhead and Linux clock sources  (Ants Aasma <ants.aasma@eesti.ee>)
Responses Re: Timing overhead and Linux clock sources  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On 12/09/2011 06:48 PM, Ants Aasma wrote:
> The attached test program (test_gettimeofday_monotonic) shows that one
> test loop iteration takes 34ns with tsc and 1270ns with hpet.
>    

This test program is great; I've wanted exactly this sort of visibility 
into this problem for years.  I've toyed with writing something like 
this and then seeing what results it returns on all of the buildfarm 
animals.  For now I can just run it on all the hardware I have access 
to, which is not a small list.
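
For anyone reading along without the attachment, the core of what gets 
measured is roughly the loop below.  This is only a sketch of the idea, 
not Ants's actual program (which, going by its name, presumably also 
checks that the clock never runs backwards; this leaves that out):  
hammer gettimeofday() for N seconds, report the average cost per call, 
and histogram the apparent per-call deltas into power-of-two 
microsecond buckets.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

#define NBUCKETS 32

int
main(int argc, char **argv)
{
    int         duration = (argc > 1) ? atoi(argv[1]) : 5;
    long long   counts[NBUCKETS];
    long long   total = 0;
    double      elapsed_usec = 0;
    struct timeval start, prev, cur;
    int         i, max_bucket = 0;

    memset(counts, 0, sizeof(counts));
    gettimeofday(&start, NULL);
    prev = start;

    /* time N seconds' worth of back-to-back gettimeofday() calls */
    for (;;)
    {
        long    diff;
        int     bucket = 0;

        gettimeofday(&cur, NULL);
        diff = (cur.tv_sec - prev.tv_sec) * 1000000L +
               (cur.tv_usec - prev.tv_usec);
        prev = cur;
        total++;

        /* bucket the delta by power of two: 0, 1, 2, 4, 8, ... usec */
        while (diff > 0)
        {
            diff >>= 1;
            bucket++;
        }
        if (bucket >= NBUCKETS)
            bucket = NBUCKETS - 1;
        counts[bucket]++;

        elapsed_usec = (cur.tv_sec - start.tv_sec) * 1000000.0 +
                       (cur.tv_usec - start.tv_usec);
        if (elapsed_usec >= duration * 1000000.0)
            break;
    }

    /* find the largest bucket that was hit, then print downward from it */
    for (i = 0; i < NBUCKETS; i++)
        if (counts[i] > 0)
            max_bucket = i;

    printf("Per loop: %0.2f ns\n", elapsed_usec * 1000.0 / total);
    printf("   usec:      count   percent\n");
    for (i = max_bucket; i >= 0; i--)
        printf("%7ld: %10lld %9.5f%%\n",
               (i == 0) ? 0L : (1L << (i - 1)),
               counts[i], 100.0 * counts[i] / total);

    return 0;
}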

I think we should bundle this up similarly to test_fsync, document some 
best practices on time sources, and then point the vague warning about 
EXPLAIN overhead at that documentation.  New sources of timing overhead 
can point there too.  Much as with low-level fsync timing, there's 
nothing PostgreSQL can do about the underlying problem; the best we can 
do is provide a program that helps users classify a system as likely or 
unlikely to have high-overhead timing.  I can make the needed docs 
changes, since this resolves a long-standing issue impacting code I 
wanted to add.

The rough guideline I see shaping up is that below 50 ns per call, clock 
timing is unlikely to be a significant problem; above 500 ns it certainly 
is; and values in between should concern people without necessarily 
invalidating the timing data collected.
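
To put some back-of-envelope numbers on that (assuming EXPLAIN ANALYZE's 
pattern of two timing calls per row per plan node, and ignoring 
everything else):  at 40 ns per call, timing a node that processes 
1 million rows adds about 2 * 1,000,000 * 40 ns = 80 ms of overhead, 
while at 1000 ns per call the same node picks up roughly 2 seconds of 
pure timing overhead, which can easily swamp the work actually being 
measured.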

I just confirmed that switching the clock source works by echoing a new 
value into /sys/devices/system/clocksource/clocksource0/current_clocksource 
(on the 2.6.32 kernel at least; the values the kernel will accept are 
listed in the adjacent available_clocksource file).  What we want to see 
on a good server is a result that looks like this, from my Intel i7-860 
system:

$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
$ ./test_gettimeofday_monotonic 5
Per loop: 39.30 ns
   usec:      count   percent
      4:          6   0.00000%
      2:        104   0.00008%
      1:    4999760   3.92983%
      0:  122226109  96.07009%
 

Here's how badly that degrades if I use one of the alternate time sources:

# echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource
$ ./test_gettimeofday_monotonic 5
Per loop: 727.65 ns
   usec:      count   percent
     16:          1   0.00001%
      8:          0   0.00000%
      4:       1233   0.01794%
      2:        699   0.01017%
      1:    4992622  72.65764%
      0:    1876879  27.31423%
 

# echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
$ ./test_gettimeofday_monotonic 5
Per loop: 576.96 ns
   usec:      count   percent
      8:          2   0.00002%
      4:       1273   0.01469%
      2:        767   0.00885%
      1:    4993028  57.61598%
      0:    3670977  42.36046%
 

My T60 laptop with its Intel T7200 CPU only offers the poorer quality 
time sources, not TSC, and the results show timing is really slow there:

$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet
$ ./test_gettimeofday_monotonic 5
Per loop: 1019.60 ns
   usec:      count   percent
    256:          2   0.00004%
    128:          3   0.00006%
     64:         90   0.00184%
     32:         23   0.00047%
     16:         92   0.00188%
      8:       1246   0.02541%
      4:         34   0.00069%
      2:     136154   2.77645%
      1:    4700772  95.85818%
      0:      65466   1.33498%

# echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource
$ ./test_gettimeofday_monotonic 5
Per loop: 1864.66 ns
   usec:      count   percent
    256:          2   0.00007%
    128:          0   0.00000%
     64:          3   0.00011%
     32:          6   0.00022%
     16:         90   0.00336%
      8:       1741   0.06493%
      4:       2062   0.07690%
      2:    2260601  84.30489%
      1:     416954  15.54952%
      0:          0   0.00000%

This seems to measure exactly the problem I previously only had a 
hand-wave "it's bad on this hardware" explanation for.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


