spinlocks on HP-UX - Mailing list pgsql-hackers
From: Robert Haas
Subject: spinlocks on HP-UX
Date:
Msg-id: CA+TgmoZvATZV+eLh3U35jaNnwwzLL5ewUU_-t0X=T0Qwas+ZdA@mail.gmail.com
Responses: Re: spinlocks on HP-UX
           Re: spinlocks on HP-UX
List: pgsql-hackers
I was able to obtain access to a 32-core HP-UX server. I repeated the pgbench -S testing that I have previously done on Linux, and found that the results were not too good. Here are the results at scale factor 100, on 9.2devel, with various numbers of clients. Five-minute runs, shared_buffers=8GB:

1:tps = 5590.070816 (including connections establishing)
8:tps = 37660.233932 (including connections establishing)
16:tps = 67366.099286 (including connections establishing)
32:tps = 82781.624665 (including connections establishing)
48:tps = 18589.995074 (including connections establishing)
64:tps = 16424.661371 (including connections establishing)

And just for comparison, here are the numbers at scale factor 1000:

1:tps = 4751.768608 (including connections establishing)
8:tps = 33621.474490 (including connections establishing)
16:tps = 58959.043171 (including connections establishing)
32:tps = 78801.265189 (including connections establishing)
48:tps = 21635.234969 (including connections establishing)
64:tps = 18611.863567 (including connections establishing)

Note the sharp collapse beyond 32 clients. After mulling over the vmstat output for a bit, I began to suspect spinlock contention. I took a look at a document called "Implementing Spinlocks on the Intel Itanium Architecture and PA-RISC", by Tor Ekqvist and David Graves, available via the HP web site, which states that when spinning on a spinlock on these machines, you should use a regular, unlocked test first and use the atomic test only when the unlocked test looks OK.
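In C11-ish terms, that "test and test-and-set" idea looks roughly like the sketch below. This is illustrative only, not the actual s_lock.h code; the names (ttas_lock, ttas_acquire, ttas_release) are made up for the example:

#include <stdatomic.h>

typedef struct
{
	atomic_int	locked;			/* 0 = free, 1 = held */
} ttas_lock;

static void
ttas_acquire(ttas_lock *l)
{
	for (;;)
	{
		/*
		 * Unlocked test: a plain (relaxed) read leaves the cache line
		 * in shared state, so waiters don't ping-pong it between CPUs
		 * while someone else holds the lock.
		 */
		while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0)
			;					/* spin; a real loop would back off here */

		/* The lock looked free; only now pay for the atomic exchange. */
		if (atomic_exchange_explicit(&l->locked, 1,
									 memory_order_acquire) == 0)
			return;				/* got it */
	}
}

static void
ttas_release(ttas_lock *l)
{
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}

The key point is that only the probe that actually sees the lock free pays for the exclusive cache-line ownership that the atomic exchange requires.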
I tried implementing this in two ways, and both produced results which are FAR superior to our current implementation. First, I did this:

--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
@@ -726,7 +726,7 @@ tas(volatile slock_t *lock)
 typedef unsigned int slock_t;
 
 #include <ia64/sys/inline.h>
 
-#define TAS(lock) _Asm_xchg(_SZ_W, lock, 1, _LDHINT_NONE)
+#define TAS(lock) (*(lock) ? 1 : _Asm_xchg(_SZ_W, lock, 1, _LDHINT_NONE))
 
 #endif	/* HPUX on IA64, non gcc */

That resulted in these numbers. Scale factor 100:

1:tps = 5569.911714 (including connections establishing)
8:tps = 37365.364468 (including connections establishing)
16:tps = 63596.261875 (including connections establishing)
32:tps = 95948.157678 (including connections establishing)
48:tps = 90708.253920 (including connections establishing)
64:tps = 100109.065744 (including connections establishing)

Scale factor 1000:

1:tps = 4878.332996 (including connections establishing)
8:tps = 33245.469907 (including connections establishing)
16:tps = 56708.424880 (including connections establishing)
48:tps = 69652.232635 (including connections establishing)
64:tps = 70593.208637 (including connections establishing)

Then, I did this:

--- a/src/backend/storage/lmgr/s_lock.c
+++ b/src/backend/storage/lmgr/s_lock.c
@@ -96,7 +96,7 @@ s_lock(volatile slock_t *lock, const char *file, int line)
 	int			delays = 0;
 	int			cur_delay = 0;
 
-	while (TAS(lock))
+	while (*lock ? 1 : TAS(lock))
 	{
 		/* CPU-specific delay each time through the loop */
 		SPIN_DELAY();

That resulted in these numbers, at scale factor 100:

1:tps = 5564.059494 (including connections establishing)
8:tps = 37487.090798 (including connections establishing)
16:tps = 66061.524760 (including connections establishing)
32:tps = 96535.523905 (including connections establishing)
48:tps = 92031.618360 (including connections establishing)
64:tps = 106813.631701 (including connections establishing)

And at scale factor 1000:

1:tps = 4980.338246 (including connections establishing)
8:tps = 33576.680072 (including connections establishing)
16:tps = 55618.677975 (including connections establishing)
32:tps = 73589.442746 (including connections establishing)
48:tps = 70987.026228 (including connections establishing)

Not sure why I am missing the 64-client results for that last set of tests, but no matter.

Of course, we can't apply the second patch as it stands, because I tested it on x86 and it loses. But it seems pretty clear we need to do it at least for this architecture; one possible shape for that is sketched below.
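For instance (purely a sketch of the shape, not a tested patch; TAS_SPIN is a made-up name here), s_lock.h could let each port override the macro used inside the retry loop, leaving the first attempt, and every other architecture, alone:

/*
 * Hypothetical arrangement for s_lock.h: ports that benefit from the
 * unlocked pre-test define TAS_SPIN(); everyone else falls through to
 * plain TAS(), so x86 behavior is unchanged.
 */
#if defined(__ia64__) || defined(__ia64)	/* HP-UX / Itanium */
#define TAS_SPIN(lock)	(*(lock) ? 1 : TAS(lock))
#endif

#ifndef TAS_SPIN
#define TAS_SPIN(lock)	TAS(lock)
#endif

/*
 * The retry loop in s_lock() would then spin on TAS_SPIN() rather
 * than TAS():
 *
 *		while (TAS_SPIN(lock))
 *		{
 *			SPIN_DELAY();
 *			...
 *		}
 */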
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company