Re: spinlocks on HP-UX - Mailing list pgsql-hackers
From:           Tatsuo Ishii
Subject:        Re: spinlocks on HP-UX
Msg-id:         20110906.173311.1317184787263658707.t-ishii@sraoss.co.jp
In response to: spinlocks on HP-UX (Robert Haas <robertmhaas@gmail.com>)
Responses:      Re: spinlocks on HP-UX
List:           pgsql-hackers
Hi,

I am interested in this thread because I may be able to borrow a big IBM
machine, and I might be able to do some tests on it if that would help
enhance PostgreSQL. Is there anything I can do for this?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> I was able to obtain access to a 32-core HP-UX server.  I repeated the
> pgbench -S testing that I have previously done on Linux, and found
> that the results were not too good.  Here are the results at scale
> factor 100, on 9.2devel, with various numbers of clients.  Five minute
> runs, shared_buffers=8GB.
>
> 1:tps = 5590.070816 (including connections establishing)
> 8:tps = 37660.233932 (including connections establishing)
> 16:tps = 67366.099286 (including connections establishing)
> 32:tps = 82781.624665 (including connections establishing)
> 48:tps = 18589.995074 (including connections establishing)
> 64:tps = 16424.661371 (including connections establishing)
>
> And just for comparison, here are the numbers at scale factor 1000:
>
> 1:tps = 4751.768608 (including connections establishing)
> 8:tps = 33621.474490 (including connections establishing)
> 16:tps = 58959.043171 (including connections establishing)
> 32:tps = 78801.265189 (including connections establishing)
> 48:tps = 21635.234969 (including connections establishing)
> 64:tps = 18611.863567 (including connections establishing)
>
> After mulling over the vmstat output for a bit, I began to suspect
> spinlock contention.  I took a look at a document called "Implementing
> Spinlocks on the Intel Itanium Architecture and PA-RISC", by Tor
> Ekqvist and David Graves, available via the HP web site, which
> states that when spinning on a spinlock on these machines, you should
> use a regular, unlocked test first and use the atomic test only when
> the unlocked test looks OK.
>
> I tried implementing this in two ways, and both produced results which
> are FAR superior to our current implementation.  First, I did this:
>
> --- a/src/include/storage/s_lock.h
> +++ b/src/include/storage/s_lock.h
> @@ -726,7 +726,7 @@ tas(volatile slock_t *lock)
>  typedef unsigned int slock_t;
>
>  #include <ia64/sys/inline.h>
> -#define TAS(lock) _Asm_xchg(_SZ_W, lock, 1, _LDHINT_NONE)
> +#define TAS(lock) (*(lock) ? 1 : _Asm_xchg(_SZ_W, lock, 1, _LDHINT_NONE))
>
>  #endif   /* HPUX on IA64, non gcc */
>
> That resulted in these numbers.  Scale factor 100:
>
> 1:tps = 5569.911714 (including connections establishing)
> 8:tps = 37365.364468 (including connections establishing)
> 16:tps = 63596.261875 (including connections establishing)
> 32:tps = 95948.157678 (including connections establishing)
> 48:tps = 90708.253920 (including connections establishing)
> 64:tps = 100109.065744 (including connections establishing)
>
> Scale factor 1000:
>
> 1:tps = 4878.332996 (including connections establishing)
> 8:tps = 33245.469907 (including connections establishing)
> 16:tps = 56708.424880 (including connections establishing)
> 48:tps = 69652.232635 (including connections establishing)
> 64:tps = 70593.208637 (including connections establishing)
>
> Then, I did this:
>
> --- a/src/backend/storage/lmgr/s_lock.c
> +++ b/src/backend/storage/lmgr/s_lock.c
> @@ -96,7 +96,7 @@ s_lock(volatile slock_t *lock, const char *file, int line)
>  	int			delays = 0;
>  	int			cur_delay = 0;
>
> -	while (TAS(lock))
> +	while (*lock ? 1 : TAS(lock))
>  	{
>  		/* CPU-specific delay each time through the loop */
>  		SPIN_DELAY();
>
> That resulted in these numbers, at scale factor 100:
>
> 1:tps = 5564.059494 (including connections establishing)
> 8:tps = 37487.090798 (including connections establishing)
> 16:tps = 66061.524760 (including connections establishing)
> 32:tps = 96535.523905 (including connections establishing)
> 48:tps = 92031.618360 (including connections establishing)
> 64:tps = 106813.631701 (including connections establishing)
>
> And at scale factor 1000:
>
> 1:tps = 4980.338246 (including connections establishing)
> 8:tps = 33576.680072 (including connections establishing)
> 16:tps = 55618.677975 (including connections establishing)
> 32:tps = 73589.442746 (including connections establishing)
> 48:tps = 70987.026228 (including connections establishing)
>
> Not sure why I am missing the 64-client results for that last set of
> tests, but no matter.
>
> Of course, we can't apply the second patch as it stands, because I
> tested it on x86 and it loses.  But it seems pretty clear we need to
> do it at least for this architecture...
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
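[Editor's sketch: the pattern both quoted patches implement, an unlocked read before the atomic exchange, is commonly known as "test-and-test-and-set". A minimal portable rendering in C11 atomics is below. The names `ttas_lock`/`ttas_unlock` and the choice of memory orderings are illustrative assumptions, not PostgreSQL's actual s_lock code, which uses per-platform assembly such as the `_Asm_xchg` intrinsic shown above.]

```c
#include <stdatomic.h>

typedef atomic_uint slock_t;

/*
 * Test-and-test-and-set: spin on a plain (relaxed) load, which reads the
 * locally cached line without forcing cache-line ownership transfers, and
 * attempt the expensive atomic exchange only when the lock appears free.
 */
static void
ttas_lock(slock_t *lock)
{
	for (;;)
	{
		/* unlocked test: cheap, no exclusive cache-line access needed */
		while (atomic_load_explicit(lock, memory_order_relaxed) != 0)
			;					/* spin (a real loop would also back off) */

		/* atomic test: take the lock only if it still looks free */
		if (atomic_exchange_explicit(lock, 1, memory_order_acquire) == 0)
			return;
	}
}

static void
ttas_unlock(slock_t *lock)
{
	atomic_store_explicit(lock, 0, memory_order_release);
}
```

The win on large machines comes from the spinning waiters no longer bouncing the lock's cache line between cores on every iteration; the atomic exchange, which does force exclusive ownership, only runs when acquisition is likely to succeed.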