Re: spinlocks on HP-UX - Mailing list pgsql-hackers

From Tom Lane
Subject Re: spinlocks on HP-UX
Msg-id 8292.1314641721@sss.pgh.pa.us
In response to Re: spinlocks on HP-UX  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: spinlocks on HP-UX
Re: spinlocks on HP-UX
List pgsql-hackers
I wrote:
> I am also currently running tests on x86_64 and PPC using Red Hat test
> machines --- expect results later today.

OK, I ran some more tests.  These are not directly comparable to my
previous results with IA64, because (a) I used RHEL6.2 and gcc 4.4.6;
(b) I used half as many pgbench threads as backends, rather than one
thread per eight backends.  Testing showed that pgbench cannot saturate
more than two backends per thread in this test environment, as shown
for example by this series:

pgbench -c 8 -j 1 -S -T 300 bench    tps = 22091.461409 (including ...
pgbench -c 8 -j 2 -S -T 300 bench    tps = 42587.661755 (including ...
pgbench -c 8 -j 4 -S -T 300 bench    tps = 77515.057885 (including ...
pgbench -c 8 -j 8 -S -T 300 bench    tps = 75830.463821 (including ...

I find this entirely astonishing, BTW; the backend is surely doing far
more than twice as much work per query as pgbench.  We need to look into
why pgbench is apparently still such a dog.  However, that's not
tremendously relevant to the question of whether we need an unlocked
test in spinlocks.
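
As a rough illustration of the saturation point in the series above, each tps figure can be divided by its -j value to get throughput per pgbench thread. This is only a sketch of the arithmetic, not part of the test run; the struct and function names are invented for the example.

```c
#include <assert.h>

/* One pgbench run from the -c 8 series: its -j value and reported tps. */
struct demo_run
{
    int     threads;
    double  tps;
};

/* Figures copied from the four runs quoted above. */
static const struct demo_run demo_series[] = {
    {1, 22091.461409},
    {2, 42587.661755},
    {4, 77515.057885},
    {8, 75830.463821},
};

/* Hypothetical helper: throughput delivered per pgbench worker thread. */
static double
demo_tps_per_thread(const struct demo_run *run)
{
    assert(run->threads > 0);
    return run->tps / run->threads;
}
```

Applied to the four runs, this gives roughly 22091, 21294, 19379 and 9479 tps per thread.  Per-thread throughput holds up through -j 4 (two backends per thread), and total throughput stops growing beyond that point.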


These tests were run on a 32-CPU Opteron machine (Sun Fire X4600 M2,
8 quad-core sockets).  Test conditions the same as my IA64 set, except
for the OS and the -j switches:

Stock git head:

pgbench -c 1 -j 1 -S -T 300 bench    tps = 9515.435401 (including ...
pgbench -c 2 -j 1 -S -T 300 bench    tps = 20239.289880 (including ...
pgbench -c 8 -j 4 -S -T 300 bench    tps = 78628.371372 (including ...
pgbench -c 16 -j 8 -S -T 300 bench    tps = 143065.596555 (including ...
pgbench -c 32 -j 16 -S -T 300 bench    tps = 227349.424654 (including ...
pgbench -c 64 -j 32 -S -T 300 bench    tps = 269016.946095 (including ...
pgbench -c 96 -j 48 -S -T 300 bench    tps = 253884.095190 (including ...
pgbench -c 128 -j 64 -S -T 300 bench    tps = 269235.253012 (including ...

Non-locked test in TAS():

pgbench -c 1 -j 1 -S -T 300 bench    tps = 9316.195621 (including ...
pgbench -c 2 -j 1 -S -T 300 bench    tps = 19852.444846 (including ...
pgbench -c 8 -j 4 -S -T 300 bench    tps = 77701.546927 (including ...
pgbench -c 16 -j 8 -S -T 300 bench    tps = 138926.775553 (including ...
pgbench -c 32 -j 16 -S -T 300 bench    tps = 188485.669320 (including ...
pgbench -c 64 -j 32 -S -T 300 bench    tps = 253602.490286 (including ...
pgbench -c 96 -j 48 -S -T 300 bench    tps = 251181.310600 (including ...
pgbench -c 128 -j 64 -S -T 300 bench    tps = 260812.933702 (including ...

Non-locked test in TAS_SPIN() only:

pgbench -c 1 -j 1 -S -T 300 bench    tps = 9283.944739 (including ...
pgbench -c 2 -j 1 -S -T 300 bench    tps = 20213.208443 (including ...
pgbench -c 8 -j 4 -S -T 300 bench    tps = 78824.247744 (including ...
pgbench -c 16 -j 8 -S -T 300 bench    tps = 141027.072774 (including ...
pgbench -c 32 -j 16 -S -T 300 bench    tps = 201658.416366 (including ...
pgbench -c 64 -j 32 -S -T 300 bench    tps = 271035.843105 (including ...
pgbench -c 96 -j 48 -S -T 300 bench    tps = 261337.324585 (including ...
pgbench -c 128 -j 64 -S -T 300 bench    tps = 271272.921058 (including ...

So basically there is no benefit to the unlocked test on this hardware.
But it doesn't cost much either, which is odd because the last time we
did this type of testing, adding an unlocked test was a "huge loss" on
Opteron.  Apparently AMD improved their handling of the case, and/or
the other changes we've made have changed the usage pattern completely.
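
For readers who want to see the shape of the variants being compared, here is a minimal sketch using C11 atomics.  It is not PostgreSQL's s_lock.h code, which uses per-platform assembly; every demo_ name is hypothetical, and the delay/backoff logic a real spin loop needs is left out.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for slock_t: 0 = free, 1 = held. */
typedef atomic_int demo_slock_t;

/* Locked test-and-set: returns 0 if we got the lock, nonzero if it was held. */
static int
demo_tas(demo_slock_t *lock)
{
    return atomic_exchange_explicit(lock, 1, memory_order_acquire);
}

/*
 * Test-and-set preceded by a plain (unlocked) read.  If the lock is
 * visibly held we return without issuing the atomic exchange at all.
 */
static int
demo_tas_unlocked_test(demo_slock_t *lock)
{
    if (atomic_load_explicit(lock, memory_order_relaxed) != 0)
        return 1;
    return demo_tas(lock);
}

/*
 * "Unlocked test while spinning only" shape: the first attempt is a
 * plain locked TAS, and only the retries use the unlocked pre-test.
 * A real implementation would sleep or back off inside the loop.
 */
static void
demo_acquire(demo_slock_t *lock)
{
    if (demo_tas(lock) == 0)
        return;
    while (demo_tas_unlocked_test(lock) != 0)
        ;
}

/* Release the lock; the assert is a debugging aid only. */
static void
demo_release(demo_slock_t *lock)
{
    assert(atomic_load_explicit(lock, memory_order_relaxed) != 0);
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

In this sketch, using demo_tas_unlocked_test() for the first attempt as well would correspond to the "non-locked test in TAS()" case; demo_acquire() as written corresponds to the "TAS_SPIN() only" case.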

I am hoping to do a similar test on another machine with $bignum Xeon
processors, to see if Intel hardware reacts any differently.  But that
machine is in the Westford office which is currently without power,
so it will have to wait a few days.  (I can no longer get at either
of the machines cited in this mail, either, so if you want to see
more test cases it'll have to wait.)


These tests were run on a 32-processor PPC64 machine (IBM 8406-71Y,
POWER7 architecture; I think it might be 16 cores with hyperthreading,
not sure).  The machine has "only" 6GB of RAM so I set shared_buffers to
4GB, other test conditions the same:

Stock git head:

pgbench -c 1 -j 1 -S -T 300 bench    tps = 8746.076443 (including ...
pgbench -c 2 -j 1 -S -T 300 bench    tps = 12297.297308 (including ...
pgbench -c 8 -j 4 -S -T 300 bench    tps = 48697.392492 (including ...
pgbench -c 16 -j 8 -S -T 300 bench    tps = 94133.227472 (including ...
pgbench -c 32 -j 16 -S -T 300 bench    tps = 126822.857978 (including ...
pgbench -c 64 -j 32 -S -T 300 bench    tps = 129364.417801 (including ...
pgbench -c 96 -j 48 -S -T 300 bench    tps = 125728.697772 (including ...
pgbench -c 128 -j 64 -S -T 300 bench    tps = 131566.394880 (including ...

Non-locked test in TAS():

pgbench -c 1 -j 1 -S -T 300 bench    tps = 8810.484890 (including ...
pgbench -c 2 -j 1 -S -T 300 bench    tps = 12336.612804 (including ...
pgbench -c 8 -j 4 -S -T 300 bench    tps = 49023.435650 (including ...
pgbench -c 16 -j 8 -S -T 300 bench    tps = 96306.706556 (including ...
pgbench -c 32 -j 16 -S -T 300 bench    tps = 131731.475778 (including ...
pgbench -c 64 -j 32 -S -T 300 bench    tps = 133451.416612 (including ...
pgbench -c 96 -j 48 -S -T 300 bench    tps = 110076.269474 (including ...
pgbench -c 128 -j 64 -S -T 300 bench    tps = 111339.797242 (including ...

Non-locked test in TAS_SPIN() only:

pgbench -c 1 -j 1 -S -T 300 bench    tps = 8726.269726 (including ...
pgbench -c 2 -j 1 -S -T 300 bench    tps = 12228.415466 (including ...
pgbench -c 8 -j 4 -S -T 300 bench    tps = 48227.623829 (including ...
pgbench -c 16 -j 8 -S -T 300 bench    tps = 93302.510254 (including ...
pgbench -c 32 -j 16 -S -T 300 bench    tps = 130661.097475 (including ...
pgbench -c 64 -j 32 -S -T 300 bench    tps = 133009.181697 (including ...
pgbench -c 96 -j 48 -S -T 300 bench    tps = 128710.757986 (including ...
pgbench -c 128 -j 64 -S -T 300 bench    tps = 133063.460934 (including ...

So basically no value to an unlocked test on this platform either.
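
To compare a variant against stock at a given client count, the relative change in tps is the natural figure.  The helper below is only a sketch of that arithmetic; its name is made up for the example and it was not used to produce any of the numbers above.

```c
#include <assert.h>

/*
 * Hypothetical helper: percentage change of a variant's tps relative
 * to the stock (baseline) tps.  Negative means the variant is slower.
 */
static double
demo_pct_change(double baseline_tps, double variant_tps)
{
    assert(baseline_tps > 0.0);
    return (variant_tps - baseline_tps) / baseline_tps * 100.0;
}
```

For example, with the PPC64 figures at -c 96, demo_pct_change(125728.697772, 110076.269474) comes out at roughly -12% for the unlocked test in TAS(), and demo_pct_change(125728.697772, 128710.757986) at roughly +2% for the unlocked test in TAS_SPIN() only.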

        regards, tom lane

