Re: spinlocks on powerpc - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: spinlocks on powerpc |
Date | |
Msg-id | 29367.1325480634@sss.pgh.pa.us |
In response to | Re: spinlocks on powerpc (Manabu Ori <manabu.ori@gmail.com>) |
Responses | Re: spinlocks on powerpc (3 replies) |
List | pgsql-hackers |
Manabu Ori <manabu.ori@gmail.com> writes:
> I recreated the patch as you advised.

Hmm, guess I wasn't clear --- we still need a configure test, since even
if we are on PPC64 there's no guarantee that the assembler will accept
the hint bit.  I revised the patch to include a configure test and
committed it.  However, I omitted the part that added an unlocked test
in TAS_SPIN, because (1) that's logically a separate change, and (2) in
my testing the unlocked test produces a small but undeniable performance
loss (see numbers below).  We need to investigate a bit more to
understand why I'm getting results different from yours.  If the bottom
line is that the unlocked test loses for smaller numbers of processors
and only helps with lots of them, I have to question whether it's a good
idea to apply it.

>> Shouldn't we just make slock_t be "int" for both PPC and PPC64?

> I'd like it to be untouched for this TAS_SPIN for powerpc
> discussion, since it seems it has remained like this for several
> years and maybe it needs some more careful consideration

I ran a test and could not see any consistent performance difference
between 4-byte and 8-byte slock_t, so I've committed that change too.
Obviously that can be revisited if anyone comes up with evidence in the
other direction.

While I was looking at this, I noticed that PPC ISA 2.03 and later
recommend use of "lwsync" rather than "isync" and "sync" in lock
acquisition and release, and sure enough I can measure improvement from
making that change too.  So again the problem is to know whether it's
safe to use that instruction.  Googling shows that there's at least one
current 32-bit PPC chip that gives SIGILL (Freescale's E500 ... thanks
for nothing, Freescale ...); but at least some projects are using
64-bitness as a proxy test for whether it's safe to use lwsync.  So for
the moment I've also committed a patch that switches to using lwsync on
PPC64.  We can perhaps improve on that too, but it's got basically the
same issues as the hint bit with respect to how to know at compile time
whether the instruction is safe at run time.

I would be interested to see results from your 750 Express machine as to
the performance impact of each of these successive patches, and then
perhaps the TAS_SPIN change on top of that.

While working on this, I repeated the tests I did in
http://archives.postgresql.org/message-id/8292.1314641721@sss.pgh.pa.us

With current git head, I get:

pgbench -c 1 -j 1 -S -T 300 bench  tps = 8703.264346 (including connections establishing)
pgbench -c 2 -j 1 -S -T 300 bench  tps = 12207.827348 (including connections establishing)
pgbench -c 8 -j 4 -S -T 300 bench  tps = 48593.999965 (including connections establishing)
pgbench -c 16 -j 8 -S -T 300 bench  tps = 91155.555180 (including connections establishing)
pgbench -c 32 -j 16 -S -T 300 bench  tps = 124648.093971 (including connections establishing)
pgbench -c 64 -j 32 -S -T 300 bench  tps = 129488.449355 (including connections establishing)
pgbench -c 96 -j 48 -S -T 300 bench  tps = 124958.553086 (including connections establishing)
pgbench -c 128 -j 64 -S -T 300 bench  tps = 134195.370726 (including connections establishing)

(It's depressing that these numbers have hardly moved since August ---
at least on this test, the work that Robert's done has not made any
difference.)

These numbers are repeatable in the first couple of digits, but there's
some noise in the third digit.
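
For readers who have not looked at the patch, here is a minimal sketch of what "an unlocked test in TAS_SPIN" means.  It is not the PostgreSQL s_lock.h code: it uses portable C11 atomics instead of PPC assembly, and the names sketch_slock_t, sketch_tas, sketch_tas_spin and sketch_unlock are hypothetical.  The acquire and release orderings only stand in for the barrier instructions discussed above; the sketch makes no claim about which instruction a given compiler emits on PPC.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical lock type for this sketch: a 4-byte int. */
typedef atomic_int sketch_slock_t;

/*
 * Plain test-and-set.  Returns 0 if we acquired the lock, nonzero if it
 * was already held.  The exchange always performs an atomic write.
 */
static inline int
sketch_tas(sketch_slock_t *lock)
{
    assert(lock != NULL);
    return atomic_exchange_explicit(lock, 1, memory_order_acquire);
}

/*
 * Variant for use while spinning, with the "unlocked test": peek at the
 * lock with a plain load first, and attempt the atomic exchange only if
 * the lock looks free.
 */
static inline int
sketch_tas_spin(sketch_slock_t *lock)
{
    assert(lock != NULL);
    if (atomic_load_explicit(lock, memory_order_relaxed) != 0)
        return 1;               /* looks held; skip the atomic write */
    return sketch_tas(lock);
}

/* Release the lock; the release ordering plays the role of the barrier. */
static inline void
sketch_unlock(sketch_slock_t *lock)
{
    assert(lock != NULL);
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

The trade-off is that the extra load costs something on every attempt, while the saved atomic write only pays off when the lock is actually contended, so which variant wins depends on the machine and the load.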

With your patch (hint bit and TAS_SPIN change) I get:

pgbench -c 1 -j 1 -S -T 300 bench  tps = 8751.930270 (including connections establishing)
pgbench -c 2 -j 1 -S -T 300 bench  tps = 12211.160964 (including connections establishing)
pgbench -c 8 -j 4 -S -T 300 bench  tps = 48608.877131 (including connections establishing)
pgbench -c 16 -j 8 -S -T 300 bench  tps = 90827.234014 (including connections establishing)
pgbench -c 32 -j 16 -S -T 300 bench  tps = 123267.062954 (including connections establishing)
pgbench -c 64 -j 32 -S -T 300 bench  tps = 128951.585059 (including connections establishing)
pgbench -c 96 -j 48 -S -T 300 bench  tps = 126551.870909 (including connections establishing)
pgbench -c 128 -j 64 -S -T 300 bench  tps = 133311.793735 (including connections establishing)

With the TAS_SPIN change only, no hint bit:

pgbench -c 1 -j 1 -S -T 300 bench  tps = 8764.703599 (including connections establishing)
pgbench -c 2 -j 1 -S -T 300 bench  tps = 12163.321040 (including connections establishing)
pgbench -c 8 -j 4 -S -T 300 bench  tps = 48580.673497 (including connections establishing)
pgbench -c 16 -j 8 -S -T 300 bench  tps = 90672.227488 (including connections establishing)
pgbench -c 32 -j 16 -S -T 300 bench  tps = 121604.634146 (including connections establishing)
pgbench -c 64 -j 32 -S -T 300 bench  tps = 129088.387379 (including connections establishing)
pgbench -c 96 -j 48 -S -T 300 bench  tps = 126291.854733 (including connections establishing)
pgbench -c 128 -j 64 -S -T 300 bench  tps = 133386.394587 (including connections establishing)

With the hint bit only:

pgbench -c 1 -j 1 -S -T 300 bench  tps = 8737.175661 (including connections establishing)
pgbench -c 2 -j 1 -S -T 300 bench  tps = 12208.715035 (including connections establishing)
pgbench -c 8 -j 4 -S -T 300 bench  tps = 48655.181597 (including connections establishing)
pgbench -c 16 -j 8 -S -T 300 bench  tps = 91365.152223 (including connections establishing)
pgbench -c 32 -j 16 -S -T 300 bench  tps = 124867.442389 (including connections establishing)
pgbench -c 64 -j 32 -S -T 300 bench  tps = 129795.644368 (including connections establishing)
pgbench -c 96 -j 48 -S -T 300 bench  tps = 126515.035998 (including connections establishing)
pgbench -c 128 -j 64 -S -T 300 bench  tps = 134551.825202 (including connections establishing)

So it looks to me like the TAS_SPIN change is a loser either by itself
or with the hint bit.

I then tried substituting LWSYNC for SYNC in S_UNLOCK(lock):

pgbench -c 1 -j 1 -S -T 300 bench  tps = 8807.059517 (including connections establishing)
pgbench -c 2 -j 1 -S -T 300 bench  tps = 12204.028897 (including connections establishing)
pgbench -c 8 -j 4 -S -T 300 bench  tps = 49051.003729 (including connections establishing)
pgbench -c 16 -j 8 -S -T 300 bench  tps = 91904.111604 (including connections establishing)
pgbench -c 32 -j 16 -S -T 300 bench  tps = 125049.367820 (including connections establishing)
pgbench -c 64 -j 32 -S -T 300 bench  tps = 130259.490608 (including connections establishing)
pgbench -c 96 -j 48 -S -T 300 bench  tps = 125581.037607 (including connections establishing)
pgbench -c 128 -j 64 -S -T 300 bench  tps = 135143.761032 (including connections establishing)

and finally LWSYNC for both SYNC and ISYNC:

pgbench -c 1 -j 1 -S -T 300 bench  tps = 8865.769698 (including connections establishing)
pgbench -c 2 -j 1 -S -T 300 bench  tps = 12278.078258 (including connections establishing)
pgbench -c 8 -j 4 -S -T 300 bench  tps = 49172.415634 (including connections establishing)
pgbench -c 16 -j 8 -S -T 300 bench  tps = 92229.289211 (including connections establishing)
pgbench -c 32 -j 16 -S -T 300 bench  tps = 125466.790383 (including connections establishing)
pgbench -c 64 -j 32 -S -T 300 bench  tps = 129422.631959 (including connections establishing)
pgbench -c 96 -j 48 -S -T 300 bench  tps = 124240.447533 (including connections establishing)
pgbench -c 128 -j 64 -S -T 300 bench  tps = 133892.054917 (including connections establishing)

That last is clearly a winner for reasonable numbers of processes, so I
committed it that way, but I'm a little worried by the fact that it
looks like it might be a marginal loss when the system is overloaded.
I would like to see results from your machine.

			regards, tom lane
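
To make the comparison in the last paragraph easier to check, the following sketch computes the relative tps change between two runs.  The figures are copied from the "current git head" and "LWSYNC for both SYNC and ISYNC" runs above.  The helper pct_change and the array names are hypothetical, and comparing against the git-head run rather than one of the intermediate runs is an assumption of this sketch, not something the mail specifies.

```c
#include <assert.h>

/* Client counts (-c) used in the pgbench runs quoted in the mail. */
static const int clients[] = {1, 2, 8, 16, 32, 64, 96, 128};

/* tps with current git head, as reported in the mail. */
static const double head_tps[] = {
    8703.264346, 12207.827348, 48593.999965, 91155.555180,
    124648.093971, 129488.449355, 124958.553086, 134195.370726
};

/* tps with LWSYNC replacing both SYNC and ISYNC, as reported in the mail. */
static const double lwsync_tps[] = {
    8865.769698, 12278.078258, 49172.415634, 92229.289211,
    125466.790383, 129422.631959, 124240.447533, 133892.054917
};

/* Relative change of patched_tps over baseline_tps, in percent. */
static double
pct_change(double baseline_tps, double patched_tps)
{
    assert(baseline_tps > 0.0);
    return (patched_tps - baseline_tps) / baseline_tps * 100.0;
}
```

Calling pct_change(head_tps[i], lwsync_tps[i]) for each entry of clients gives positive values for 1 through 32 clients and negative values for 64, 96 and 128 clients.  Since the mail says only the first couple of digits are repeatable, differences well below one percent should be treated as noise.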