On Thu, 15 Sep 2005, Tom Lane wrote:
> Gavin Sherry <swm@linuxworld.com.au> writes:
> > What about padding the LWLock to 64 bytes on these architectures. Both P4
> > and Opteron have 64 byte cache lines, IIRC. This would ensure that a
> > cacheline doesn't hold two LWLocks.
>
> I tried that first, actually, but it was a net loss. I guess enlarging
> the array that much wastes too much cache space.
Interesting. On Xeon (2 physical, 4 logical CPUs), with LWLock padded to
64 bytes and the cmpb/jump removed, I get:
[swm@backup pgsqlpad]$ for i in 1 2 4; do time ./nrun.sh $i; done
real 0m54.362s
user 0m0.003s
sys 0m0.009s
real 1m9.788s
user 0m0.011s
sys 0m0.013s
real 2m8.870s
user 0m0.016s
sys 0m0.028s
[swm@backup pgsqlpad]$ for i in 1 2 4; do time ./nrun.sh $i; done
real 0m55.544s
user 0m0.006s
sys 0m0.007s
real 1m9.313s
user 0m0.007s
sys 0m0.018s
real 2m1.769s
user 0m0.017s
sys 0m0.027s
This compares to the following, which is unpadded and has the cmpb/jump
removed, but is otherwise vanilla:
1: 55, 2: 111, 4: 207
The decrease is small, but it's there.
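
For anyone following along, here is a minimal sketch of the padding idea in C. It is only an illustration: DemoLock is a hypothetical stand-in and not the actual LWLock definition, CACHE_LINE_SIZE is assumed to be 64 as quoted above, and cache_line_of() is a made-up helper that assumes the array base is itself aligned to a cache line.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size (64 bytes, as quoted above for P4 and Opteron). */
#define CACHE_LINE_SIZE 64

/* Hypothetical stand-in for a lock; NOT the real LWLock layout. */
typedef struct DemoLock
{
    volatile unsigned char mutex;   /* spinlock byte protecting the fields */
    char        exclusive;          /* number of exclusive holders (0 or 1) */
    int         shared;             /* number of shared holders */
    void       *head;               /* head of the wait queue */
    void       *tail;               /* tail of the wait queue */
} DemoLock;

/* Pad each lock out to a full cache line by overlaying a byte array. */
typedef union PaddedDemoLock
{
    DemoLock    lock;
    char        pad[CACHE_LINE_SIZE];
} PaddedDemoLock;

/* Compile-time checks: the lock fits, and the union is exactly one line. */
static_assert(sizeof(DemoLock) <= CACHE_LINE_SIZE,
              "lock must fit within one cache line");
static_assert(sizeof(PaddedDemoLock) == CACHE_LINE_SIZE,
              "padded lock must occupy exactly one cache line");

/*
 * Hypothetical helper: which cache line element `index` of an array starts
 * in, assuming the array base is aligned to CACHE_LINE_SIZE.
 */
static size_t
cache_line_of(size_t index, size_t elem_size)
{
    return (index * elem_size) / CACHE_LINE_SIZE;
}
```

With the unpadded struct, neighbouring array elements start in the same cache line, so two CPUs working on different locks can still bounce that line between them. With the padded union each element gets its own line, at the cost of a larger array, which is the cache-space trade-off Tom mentions.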
Gavin