Hi,
Currently LWLOCK_PADDED_SIZE is defined as:
/*
 * All the LWLock structs are allocated as an array in shared memory.
 * (LWLockIds are indexes into the array.)  We force the array stride to
 * be a power of 2, which saves a few cycles in indexing, but more
 * importantly also ensures that individual LWLocks don't cross cache line
 * boundaries.  This reduces cache contention problems, especially on AMD
 * Opterons.  (Of course, we have to also ensure that the array start
 * address is suitably aligned.)
 *
 * LWLock is between 16 and 32 bytes on all known platforms, so these two
 * cases are sufficient.
 */
#define LWLOCK_PADDED_SIZE  (sizeof(LWLock) <= 16 ? 16 : 32)

typedef union LWLockPadded
{
    LWLock      lock;
    char        pad[LWLOCK_PADDED_SIZE];
} LWLockPadded;
So what we do is guarantee that LWLocks are aligned to 16- or 32-byte
boundaries. That means that on x86-64 (64-byte cache lines, 24-byte
unpadded LWLock) two LWLocks share a cache line. As struct LWLock
contains a spinlock, and heavily used LWLocks often sit right beside each
other, that strikes me as a bad idea.
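Concretely, padding each LWLock out to a full cache line would amount to
something like the following sketch (the hardcoded 64 is only for
illustration; a real patch would presumably use some CACHE_LINE_SIZE-style
constant instead):

/*
 * Sketch only: pad each LWLock out to a full (x86-64) cache line so that
 * adjacent locks never share one.  The literal 64 is a stand-in for a
 * proper cache-line-size definition.
 */
#define LWLOCK_PADDED_SIZE  64

typedef union LWLockPadded
{
    LWLock      lock;
    char        pad[LWLOCK_PADDED_SIZE];
} LWLockPadded;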
Take, for example, the partitioned buffer mapping lock. This coding
essentially halves the effect of the partitioning in a read-only
workload where the only contention point is the LWLock's spinlock itself.
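To make that concrete, here is a quick standalone illustration (not
postgres code, and it assumes the partition locks start on a cache line
boundary): with the current 32-byte stride, 64-byte lines, and 16 buffer
mapping partitions (NUM_BUFFER_PARTITIONS), partitions 2k and 2k+1 always
end up on the same line, i.e. 16 partitions spread over only 8 cache lines:

#include <stdio.h>

int
main(void)
{
    const int   stride = 32;        /* current LWLOCK_PADDED_SIZE on x86-64 */
    const int   line = 64;          /* x86-64 cache line size */
    const int   partitions = 16;    /* NUM_BUFFER_PARTITIONS */
    int         i;

    for (i = 0; i < partitions; i++)
        printf("partition %2d -> cache line %d\n", i, (i * stride) / line);
    return 0;
}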
Does anybody remember why it was done that way? The padding itself was
introduced in dc06734a.
In my benchmarks, changing the padding to 64 bytes increases performance
considerably in workloads with contended LWLocks: 11% for a workload where
the buffer mapping lock is the major contention point, on a 2-socket
system.
Unfortunately, increasing it to CACHE_LINE_SIZE/128 results in only a
2-3% increase.
Comments?
Greetings,
Andres Freund
--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services