Thread: Reasoning behind LWLOCK_PADDED_SIZE/increase it to a full cacheline
Hi,

Currently LWLOCK_PADDED_SIZE is defined as:

/*
 * All the LWLock structs are allocated as an array in shared memory.
 * (LWLockIds are indexes into the array.)  We force the array stride to
 * be a power of 2, which saves a few cycles in indexing, but more
 * importantly also ensures that individual LWLocks don't cross cache line
 * boundaries.  This reduces cache contention problems, especially on AMD
 * Opterons.  (Of course, we have to also ensure that the array start
 * address is suitably aligned.)
 *
 * LWLock is between 16 and 32 bytes on all known platforms, so these two
 * cases are sufficient.
 */
#define LWLOCK_PADDED_SIZE  (sizeof(LWLock) <= 16 ? 16 : 32)

typedef union LWLockPadded
{
    LWLock      lock;
    char        pad[LWLOCK_PADDED_SIZE];
} LWLockPadded;

So, what we do is we guarantee that LWLocks are aligned to 16 or 32 byte
boundaries. That means that on x86-64 (64 byte cache lines, 24 byte
unpadded LWLock) two LWLocks share a cache line.

Since struct LWLock contains a spinlock, and important LWLocks often sit
right beside each other, that strikes me as a bad idea. Take for example
the partitioned buffer mapping lock: this coding essentially halves the
effect of the partitioning in a read-only workload where the only
contention is on the LWLock's spinlock itself.

Does anybody remember why this is done that way? The padding itself was
introduced in dc06734a.

In my benchmarks changing the padding to 64 bytes increases performance in
workloads with contended LWLocks considerably: 11% for a workload where
the buffer mapping lock is the major contention point, on a 2 socket
system. Unfortunately, increasing it to CACHE_LINE_SIZE/128 results in
only a 2-3% increase.

Comments?

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
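[For readers without lwlock.c at hand, a minimal, compilable sketch of the change being benchmarked above. The LWLock stand-in is only an approximation of the 9.3-era struct (field names and the 64 byte figure are assumptions); the point is the full-cache-line stride.]

#include <stdint.h>

/* Rough stand-in for PostgreSQL's LWLock; the real definition lives in
 * lwlock.c and is about 24 bytes unpadded on x86-64, which is all that
 * matters for the layout argument above. */
typedef struct LWLock
{
    uint8_t     mutex;          /* spinlock protecting the fields below */
    uint8_t     releaseOK;
    uint8_t     exclusive;
    int         shared;
    void       *head;           /* head of the wait queue */
    void       *tail;           /* tail of the wait queue */
} LWLock;

/* The proposal being benchmarked: pad every array slot to a full (assumed
 * 64 byte) cache line so no two LWLocks ever share a line. */
#define LWLOCK_PADDED_SIZE  64

typedef union LWLockPadded
{
    LWLock      lock;
    char        pad[LWLOCK_PADDED_SIZE];
} LWLockPadded;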
Andres Freund <andres@2ndquadrant.com> writes:
> So, what we do is we guarantee that LWLocks are aligned to 16 or 32 byte
> boundaries. That means that on x86-64 (64 byte cache lines, 24 byte
> unpadded LWLock) two LWLocks share a cache line.

Yup.

> In my benchmarks changing the padding to 64 bytes increases performance
> in workloads with contended LWLocks considerably.

At a huge cost in RAM. Remember we make two LWLocks per shared buffer.

I think that rather than using a blunt instrument like that, we ought to
see if we can identify pairs of hot LWLocks and make sure they're not
adjacent.

			regards, tom lane
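[To put the RAM objection into rough numbers: with two LWLocks per buffer, going from a 32 byte to a 64 byte stride costs 64 extra bytes per shared buffer. The shared_buffers value below is an arbitrary example, not a figure from the thread.]

#include <stdio.h>

int main(void)
{
    /* Back-of-the-envelope cost of padding every LWLock to 64 bytes given
     * two LWLocks per shared buffer.  8 kB is the default block size; the
     * shared_buffers setting here is just an example. */
    const long long block_size     = 8 * 1024;
    const long long shared_buffers = 8LL * 1024 * 1024 * 1024;   /* 8 GB */
    const long long nbuffers       = shared_buffers / block_size;
    const long long extra_per_lock = 64 - 32;                    /* bytes */

    long long extra = nbuffers * 2 * extra_per_lock;
    printf("extra shared memory: %lld MiB\n", extra / (1024 * 1024));
    /* prints: extra shared memory: 64 MiB */
    return 0;
}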
On 2013-09-24 12:39:39 +0200, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > So, what we do is we guarantee that LWLocks are aligned to 16 or 32 byte
> > boundaries. That means that on x86-64 (64 byte cache lines, 24 byte
> > unpadded LWLock) two LWLocks share a cache line.

> > In my benchmarks changing the padding to 64 bytes increases performance
> > in workloads with contended LWLocks considerably.
>
> At a huge cost in RAM. Remember we make two LWLocks per shared buffer.
>
> I think that rather than using a blunt instrument like that, we ought to
> see if we can identify pairs of hot LWLocks and make sure they're not
> adjacent.

That's a good point. What about making all but the shared buffer LWLocks
64 bytes? It seems hard to analyze the interactions between all the locks
and keep that analysis maintained.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
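[One possible shape of that split, sketched against the LWLock stand-in from the earlier example; the type names and the 64 byte figure are invented for illustration, nothing like this exists in the tree.]

/* Hypothetical split: per-buffer locks keep the existing 32 byte stride,
 * while the "main" LWLock array gets a full assumed-64-byte cache line.
 * Reuses the LWLock stand-in from the earlier sketch. */

#define BUFFER_LWLOCK_PADDED_SIZE   32      /* two locks per cache line */
#define MAIN_LWLOCK_PADDED_SIZE     64      /* one lock per cache line */

typedef union BufferLWLockPadded
{
    LWLock      lock;
    char        pad[BUFFER_LWLOCK_PADDED_SIZE];
} BufferLWLockPadded;

typedef union MainLWLockPadded
{
    LWLock      lock;
    char        pad[MAIN_LWLOCK_PADDED_SIZE];
} MainLWLockPadded;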
On Tue, Sep 24, 2013 at 6:48 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-09-24 12:39:39 +0200, Tom Lane wrote:
>> Andres Freund <andres@2ndquadrant.com> writes:
>> > So, what we do is we guarantee that LWLocks are aligned to 16 or 32 byte
>> > boundaries. That means that on x86-64 (64 byte cache lines, 24 byte
>> > unpadded LWLock) two LWLocks share a cache line.
>
>> > In my benchmarks changing the padding to 64 bytes increases performance
>> > in workloads with contended LWLocks considerably.
>>
>> At a huge cost in RAM. Remember we make two LWLocks per shared buffer.
>
>> I think that rather than using a blunt instrument like that, we ought to
>> see if we can identify pairs of hot LWLocks and make sure they're not
>> adjacent.
>
> That's a good point. What about making all but the shared buffer LWLocks
> 64 bytes? It seems hard to analyze the interactions between all the locks
> and keep that analysis maintained.

I think somebody had a patch a few years ago that made it so that the
LWLocks didn't have to be in a single array, but could instead be anywhere
in shared memory. Internally, lwlock.c held onto LWLock pointers instead
of LWLockIds. That idea seems like it might be worth revisiting, in terms
of giving us more options as to how LWLocks can be laid out in shared
memory.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
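[For reference, the interface shift being described amounts to something like the declarations below. The *ByPtr names and stand-in types are purely illustrative guesses at the shape of that old patch, not its actual contents; only LWLockAcquire/LWLockRelease taking an LWLockId reflect the then-current API.]

typedef int LWLockId;                           /* stand-in for the enum */
typedef struct LWLock LWLock;                   /* opaque here */
typedef enum { LW_EXCLUSIVE, LW_SHARED } LWLockMode;

/* 9.3-era style: a lock is an index into one global shared-memory array,
 * so lwlock.c alone decides the array's stride and alignment. */
extern void LWLockAcquire(LWLockId lockid, LWLockMode mode);
extern void LWLockRelease(LWLockId lockid);

/* Pointer style: a lock can live anywhere in shared memory, e.g. embedded
 * in the structure it protects, giving callers control over placement,
 * padding, and which locks end up on the same cache line. */
extern void LWLockAcquireByPtr(LWLock *lock, LWLockMode mode);
extern void LWLockReleaseByPtr(LWLock *lock);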