Thread: Reasoning behind LWLOCK_PADDED_SIZE/increase it to a full cacheline

Reasoning behind LWLOCK_PADDED_SIZE/increase it to a full cacheline

From: Andres Freund
Hi,

Currently LWLOCK_PADDED_SIZE is defined as:

/*
 * All the LWLock structs are allocated as an array in shared memory.
 * (LWLockIds are indexes into the array.)  We force the array stride to
 * be a power of 2, which saves a few cycles in indexing, but more
 * importantly also ensures that individual LWLocks don't cross cache line
 * boundaries.  This reduces cache contention problems, especially on AMD
 * Opterons.  (Of course, we have to also ensure that the array start
 * address is suitably aligned.)
 *
 * LWLock is between 16 and 32 bytes on all known platforms, so these two
 * cases are sufficient.
 */
#define LWLOCK_PADDED_SIZE  (sizeof(LWLock) <= 16 ? 16 : 32)

typedef union LWLockPadded
{
    LWLock      lock;
    char        pad[LWLOCK_PADDED_SIZE];
} LWLockPadded;

So, what we do is we guarantee that LWLocks are aligned to 16 or 32byte
boundaries. That means that on x86-64 (64byte cachelines, 24bytes
unpadded lwlock) two lwlocks share a cacheline. As struct LWLock
contains a spinlock and important lwlocks often sit right beside each other,
that strikes me as a bad idea.
Take for example the partitioned buffer mapping lock: this layout
essentially halves the benefit of the partitioning in a read-only
workload where the only contention is on the LWLock's spinlock itself.
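To put numbers on that (assuming 64-byte cache lines, the current 32-byte
stride, and the 16 buffer-mapping partition locks sitting in consecutive
slots of the main LWLock array):

/* Standalone illustration, not PostgreSQL code: with a 32-byte stride and
 * 64-byte cache lines, consecutive LWLocks pair up on the same cache line,
 * so the 16 buffer-mapping partition locks occupy only 8 distinct lines. */
#include <stdio.h>

#define STRIDE          32      /* current LWLOCK_PADDED_SIZE on x86-64 */
#define CACHE_LINE_SIZE 64      /* assumed */
#define NUM_PARTITIONS  16      /* NUM_BUFFER_PARTITIONS */

int
main(void)
{
    int     i;

    for (i = 0; i < NUM_PARTITIONS; i++)
        printf("partition lock %2d -> cache line %d\n",
               i, (i * STRIDE) / CACHE_LINE_SIZE);
    return 0;
}

Each pair of neighbouring partition locks lands on the same line, so the 16
partitions only spread across 8 cache lines.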

Does anybody remember why this is done that way? The padding itself was
introduced in dc06734a.

In my benchmarks changing the padding to 64byte increases performance in
workloads with contended lwlocks considerably. 11% for a workload where
the buffer mapping lock is the major contention point, on a 2-socket
system.
Unfortunately, increasing it to CACHE_LINE_SIZE/128 results in only a
2-3% increase.
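Such a change would look roughly like this (a sketch only, hardcoding an
assumed 64-byte line size instead of deriving it per platform):

/* sketch: in storage/lwlock.h, replacing the size computation above */
#define LWLOCK_PADDED_SIZE  64  /* assumed cache line size on x86-64 */

typedef union LWLockPadded
{
    LWLock      lock;
    char        pad[LWLOCK_PADDED_SIZE];
} LWLockPadded;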

Comments?

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Reasoning behind LWLOCK_PADDED_SIZE/increase it to a full cacheline

From: Tom Lane
Andres Freund <andres@2ndquadrant.com> writes:
> So, what we do is we guarantee that LWLocks are aligned to 16 or 32byte
> boundaries. That means that on x86-64 (64byte cachelines, 24bytes
> unpadded lwlock) two lwlocks share a cacheline.

Yup.

> In my benchmarks changing the padding to 64byte increases performance in
> workloads with contended lwlocks considerably.

At a huge cost in RAM.  Remember we make two LWLocks per shared buffer.

I think that rather than using a blunt instrument like that, we ought to
see if we can identify pairs of hot LWLocks and make sure they're not
adjacent.
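
To put a rough number on it (back-of-the-envelope only, assuming 8GB of
shared_buffers, 8kB pages, and the stride growing from 32 to 64 bytes):

/* Back-of-the-envelope cost of widening the per-LWLock stride from 32 to
 * 64 bytes, counting only the two LWLocks kept per shared buffer. */
#include <stdio.h>

int
main(void)
{
    long    shared_buffers_kb = 8L * 1024 * 1024;   /* 8GB, as an example */
    long    nbuffers = shared_buffers_kb / 8;       /* 8kB per buffer */
    long    extra_bytes = nbuffers * 2 * (64 - 32); /* two locks per buffer */

    printf("NBuffers = %ld, extra padding = %ld MB\n",
           nbuffers, extra_bytes / (1024 * 1024));
    return 0;
}

In that example the extra padding alone is on the order of 64MB of shared
memory.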
        regards, tom lane



Re: Reasoning behind LWLOCK_PADDED_SIZE/increase it to a full cacheline

From: Andres Freund
On 2013-09-24 12:39:39 +0200, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > So, what we do is we guarantee that LWLocks are aligned to 16 or 32byte
> > boundaries. That means that on x86-64 (64byte cachelines, 24bytes
> > unpadded lwlock) two lwlocks share a cacheline.

> > In my benchmarks changing the padding to 64byte increases performance in
> > workloads with contended lwlocks considerably.
> 
> At a huge cost in RAM.  Remember we make two LWLocks per shared buffer.

> I think that rather than using a blunt instrument like that, we ought to
> see if we can identify pairs of hot LWLocks and make sure they're not
> adjacent.

That's a good point. What about making all but the shared-buffer lwlocks
64 bytes? It seems hard to analyze the interactions between all the locks
and keep that analysis maintained otherwise.
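
Something like this, perhaps (purely a sketch with made-up names, sitting
alongside the existing LWLockPadded union):

/* hypothetical: a second, cache-line-sized stride for the fixed/named locks,
 * while the per-buffer locks keep the existing 32-byte LWLockPadded stride */
#define LWLOCK_LINE_PADDED_SIZE 64      /* assumed cache line size */

typedef union LWLockLinePadded
{
    LWLock      lock;
    char        pad[LWLOCK_LINE_PADDED_SIZE];
} LWLockLinePadded;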

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Reasoning behind LWLOCK_PADDED_SIZE/increase it to a full cacheline

From: Robert Haas
On Tue, Sep 24, 2013 at 6:48 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-09-24 12:39:39 +0200, Tom Lane wrote:
>> Andres Freund <andres@2ndquadrant.com> writes:
>> > So, what we do is we guarantee that LWLocks are aligned to 16 or 32byte
>> > boundaries. That means that on x86-64 (64byte cachelines, 24bytes
>> > unpadded lwlock) two lwlocks share a cacheline.
>
>> > In my benchmarks changing the padding to 64byte increases performance in
>> > workloads with contended lwlocks considerably.
>>
>> At a huge cost in RAM.  Remember we make two LWLocks per shared buffer.
>
>> I think that rather than using a blunt instrument like that, we ought to
>> see if we can identify pairs of hot LWLocks and make sure they're not
>> adjacent.
>
> That's a good point. What about making all but the shared buffer lwlocks
> 64bytes? It seems hard to analyze the interactions between all the locks
> and keep it maintained.

I think somebody had a patch a few years ago that made it so that the
LWLocks didn't have to be in a single array, but could instead be
anywhere in shared memory.  Internally, lwlock.c held onto LWLock
pointers instead of LWLockIds.  That idea seems like it might be worth
revisiting, in terms of giving us more options as to how LWLocks can
be laid out in shared memory.
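
Conceptually the shift is from index-based to pointer-based lock references;
roughly this (the second declaration is just a sketch with a made-up name,
not the old patch's actual API):

/* today: a lock is an index into the single shared LWLock array */
extern void LWLockAcquire(LWLockId lockid, LWLockMode mode);

/* sketched alternative: a lock is identified by its address, so callers can
 * embed LWLocks wherever they like in shared memory (e.g. in a BufferDesc)
 * and control padding and alignment per use site */
extern void LWLockAcquireByPtr(LWLock *lock, LWLockMode mode);  /* hypothetical name */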

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company