Re: Still something fishy in the fastpath lock stuff - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Still something fishy in the fastpath lock stuff
Date
Msg-id 53328B5D.2020706@vmware.com
In response to Re: Still something fishy in the fastpath lock stuff  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 03/26/2014 06:59 AM, Robert Haas wrote:
> On Tue, Mar 25, 2014 at 11:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> I really think we
>>> should change that rule; it leads to ugly code, and bugs.
>>
>> No doubt, but are we prepared to believe that we have working "compiler
>> barriers" other than volatile?  (Which, I will remind you, is in the C
>> standard.  Unlike "compiler barriers".)
>
> I'm prepared to believe that it would take some time to be certain
> that we've got that right on all compilers we support.  The most
> common ones are easily dealt with, I think, but it might be harder to
> verify the behavior on more obscure compilers.  But I'd rather deal
> with bullet-proofing SpinLockAcquire() and SpinLockRelease() *in
> general* than deal with bullet-proofing every call site individually,
> which is what we have to do right now.

+1. To support porting to new compilers, we can fall back to, e.g., calling 
a dummy function through a function pointer if we don't have a proper 
implementation.
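
For illustration, a minimal sketch of that kind of fallback, assuming 
gcc-style C (all names here are hypothetical, not actual PostgreSQL APIs):

    /*
     * Calling an empty function through a pointer the compiler cannot see
     * through forces it to assume arbitrary memory may be read or written,
     * so loads and stores are not moved across the call.
     */
    static void
    fallback_barrier_dummy(void)
    {
        /* intentionally empty; its only job is to be opaque to the compiler */
    }

    /* volatile, so the compiler cannot resolve the target and elide the call */
    static void (*volatile fallback_barrier_fn) (void) = fallback_barrier_dummy;

    #define fallback_compiler_barrier()  ((*fallback_barrier_fn) ())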

Aside from the correctness issue: A while ago, while working on the 
xloginsert slots stuff, I looked at the assembler that gcc generated for 
ReserveXLogInsertLocation(). That function is basically:
    SpinLockAcquire()
    <modify and read a few variables in shared memory>
    SpinLockRelease()
    <fairly expensive calculations based on values read>

Gcc decided to move the release of the lock so that it became:
    SpinLockAcquire()
    <modify and read a few variables in shared memory>
    <fairly expensive calculations based on values read>
    SpinLockRelease()

I went through some effort while writing the patch to make sure the 
spinlock is held for as short a duration as possible. But gcc undid that 
to some extent :-(.

I tried to put a compiler barrier there and run some short performance 
tests, but it didn't make any noticeable difference. So it doesn't seem to 
matter in practice - maybe the CPU reorders things anyway, or the 
calculations are not expensive enough to matter after all.
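
For concreteness, here is a standalone sketch of where such a barrier would 
sit. The names and the spinlock stand-ins are made up for illustration (gcc 
builtins and inline asm, not the real s_lock.h implementation):

    #include <stdint.h>

    typedef volatile int slock_t;

    /* crude stand-ins for the real SpinLockAcquire/SpinLockRelease */
    #define SpinLockAcquire(lck) \
        do { while (__sync_lock_test_and_set((lck), 1)) ; } while (0)
    #define SpinLockRelease(lck)  (*(lck) = 0)   /* plain store through a volatile pointer */

    #define pg_compiler_barrier()  __asm__ __volatile__("" ::: "memory")

    static slock_t insert_lck;
    static uint64_t cur_pos;

    uint64_t
    reserve_insert_location(uint64_t size)
    {
        uint64_t start;

        SpinLockAcquire(&insert_lck);
        start = cur_pos;            /* read and modify shared state under the lock */
        cur_pos += size;
        SpinLockRelease(&insert_lck);
        pg_compiler_barrier();      /* keep gcc from sinking the release below,
                                     * or hoisting the later work above it */

        /* the "fairly expensive calculations based on values read" go here,
         * outside the spinlock-protected section */
        return start;
    }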

I just looked at the assembler generated for LWLockAcquire, and a 
similar thing is happening there. The code is:
...
641    /* We are done updating shared state of the lock itself. */
642    SpinLockRelease(&lock->mutex);
643
644    TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(l), T_ID(l), mode);
645
646    /* Add lock to list of locks held by this backend */
647    held_lwlocks[num_held_lwlocks++] = l;
...

gcc moved part of the "held_lwlocks" assignment to before the release of the 
spinlock:

.L21:
        .loc 1 647 0
        movslq  num_held_lwlocks(%rip), %rax
        addq    $16, %r12
.LVL23:
        .loc 1 652 0
        testl   %ebx, %ebx
        .loc 1 642 0
        movb    $0, (%r14)        <--- SpinLockRelease
        .loc 1 652 0
        leal    -1(%rbx), %r13d
        .loc 1 647 0
        leal    1(%rax), %edx
        movq    %r14, held_lwlocks(,%rax,8)
.LVL24:
        movl    %edx, num_held_lwlocks(%rip)

The held_lwlocks assignment was deliberately put outside the 
spinlock-protected section, to minimize the time the spinlock is held, 
but the compiler moved some of it back in: the basic block .L21.

A compiler barrier would avoid that kind of reordering. Not using 
volatile would also allow the compiler to reorder and optimize the 
instructions *within* the spinlock-protected block, which might shave a 
few instructions while a spinlock is held. Granted, spinlock-protected 
blocks are small so there isn't much room for optimization, but still.
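
For concreteness, the general shape being discussed would be something like 
the following sketch. The macro bodies are hypothetical; tas() and S_UNLOCK() 
stand for whatever the platform-specific primitives are, and a real acquire 
path would of course fall back to s_lock() with spin delays:

    #define pg_compiler_barrier()  __asm__ __volatile__("" ::: "memory")

    /*
     * With the barriers built into the macros, call sites no longer need to
     * reach shared state through volatile-qualified pointers.
     */
    #define SpinLockAcquire(lock) \
        do { \
            while (tas(lock)) \
                ; \
            /* keep protected accesses from being hoisted above the acquire */ \
            pg_compiler_barrier(); \
        } while (0)

    #define SpinLockRelease(lock) \
        do { \
            /* keep protected accesses from sinking below the release */ \
            pg_compiler_barrier(); \
            S_UNLOCK(lock); \
            /* keep later work (e.g. held_lwlocks bookkeeping) from creeping back in */ \
            pg_compiler_barrier(); \
        } while (0)

That only addresses compiler reordering; whatever CPU-level ordering the 
platform's tas()/S_UNLOCK() provide today would be unchanged.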

- Heikki


