Re: Still something fishy in the fastpath lock stuff - Mailing list pgsql-hackers
From: Heikki Linnakangas
Subject: Re: Still something fishy in the fastpath lock stuff
Msg-id: 53328B5D.2020706@vmware.com
In response to: Re: Still something fishy in the fastpath lock stuff (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On 03/26/2014 06:59 AM, Robert Haas wrote:
> On Tue, Mar 25, 2014 at 11:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> I really think we should change that rule; it leads to ugly code, and bugs.
>>
>> No doubt, but are we prepared to believe that we have working "compiler
>> barriers" other than volatile?  (Which, I will remind you, is in the C
>> standard.  Unlike "compiler barriers".)
>
> I'm prepared to believe that it would take some time to be certain
> that we've got that right on all compilers we support.  The most
> common ones are easily dealt with, I think, but it might be harder to
> verify the behavior on more obscure compilers.  But I'd rather deal
> with bullet-proofing SpinLockAcquire() and SpinLockRelease() *in
> general* than deal with bullet-proofing every call site individually,
> which is what we have to do right now.

+1. To support porting to new compilers, we can fall back to e.g. calling a dummy function through a function pointer, if we don't have a proper implementation.

Aside from the correctness issue: a while ago, while working on the xloginsert slots stuff, I looked at the assembler that gcc generated for ReserveXLogInsertLocation(). That function is basically:

SpinLockAcquire()
<modify and read a few variables in shared memory>
SpinLockRelease()
<fairly expensive calculations based on values read>

Gcc decided to move the release of the lock so that it became:

SpinLockAcquire()
<modify and read a few variables in shared memory>
<fairly expensive calculations based on values read>
SpinLockRelease()

I went through some effort while writing the patch to make sure the spinlock is held for as short a duration as possible. But gcc undid that to some extent :-(. I tried to put a compiler barrier there and run some short performance tests, but it didn't make any noticeable difference.
So it doesn't seem to matter in practice - maybe the CPU reorders things anyway, or the calculations are not expensive enough to matter after all.

I just looked at the assembler generated for LWLockAcquire, and a similar thing is happening there. The code is:

...
641         /* We are done updating shared state of the lock itself. */
642         SpinLockRelease(&lock->mutex);
643
644         TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(l), T_ID(l), mode);
645
646         /* Add lock to list of locks held by this backend */
647         held_lwlocks[num_held_lwlocks++] = l;
...

gcc reordered part of the "held_lwlocks" assignment after releasing the spinlock:

.L21:
	.loc 1 647 0
	movslq	num_held_lwlocks(%rip), %rax
	addq	$16, %r12
.LVL23:
	.loc 1 652 0
	testl	%ebx, %ebx
	.loc 1 642 0
	movb	$0, (%r14)		<--- SpinLockRelease
	.loc 1 652 0
	leal	-1(%rbx), %r13d
	.loc 1 647 0
	leal	1(%rax), %edx
	movq	%r14, held_lwlocks(,%rax,8)
.LVL24:
	movl	%edx, num_held_lwlocks(%rip)

The held_lwlocks assignment was deliberately put outside the spinlock-protected section, to minimize the time the spinlock is held, but the compiler moved some of it back in: the basic block .L21. A compiler barrier would avoid that kind of reordering.

Not using volatile would also allow the compiler to reorder and optimize the instructions *within* the spinlock-protected block, which might shave a few instructions while a spinlock is held. Granted, spinlock-protected blocks are small so there isn't much room for optimization, but still.

- Heikki