Re: anole: assorted stability problems - Mailing list pgsql-hackers

From Robert Haas
Subject Re: anole: assorted stability problems
Msg-id CA+TgmoaaeRv=1120hQdTjF++Sd4G2zMA-U2-UKzJMD1vMF+CWg@mail.gmail.com
In response to Re: anole: assorted stability problems  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: anole: assorted stability problems  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sun, Jun 28, 2015 at 9:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> That sucks.  It was easy to see that the old fallback barrier
>> implementation wasn't re-entrant, but this one should be.  And now
>> that I look at it again, doesn't the failure message indicate that's
>> not the problem anyway?
>
>> ! PANIC:  stuck spinlock (c00000000d6f4140) detected at lwlock.c:816
>> ! PANIC:  stuck spinlock (c00000000d72f6e0) detected at lwlock.c:770
>
> I was assuming that a leaky memory barrier was allowing the spinlock
> state to become inconsistent, or at least to be perceived as inconsistent.
> But I'm not too clear on how the barrier changes you and Andres have been
> making have affected the spinlock code.

For the most part, they haven't.  Andres did a bunch of work to add
atomics support, and overhauled the barrier implementation that I
committed to 9.2 along the way.  But that had minimal impact on
s_lock.h.

What we did do that touched s_lock.h was attempt to ensure that
SpinLockAcquire() and SpinLockRelease() function as compiler barriers,
so that it should no longer be necessary to litter the code with
"volatile" in every function that uses those.  It is possible that
this could be broken on HP-UX.  If _Asm_sched_fence() doesn't
constrain the compiler appropriately, that could explain the problems
we're seeing here.  But we're not the only ones using that incantation,
so I'm left scratching my head.
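As a rough illustration of the property described above, here is a minimal sketch in C. It assumes GCC or Clang: the `__sync` builtins and the empty `asm` statement with a `"memory"` clobber stand in for whatever s_lock.h actually uses on each platform, and all `my_*` names, `counter_lock`, `shared_counter` and `run_demo` are hypothetical. Because acquire and release each act as a compiler barrier, the protected counter needs no `volatile` qualifier. On HP-UX with HP's compiler, `_Asm_sched_fence()` is assumed to fill the role of the empty `asm` statement; if it did not actually restrain the compiler, the counter access could legally be moved outside the critical section.

```c
#include <assert.h>

/* Hypothetical stand-in for slock_t; not PostgreSQL's definition. */
typedef int my_slock_t;

/* Compiler barrier (GCC/Clang syntax): emits no CPU fence instruction,
 * but forbids the compiler from caching values in registers or moving
 * memory accesses across this point.  _Asm_sched_fence() is ASSUMED to
 * be the HP-UX compiler's equivalent. */
#define my_compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Returns 1 if the lock was obtained, 0 if it was already held. */
static int my_spin_try(my_slock_t *lock)
{
    /* Atomic exchange with acquire semantics (GCC builtin). */
    if (__sync_lock_test_and_set(lock, 1) != 0)
        return 0;
    my_compiler_barrier();   /* critical-section accesses stay below here */
    return 1;
}

static void my_spin_acquire(my_slock_t *lock)
{
    while (!my_spin_try(lock))
        ;                    /* busy-wait; real code would back off/delay */
}

static void my_spin_release(my_slock_t *lock)
{
    my_compiler_barrier();   /* critical-section accesses stay above here */
    __sync_lock_release(lock);  /* stores 0 with release semantics */
}

static my_slock_t counter_lock;   /* 0 = free, 1 = held */
static long shared_counter;       /* deliberately NOT volatile */

static void bump_counter(void)
{
    my_spin_acquire(&counter_lock);
    shared_counter++;        /* cannot be hoisted or sunk past the barriers */
    my_spin_release(&counter_lock);
}

/* Single-threaded usage demo: bump n times, return the running total. */
static long run_demo(int n)
{
    for (int i = 0; i < n; i++)
        bump_counter();
    assert(counter_lock == 0);    /* lock is free again after each release */
    return shared_counter;
}
```

The single-threaded demo only exercises the lock bookkeeping; the point of the barriers is what they forbid the optimizer to do when several backends contend for the lock, which a single thread cannot observe.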

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

