Re: [PERFORM] Cpu usage 100% on slave. s_lock problem. - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.
Date
Msg-id 20131001110835.GE2670970@alap2.anarazel.de
Whole thread Raw
In response to Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.  (Ants Aasma <ants@cybertec.at>)
Responses Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.  (Ants Aasma <ants@cybertec.at>)
List pgsql-hackers
On 2013-10-01 03:51:50 +0300, Ants Aasma wrote:
> On Sat, Sep 28, 2013 at 12:53 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > What confuses me is that pg_read_barrier() is just a compiler barrier on
> > x86[-64] in barrier.h. According to my knowledge it needs to be an
> > lfence or the full barrier?
> > The linked papers from Paul McKenney - which are a great read - seem to
> > agree?
> 
> x86 provides a pretty strong memory model, the only (relevant) allowed
> reordering is that stores may be delayed beyond subsequent loads. A
> simple and functionally accurate model of this is to ignore all caches
> and think of the system as a bunch of CPU's accessing the main memory
> through a shared bus, but each CPU has a small buffer that stores
> writes done by that CPU. Intel and AMD have performed feats of black
> magic in the memory pipelines that this isn't a good performance
> model, but for correctness it is valid. The exceptions are few
> unordered memory instructions that you have to specifically ask for
> and non-writeback memory that concerns the kernel. (See section 8.2 in
> [1] for details) The note by McKenney stating that lfence is required
> for a read barrier is for those unordered instructions. I don't think
> it's necessary in PostgreSQL.

I am actually referring to his x86 description where he states smp_rmb()
is defined as lock xadd;. But I've since done some research and that was
removed for most x86 platforms after intel and AMD relaxed the
requirements to match the actually implemented reality.

For reference:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b6c7347fffa655a3000d9d41640d222c19fc3065

> As for the specific patch being discussed here. The read barrier is in
> the wrong place and with the wrong comment, and the write side is
> assuming that SpinLockAcquire() is a write barrier, which it isn't
> documented to do at the moment.

Lots of things we do pretty much already assume it is one - and I have a
hard time imagining any spinlock implementation that isn't a write
barrier. In fact it's the only reason we use spinlocks in quite some
places.
What we are missing is that it's not guaranteed to be a compiler barrier
on all platforms :(. But that's imo a bug.

> The correct way to think of this is
> that StartupXLOG() does a bunch of state modifications and then
> advertises the fact that it's done by setting
> xlogctl->SharedRecoveryInProgress = false; The state modifications
> should better be visible to anyone seeing that last write, so you need
> one write barrier between the state modifications and setting the
> flag.

SpinLockAcquire() should provide that.

> On the other side, after reading the flag as false in
> RecoveryInProgress() the state modified by StartupXLOG() may be
> investigated, those loads better not happen before we read the flag.

Agreed.

> So we need a read barrier somewhere *after* reading the flag in
> RecoveryInProgress() and reading the shared memory structures, and in
> theory a full barrier if we are going to be writing data. In practice
> x86 is covered thanks to it's memory model, Power is covered thanks to
> the control dependency and ARM would need a read barrier, but I don't
> think any real ARM CPU does speculative stores as that would be
> insane.

Does there necessarily have to be a "visible" control dependency?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Cmpact commits and changeset extraction
Next
From: Andres Freund
Date:
Subject: Re: Freezing without write I/O