Re: LogwrtResult contended spinlock - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: LogwrtResult contended spinlock
Date
Msg-id 20210202231919.GA23288@alvherre.pgsql
Whole thread Raw
In response to Re: LogwrtResult contended spinlock  (Andres Freund <andres@anarazel.de>)
Responses Re: LogwrtResult contended spinlock  (Andres Freund <andres@anarazel.de>)
Re: LogwrtResult contended spinlock  (Jaime Casanova <jcasanov@systemguards.com.ec>)
List pgsql-hackers
Hello,

So I addressed about half of your comments in this version merely by
fixing silly bugs.  The problem I had which I described as
"synchronization fails" was one of those silly bugs.

So in further thinking, it seems simpler to make only LogwrtResult
atomic, and leave LogwrtRqst as currently, using the spinlock.  This
should solve the contention problem we saw at the customer (but I've
asked Jaime very nicely to do a test run, if possible, to confirm).

For things like logical replication, which call GetFlushRecPtr() very
frequently (causing the spinlock issue we saw) it is good, because we're
no longer hitting the spinlock at all in that case.

I have another (pretty mechanical) patch that renames LogwrtResult.Write
to LogWriteResult and LogwrtResult.Flush to LogFlushResult.  That more
clearly shows that we're no longer updating them on unison.  Didn't want
to attach here because I didn't rebase on current one.  But it seems
logical: there's no longer any point in doing struct assignment, which
is the only thing that stuff was good for.


On 2021-Jan-29, Andres Freund wrote:

> > @@ -1169,6 +1169,8 @@ XLogInsertRecord(XLogRecData *rdata,
> >      {
> >          /* advance global request to include new block(s)
> >          */
> >          pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Write, EndPos);
> > +        pg_memory_barrier();
> 
> That's not really useful - the path that actually updates already
> implies a barrier. It'd probably be better to add a barrier to a "never
> executed cmpxchg" fastpath.

Got it.  Do you mean in pg_atomic_monotonic_advance_u64() itself?  I'm
not sure which is the nicer semantics.  (If it's got to be at the
caller, then we'll need to return a boolean from there, which sounds
worse.)

> > @@ -2905,6 +2909,7 @@ XLogFlush(XLogRecPtr record)
> >          WriteRqstPtr = pg_atomic_read_u64(&XLogCtl->LogwrtRqst.Write);
> >          LogwrtResult.Write = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Write);
> >          LogwrtResult.Flush = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Flush);
> > +        pg_read_barrier();
> 
> I'm not sure you really can get away with just a read barrier in these
> places. We can't e.g. have later updates to other shared memory
> variables be "moved" to before the barrier (which a read barrier
> allows).

Ah, that makes sense.

I have not really studied the barrier locations terribly closely in this
version of the patch.  It probably misses some (eg. in GetFlushRecPtr
and GetXLogWriteRecPtr).  It is passing the tests for me, but alone
that's probably not enough.  I'm gonna try and study the generated
assembly and see if I can make sense of things ...

-- 
Álvaro Herrera                            39°49'30"S 73°17'W

Attachment

pgsql-hackers by date:

Previous
From: Mark Dilger
Date:
Subject: Re: new heapcheck contrib module
Next
From: Tom Lane
Date:
Subject: Re: Recording foreign key relationships for the system catalogs