Re: Reduce ProcArrayLock contention - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Reduce ProcArrayLock contention
Date
Msg-id 20150820101940.GA5070@alap3.anarazel.de
In response to Re: Reduce ProcArrayLock contention  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Reduce ProcArrayLock contention  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On 2015-08-20 15:38:36 +0530, Amit Kapila wrote:
> On Wed, Aug 19, 2015 at 9:09 PM, Andres Freund <andres@anarazel.de> wrote:
> > I spent some time today reviewing the commited patch. So far my only
> > major complaint is that I think the comments are only insufficiently
> > documenting the approach taken:
> > Stuff like avoiding ABA type problems by clearing the list entirely and
> > it being impossible that entries end up on the list too early absolutely
> > needs to be documented explicitly.
> >
> 
> I think more comments can be added to explain such behaviour if it is
> not clear from the current code and comments.

It's not mentioned at all, so yes.

> I think you are right and here we need to use something like what you
> have suggested below.  Originally the code was similar to what you
> have written below, but it used a different (new) variable to achieve
> what you have achieved with lwWaiting; to avoid the new variable the
> code was refactored into its current form.  I think we should do this
> change (I can write a patch) unless Robert feels otherwise.

I think we can just rename lwWaiting to something more generic.
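To illustrate what I have in mind (a rough sketch from memory, with function
signatures and details not checked against the tree, and not a proposal for
the final name): the follower would set the renamed flag, enqueue itself, and
then sleep until the leader has processed it:

	/* follower side (sketch only) */
	proc->lwWaiting = true;		/* reused as "still being handled by the leader" */

	/* ... enqueue ourselves on the group XID clearing list ... */

	for (;;)
	{
		/* sleep; the leader wakes us after clearing our XID */
		PGSemaphoreLock(&proc->sem);
		if (!proc->lwWaiting)
			break;
	}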


> >    Consider what happens if such a follower enqueues in another
> >    transaction. It is not, as far as I could find out, guaranteed on all
> >    types of cpus that a third backend can actually see nextClearXidElem
> >    as INVALID_PGPROCNO. That'd likely require SMT/HT cores and multiple
> >    sockets. If the write to nextClearXidElem is entered into the local
> >    store buffer (leader #1) a hyper-threaded process (leader #2) can
> >    possibly see it (store forwarding) while another backend doesn't
> >    yet.
> >
> >    I think this is very unlikely to be an actual problem due to
> >    independent barriers until enqueued again, but I don't want to rely
> >    on it without documenting it. It seems safer to replace
> >    +            wakeidx = pg_atomic_read_u32(&proc->nextClearXidElem);
> >    +            pg_atomic_write_u32(&proc->nextClearXidElem, INVALID_PGPROCNO);
> >    with a pg_atomic_exchange_u32().
> >
> 
> I didn't follow this point; if we ensure that the follower can never return
> before the leader wakes it up, then why would it be a problem to update
> nextClearXidElem like above?

Because it doesn't generally guarantee that *other* backends have seen the
write, as there's no memory barrier.
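Roughly like this (sketched from memory, not compiled; only meant to show
where the exchange would go):

	while (wakeidx != INVALID_PGPROCNO)
	{
		PGPROC	   *nextproc = &allProcs[wakeidx];

		/*
		 * Read the follower's next pointer and reset it to INVALID_PGPROCNO
		 * in a single atomic step.  Unlike a plain pg_atomic_write_u32(),
		 * pg_atomic_exchange_u32() implies a full memory barrier, so every
		 * backend sees the reset before the follower can re-enqueue itself
		 * in a later transaction.
		 */
		wakeidx = pg_atomic_exchange_u32(&nextproc->nextClearXidElem,
										 INVALID_PGPROCNO);

		/* mark the follower as done, then wake it */
		nextproc->lwWaiting = false;
		if (nextproc != MyProc)
			PGSemaphoreUnlock(&nextproc->sem);
	}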


> > +/*
> > + * ProcArrayGroupClearXid -- group XID clearing
> > + *
> > + * When we cannot immediately acquire ProcArrayLock in exclusive mode at
> > + * commit time, add ourselves to a list of processes that need their XIDs
> > + * cleared.  The first process to add itself to the list will acquire
> > + * ProcArrayLock in exclusive mode and perform ProcArrayEndTransactionInternal
> > + * on behalf of all group members.  This avoids a great deal of context
> > + * switching when many processes are trying to commit at once, since the lock
> > + * only needs to be handed from the last share-locker to one process waiting
> > + * for the exclusive lock, rather than to each one in turn.
> > + */
> > +static void
> > +ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid)
> > +{
> >
> > This comment, in my opinion, is rather misleading. If you have a
> > workload that is primarily slowed down by transaction commits, this
> > patch doesn't actually change the number of context switches very
> > much. Previously all backends enqueued in the lwlock and got woken up
> > one-by-one. Save for backends 'jumping' the queue while the lock has been
> > released but the woken-up backend isn't yet running, there'll be exactly
> > as many context switches as today.
> >
> 
> I think this is debatable, as in the previous mechanism all the backends
> cleared their transaction ids one-by-one (which is nothing but context
> switching), which led to contention not only with read lockers
> but write lockers as well.

Huh? You can benchmark it, there's barely any change in the number of
context switches.

I am *not* saying that there's no benefit to the patch; I am saying that
context switches are the wrong explanation. Reduced contention (due to
shorter lock holding times, fewer cacheline moves, etc.) is the reason
this is beneficial.
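To spell that out: instead of every committer queueing on ProcArrayLock,
followers publish themselves on a lock-free list with a single CAS and go to
sleep, and only the leader takes the lock exclusively. Roughly (a simplified
sketch, not the literal committed code):

	uint32		nextidx;

	/* push ourselves onto the group XID clearing list */
	while (true)
	{
		nextidx = pg_atomic_read_u32(&ProcGlobal->nextClearXidElem);
		pg_atomic_write_u32(&proc->nextClearXidElem, nextidx);

		if (pg_atomic_compare_exchange_u32(&ProcGlobal->nextClearXidElem,
										   &nextidx,
										   (uint32) proc->pgprocno))
			break;
	}

	if (nextidx != INVALID_PGPROCNO)
	{
		/* follower: sleep until the leader has cleared our XID */
	}
	else
	{
		/* leader: acquire ProcArrayLock once, clear every queued XID,
		 * release the lock, then wake all followers */
	}

The lock still gets handed over and the woken leader still has to be
scheduled, so the number of context switches stays roughly the same; what
goes away is the long queue of exclusive waiters and the associated cacheline
traffic.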


> > I don't think it's a good idea to use the same variable name in PROC_HDR and
> > PGPROC, it's confusing.

> What do you mean by this, are you not happy with the variable name?

Yes. I think it's a bad idea to have the same variable name in PROC_HDR
and PGPROC.

struct PGPROC
{
	...
	/* Support for group XID clearing. */
	volatile pg_atomic_uint32	nextClearXidElem;
	...
}

typedef struct PROC_HDR
{
	...
	/* First pgproc waiting for group XID clear */
	volatile pg_atomic_uint32	nextClearXidElem;
	...
}

PROC_HDR's variable imo isn't well named.
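Something along these lines would already read a lot better to me (the names
are just a suggestion):

typedef struct PROC_HDR
{
	...
	/* First pgproc on the group XID clearing list, or INVALID_PGPROCNO */
	volatile pg_atomic_uint32	firstClearXidElem;
	...
}

struct PGPROC
{
	...
	/* Next pgproc on the group XID clearing list, or INVALID_PGPROCNO */
	volatile pg_atomic_uint32	nextClearXidElem;
	...
}

i.e. the list head and the per-backend next pointer shouldn't share a name.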

Greetings,

Andres Freund


