Re: Reducing ClogControlLock contention - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Reducing ClogControlLock contention
Date
Msg-id CANP8+jLJ2nX+O7AbxF-vYHYuxuX91DXRO2_P9cyrPVFV=tBrrw@mail.gmail.com
Whole thread Raw
In response to Re: Reducing ClogControlLock contention  (Andres Freund <andres@anarazel.de>)
Responses Re: Reducing ClogControlLock contention  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On 22 August 2015 at 15:14, Andres Freund <andres@anarazel.de> wrote:
 
On 2015-06-30 08:02:25 +0100, Simon Riggs wrote:
> Proposal for improving this is to acquire the ClogControlLock in Shared
> mode, if possible.

I find that rather scary. That requires for all read and write paths in
clog.c and slru.c to only ever read memory locations once. Otherwise
those reads may not get the same results. That'd mean we'd need to add
indirections via volatile to all reads/writes. It also requires that we
never read in 4 byte quantities.

There is is a very specific case in which this is possible. We already allow writes to data structures in the transaction manager without locks *in certain cases* and this is all that is proposed here. Nothing scary about doing something we already do, as long as we get it right.
 
> This is safe because people checking visibility of an xid must always run
> TransactionIdIsInProgress() first to avoid race conditions, which will
> always return true for the transaction we are currently committing. As a
> result, we never get concurrent access to the same bits in clog, which
> would require a barrier.

I don't think that's really sufficient. There's a bunch of callers do
lookups without such checks, e.g. in heapam.c. It might be safe due to
other circumstances, but at the very least this is a significant but
sublte API revision.

I've checked the call sites you mention and they refer to tests made AFTER we know have waited for the xid to complete via the lock manager. So as of now, there are no callers of TransactionIdGetStatus() that have not already confirmed that the xid is no longer active in the lock manager or the procarray. Since we set clog before touching lock manager or procarray we can be certain there is no concurrent reads and writes.
 
TransactionIdSetPageStatus() calls TransactionIdSetStatusBit(), which
writes an 8 byte variable (the lsn). That's not safe.

Agreed, thanks for spotting that.

I see this as the biggest problem to overcome with this patch.

We write WAL in pages, so we don't need to store the low order bits (varies according to WAL page size). We seldom use the higher order bits, since it takes a while to go thru (8192 * INT_MAX) = 32PB of WAL. So it seems like we can have a base LSN for a whole clog page, plus 4-byte LSN offsets. That way we can make the reads and writes atomic on all platforms. All of that can be hidden in clog.c in macros, so low impact, modular code.
 
A patch like this will need far more changes. Every read and write from
a page has to be guaranteed to only be done once, otherwise you can get
'effectively torn' values.

That is, you can't just do
static void
TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
...
        char       *byteptr;
        char        byteval;
...
        /* note this assumes exclusive access to the clog page */
        byteval = *byteptr;
        byteval &= ~(((1 << CLOG_BITS_PER_XACT) - 1) << bshift);
        byteval |= (status << bshift);
        *byteptr = byteval;
...

the compiler is entirely free to "optimize" away the byteval variable
and do all these masking operations on memory! It can intermittenly
write temporary values to byteval, because e.g. the register pressure is
too high.

To actually rely on single-copy-atomicity you have to enforce that these
reads/writes can only happen. Leavout out some possible macro
indirection that'd have to look like
        byteval = (volatile char *) byteptr;
        ...
        *(volatile char *) byteptr = byteval;
some for TransactionIdGetStatus(), without the write side obviously.

Seems doable.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Reducing ClogControlLock contention
Next
From: Fabien COELHO
Date:
Subject: Re: Commitfest remaining "Needs Review" items