Re: Improving connection scalability: GetSnapshotData() - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Improving connection scalability: GetSnapshotData()
Msg-id 20200408213755.xvbv7rj27yfwhs6a@alap3.anarazel.de
In response to Re: Improving connection scalability: GetSnapshotData()  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2020-04-08 09:24:13 -0400, Robert Haas wrote:
> On Tue, Apr 7, 2020 at 4:27 PM Andres Freund <andres@anarazel.de> wrote:
> > The main reason is that we want to be able to cheaply check the current
> > state of the variables (mostly when checking a backend's own state). We
> > can't access the "dense" ones without holding a lock, but we e.g. don't
> > want to make ProcArrayEndTransactionInternal() take a lock just to check
> > if vacuumFlags is set.
> >
> > It turns out to also be good for performance to have the copy for
> > another reason: The "dense" arrays share cachelines with other
> > backends. That's worth it because it allows to make GetSnapshotData(),
> > by far the most frequent operation, touch fewer cache lines. But it also
> > means that it's more likely that a backend's "dense" array entry isn't
> > in a local cpu cache (it'll be pulled out of there when modified in
> > another backend). In many cases we don't need the shared entry at commit
> > etc time though, we just need to check if it is set - and most of the
> > time it won't be.  The local entry allows to do that cheaply.
> >
> > Basically it makes sense to access the PGPROC variable when checking a
> > single backend's data, especially when we have to look at the PGPROC for
> > other reasons already.  It makes sense to look at the "dense" arrays if
> > we need to look at many / most entries, because we then benefit from the
> > reduced indirection and better cross-process cacheability.
> 
> That's a good explanation. I think it should be in the comments or a
> README somewhere.

I had a briefer version in the PROC_HDR comment. I've just expanded it
to:
 *
 * The denser separate arrays are beneficial for three main reasons: First, to
 * allow for as tight loops accessing the data as possible. Second, to prevent
 * updates of frequently changing data (e.g. xmin) from invalidating
 * cachelines also containing less frequently changing data (e.g. xid,
 * vacuumFlags). Third, to condense frequently accessed data into as few
 * cachelines as possible.
 *
 * There are two main reasons to have the data mirrored between these dense
 * arrays and PGPROC. First, as explained above, a PGPROC's array entries can
 * only be accessed with either ProcArrayLock or XidGenLock held, whereas the
 * PGPROC entries do not require that (obviously there may still be locking
 * requirements around the individual field, separate from the concerns
 * here). That is particularly important for a backend to efficiently check
 * its own values, which it often can safely do without locking.  Second, the
 * PGPROC fields allow avoiding unnecessary accesses and modifications to the
 * dense arrays. A backend's own PGPROC is more likely to be in a local cache,
 * whereas the cachelines for the dense array will be modified by other
 * backends (often removing them from the caches of other cores/sockets). At
 * commit/abort time a check of the PGPROC value can avoid accessing/dirtying
 * the corresponding array value.
 *
 * Basically it makes sense to access the PGPROC variable when checking a
 * single backend's data, especially when already looking at the PGPROC for
 * other reasons.  It makes sense to look at the "dense" arrays if we
 * need to look at many / most entries, because we then benefit from the
 * reduced indirection and better cross-process cache-ability.
 *
 * When entering a PGPROC for 2PC transactions with ProcArrayAdd(), the data
 * in the dense arrays is initialized from the PGPROC while it already holds
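
To make the access pattern concrete, here is a minimal standalone sketch of
the mirroring idea. It is not the code from the patch: every name in it
(DemoProc, DemoProcHdr, dense_idx, the demo_* functions) is hypothetical, it
is single-threaded, and it leaves out locking entirely, whereas the real
dense arrays may only be accessed with ProcArrayLock or XidGenLock held. It
only illustrates the two ways of reading the data: a tight loop over the
dense array when many entries are needed, and a check of the backend's own
mirrored copy that avoids touching the shared array when nothing is set.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_PROCS 8
#define DEMO_INVALID_XID 0

typedef uint32_t DemoXid;       /* stand-in for a transaction id */

/* Per-backend struct (hypothetical): mirrors its dense-array entries. */
typedef struct DemoProc
{
    int     dense_idx;          /* this backend's slot in the dense arrays */
    DemoXid xid;                /* mirror of hdr->xids[dense_idx] */
    uint8_t flags;              /* mirror of hdr->flags[dense_idx] */
} DemoProc;

/* Shared header (hypothetical): one contiguous array per field. */
typedef struct DemoProcHdr
{
    DemoXid xids[DEMO_MAX_PROCS];
    uint8_t flags[DEMO_MAX_PROCS];
    int     nprocs;
} DemoProcHdr;

/* Register a backend: the dense entries are initialized from the proc. */
static void
demo_proc_add(DemoProcHdr *hdr, DemoProc *proc)
{
    assert(hdr->nprocs < DEMO_MAX_PROCS);
    proc->dense_idx = hdr->nprocs++;
    hdr->xids[proc->dense_idx] = proc->xid;
    hdr->flags[proc->dense_idx] = proc->flags;
}

/* Assigning an xid updates the mirror and the dense entry together. */
static void
demo_assign_xid(DemoProcHdr *hdr, DemoProc *proc, DemoXid xid)
{
    proc->xid = xid;
    hdr->xids[proc->dense_idx] = xid;
}

/* Looking at many entries: a tight loop over contiguous memory. */
static int
demo_count_running(const DemoProcHdr *hdr)
{
    int     n = 0;

    for (int i = 0; i < hdr->nprocs; i++)
        if (hdr->xids[i] != DEMO_INVALID_XID)
            n++;
    return n;
}

/*
 * End of transaction: check the backend's own mirror first and write to
 * the shared dense arrays only if something was set.  Returns the number
 * of dense-array writes performed.
 */
static int
demo_end_transaction(DemoProcHdr *hdr, DemoProc *proc)
{
    int     writes = 0;

    if (proc->xid != DEMO_INVALID_XID)
    {
        proc->xid = DEMO_INVALID_XID;
        hdr->xids[proc->dense_idx] = DEMO_INVALID_XID;
        writes++;
    }
    if (proc->flags != 0)
    {
        proc->flags = 0;
        hdr->flags[proc->dense_idx] = 0;
        writes++;
    }
    return writes;
}
```

In this toy, a backend that never had an xid assigned and has no flags set
ends its transaction without writing to the shared arrays at all, which is
the case the mirrored fields are meant to make cheap.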

Greetings,

Andres Freund


