Re: Improving connection scalability: GetSnapshotData() - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Improving connection scalability: GetSnapshotData()
Date
Msg-id 20200407202721.crr2mhyyorbhyfml@alap3.anarazel.de
Whole thread Raw
In response to Re: Improving connection scalability: GetSnapshotData()  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Improving connection scalability: GetSnapshotData()
List pgsql-hackers
Hi,

On 2020-04-07 15:26:36 -0400, Robert Haas wrote:
> 0008 -
> 
> Here again, I greatly dislike putting Copy in the name. It makes
> little sense to pretend that either is the original and the other is
> the copy. You just have the same data in two places. If one of them is
> more authoritative, the place to explain that is in the comments, not
> by elongating the structure member name and supposing anyone will be
> able to make something of that.

Ok.



> 0009, 0010 -
> 
> I think you've got this whole series of things divided up too finely.
> Like, 0005 feels like the meat of it, and that has a bunch of things
> in it that could plausible be separated out as separate commits. 0007
> also seems to do more than one kind of thing (see my comment regarding
> moving some of that into 0006). But whacking everything around like a
> crazy man in 0005 and a little more in 0007 and then doing the
> following cleanup in these little tiny steps seems pretty lame.
> Separating 0009 from 0010 is maybe the clearest example of that, but
> IMHO it's pretty unclear why both of these shouldn't be merged with
> 0008.

I found it a *lot* easier to review / evolve them this way. I e.g. had
an earlier version in which the subxid part of the change worked
substantially differently (it tried to elide the overflowed bool, by
definining -1 as the indicator for overflows), and it'd been way harder
to change that if I didn't have a patch with *just* the subxid changes.

I'd not push them separated by time, but I do think it'd make sense to
push them as separate commits. I think it's easier to review them in
case of a bug in a separate area.


> My comments on the Copy naming apply here as well. I am also starting
> to wonder why exactly we need two copies of all this stuff. Perhaps
> I've just failed to absorb the idea for having read the patch too
> briefly, but I think that we need to make sure that it's super-clear
> why we're doing that. If we just needed it for one field because
> $REASONS, that would be one thing, but if we need it for all of them
> then there must be some underlying principle here that needs a good
> explanation in an easy-to-find and centrally located place.

The main reason is that we want to be able to cheaply check the current
state of the variables (mostly when checking a backend's own state). We
can't access the "dense" ones without holding a lock, but we e.g. don't
want to make ProcArrayEndTransactionInternal() take a lock just to check
if vacuumFlags is set.

It turns out to also be good for performance to have the copy for
another reason: The "dense" arrays share cachelines with other
backends. That's worth it because it allows to make GetSnapshotData(),
by far the most frequent operation, touch fewer cache lines. But it also
means that it's more likely that a backend's "dense" array entry isn't
in a local cpu cache (it'll be pulled out of there when modified in
another backend). In many cases we don't need the shared entry at commit
etc time though, we just need to check if it is set - and most of the
time it won't be.  The local entry allows to do that cheaply.

Basically it makes sense to access the PGPROC variable when checking a
single backend's data, especially when we have to look at the PGPROC for
other reasons already.  It makes sense to look at the "dense" arrays if
we need to look at many / most entries, because we then benefit from the
reduced indirection and better cross-process cacheability.


> 0011 -
> 
> + * Number of top-level transactions that completed in some form since the
> + * start of the server. This currently is solely used to check whether
> + * GetSnapshotData() needs to recompute the contents of the snapshot, or
> + * not. There are likely other users of this.  Always above 1.
> 
> Does it only count XID-bearing transactions? If so, best mention that.

Oh, good point.


> +GetSnapshotDataFillTooOld(Snapshot snapshot)
> 
> Uh... no clue what's going on here. Granted the code had no comments
> in the old place either, so I guess it's not worse, but even the name
> of the new function is pretty incomprehensible.

It fills the old_snapshot_threshold fields of a Snapshot.


> + * It is safe to re-enter the snapshot's xmin. This can't cause xmin to go
> 
> I know what it means to re-enter a building, but I don't know what it
> means to re-enter the snapshot's xmin.

Re-entering it into the procarray, thereby preventing rows that the
snapshot could see from being removed.

> This whole comment seems a bit murky.

How about:
    /*
     * If the current xactCompletionCount is still the same as it was at the
     * time the snapshot was built, we can be sure that rebuilding the
     * contents of the snapshot the hard way would result in the same snapshot
     * contents:
     *
     * As explained in transam/README, the set of xids considered running by
     * GetSnapshotData() cannot change while ProcArrayLock is held. Snapshot
     * contents only depend on transactions with xids and xactCompletionCount
     * is incremented whenever a transaction with an xid finishes (while
     * holding ProcArrayLock) exclusively). Thus the xactCompletionCount check
     * ensures we would detect if the snapshot would have changed.
     *
     * As the snapshot contents are the same as it was before, it is is safe
     * to re-enter the snapshot's xmin into the PGPROC array. None of the rows
     * visible under the snapshot could already have been removed (that'd
     * require the set of running transactions to change) and it fulfills the
     * requirement that concurrent GetSnapshotData() calls yield the same
     * xmin.
     */

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Floris Van Nee
Date:
Subject: RE: Index Skip Scan
Next
From: Alvaro Herrera
Date:
Subject: Re: FETCH FIRST clause WITH TIES option