Re: Improving connection scalability: GetSnapshotData() - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Improving connection scalability: GetSnapshotData()
Date
Msg-id 20200407192453.74ouppts7v27ncov@alap3.anarazel.de
Whole thread Raw
In response to Re: Improving connection scalability: GetSnapshotData()  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Improving connection scalability: GetSnapshotData()  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2020-04-07 14:28:09 -0400, Robert Haas wrote:
> More review, since it sounds like you like it:
>
> 0006 - Boring. But I'd probably make this move both xmin and xid back,
> with related comment changes; see also next comment.
>
> 0007 -
>
> + TransactionId xidCopy; /* this backend's xid, a copy of this proc's
> +    ProcGlobal->xids[] entry. */
>
> Can we please NOT put Copy into the name like that? Pretty please?

Do you have a suggested naming scheme? Something indicating that it's
not the only place that needs to be updated?


> + int pgxactoff; /* offset into various ProcGlobal-> arrays
> + * NB: can change any time unless locks held!
> + */
>
> I'm going to add the helpful comment "NB: can change any time unless
> locks held!" to every data structure in the backend that is in shared
> memory and not immutable. No need, of course, to mention WHICH
> locks...

I think it's more on-point here, because we need to hold either of the
locks* even, for changes to a backend's own status that one reasonably
could expect would be safe to at least inspect. E.g looking at
ProcGlobal->xids[MyProc->pgxactoff]
doesn't look suspicious, but could very well return another backends
xid, if neither ProcArrayLock nor XidGenLock is held (because a
ProcArrayRemove() could have changed pgxactoff if a previous entry was
removed).

*see comment at PROC_HDR:

 *
 * Adding/Removing an entry into the procarray requires holding *both*
 * ProcArrayLock and XidGenLock in exclusive mode (in that order). Both are
 * needed because the dense arrays (see below) are accessed from
 * GetNewTransactionId() and GetSnapshotData(), and we don't want to add
 * further contention by both using one lock. Adding/Removing a procarray
 * entry is much less frequent.
 */
typedef struct PROC_HDR
{
    /* Array of PGPROC structures (not including dummies for prepared txns) */
    PGPROC       *allProcs;


    /*
     * Arrays with per-backend information that is hotly accessed, indexed by
     * PGPROC->pgxactoff. These are in separate arrays for three reasons:
     * First, to allow for as tight loops accessing the data as
     * possible. Second, to prevent updates of frequently changing data from
     * invalidating cachelines shared with less frequently changing
     * data. Third to condense frequently accessed data into as few cachelines
     * as possible.
     *
     * When entering a PGPROC for 2PC transactions with ProcArrayAdd(), those
     * copies are used to provide the contents of the dense data, and will be
     * transferred by ProcArrayAdd() while it already holds ProcArrayLock.
     */

there's also

 * The various *Copy fields are copies of the data in ProcGlobal arrays that
 * can be accessed without holding ProcArrayLock / XidGenLock (see PROC_HDR
 * comments).


I had a more explicit warning/explanation about the dangers of accessing
the arrays without locks, but apparently went to far when reducing
duplicated comments.


> On a related note, PROC_HDR really, really, really needs comments
> explaining the locking regimen for the new xids field.


I'll expand the above, in particular highlighting the danger of
pgxactoff changing.


> + ProcGlobal->xids[pgxactoff] = InvalidTransactionId;
>
> Apparently this array is not dense in the sense that it excludes
> unused slots, but comments elsewhere don't seem to entirely agree.

What do you mean with "unused slots"? Backends that committed?

Dense is intended to describe that the arrays only contain currently
"live" entries. I.e. that the first procArray->numProcs entries in each
array have the data for all procs (including prepared xacts) that are
"connected".  This is extending the concept that already existed for
procArray->pgprocnos.

Wheras the PGPROC/PGXACT arrays have "unused" entries interspersed.

This is what previously lead to the slow loop in GetSnapshotData(),
where we had to iterate over PGXACTs over an indirection in
procArray->pgprocnos. I.e. to only look at in-use PGXACTs we had to go
through allProcs[arrayP->pgprocnos[i]], which is, uh, suboptimal for
a performance critical routine holding a central lock.

I'll try to expand the comments around dense, but if you have a better
descriptor.


> Maybe the comments discussing how it is "dense" need to be a little
> more precise about this.
>
> + for (int i = 0; i < nxids; i++)
>
> I miss my C89. Yeah, it's just me.

Oh, dear god. I hate declaring variables like 'i' on function scope. The
bug that haunted me the longest in the development of this patch was in
XidCacheRemoveRunningXids, where there are both i and j, and a macro
XidCacheRemove(i), but the macro gets passed j as i.


> - if (!suboverflowed)
> + if (suboverflowed)
> + continue;
> +
>
> Do we really need to do this kind of diddling in this patch? I mean
> yes to the idea, but no to things that are going to make it harder to
> understand what happened if this blows up.

I can try to reduce those differences. Given the rest of the changes it
didn't seem likely to matter. I found it hard to keep the branches
nesting in my head when seeing:
                    }
                }
            }
        }
    }



> + uint32 TotalProcs = MaxBackends + NUM_AUXILIARY_PROCS + max_prepared_xacts;
>
>   /* ProcGlobal */
>   size = add_size(size, sizeof(PROC_HDR));
> - /* MyProcs, including autovacuum workers and launcher */
> - size = add_size(size, mul_size(MaxBackends, sizeof(PGPROC)));
> - /* AuxiliaryProcs */
> - size = add_size(size, mul_size(NUM_AUXILIARY_PROCS, sizeof(PGPROC)));
> - /* Prepared xacts */
> - size = add_size(size, mul_size(max_prepared_xacts, sizeof(PGPROC)));
> - /* ProcStructLock */
> + size = add_size(size, mul_size(TotalProcs, sizeof(PGPROC)));
>
> This seems like a bad idea. If we establish a precedent that it's OK
> to have sizing routines that don't use add_size() and mul_size(),
> people are going to cargo cult that into places where there is more
> risk of overflow than there is here.

Hm. I'm not sure I see the problem. Are you concerned that TotalProcs
would overflow due to too big MaxBackends or max_prepared_xacts? The
multiplication itself is still protected by add_size(). It didn't seem
correct to use add_size for the TotalProcs addition, since that's not
really a size. And since the limit for procs is much lower than
UINT32_MAX...

It seems worse to add a separate add_size calculation for each type of
proc entry, for for each of the individual arrays. We'd end up with

    size = add_size(size, sizeof(PROC_HDR));
    size = add_size(size, mul_size(MaxBackends, sizeof(PGPROC)));
    size = add_size(size, mul_size(NUM_AUXILIARY_PROCS, sizeof(PGPROC)));
    size = add_size(size, mul_size(max_prepared_xacts, sizeof(PGPROC)));
    size = add_size(size, sizeof(slock_t));

    size = add_size(size, mul_size(MaxBackends, sizeof(sizeof(*ProcGlobal->xids))));
    size = add_size(size, mul_size(NUM_AUXILIARY_PROCS, sizeof(sizeof(*ProcGlobal->xids)));
    size = add_size(size, mul_size(max_prepared_xacts, sizeof(sizeof(*ProcGlobal->xids))));
    size = add_size(size, mul_size(MaxBackends, sizeof(sizeof(*ProcGlobal->subxidStates))));
    size = add_size(size, mul_size(NUM_AUXILIARY_PROCS, sizeof(sizeof(*ProcGlobal->subxidStates)));
    size = add_size(size, mul_size(max_prepared_xacts, sizeof(sizeof(*ProcGlobal->subxidStates))));
    size = add_size(size, mul_size(MaxBackends, sizeof(sizeof(*ProcGlobal->vacuumFlags))));
    size = add_size(size, mul_size(NUM_AUXILIARY_PROCS, sizeof(sizeof(*ProcGlobal->vacuumFlags)));
    size = add_size(size, mul_size(max_prepared_xacts, sizeof(sizeof(*ProcGlobal->vacuumFlags))));

instead of

    size = add_size(size, sizeof(PROC_HDR));
    size = add_size(size, mul_size(TotalProcs, sizeof(PGPROC)));
    size = add_size(size, sizeof(slock_t));

    size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->xids)));
    size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->subxidStates)));
    size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->vacuumFlags)));

which seems clearly worse.


> You've got a bunch of different places that talk about the new PGXACT
> array and they are somewhat redundant yet without saying exactly the
> same thing every time either. I think that needs cleanup.

Could you point out a few of those comments, I'm not entirely sure which
you're talking about?


> One thing I didn't see is any clear discussion of what happens if the
> two copies of the XID information don't agree with each other.

It should never happen. There's asserts that try to ensure that. For the
xid-less case:

ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
...
        Assert(!TransactionIdIsValid(proc->xidCopy));
        Assert(proc->subxidStatusCopy.count == 0);
and for the case of having an xid:

ProcArrayEndTransactionInternal(PGPROC *proc, TransactionId latestXid)
...
    Assert(ProcGlobal->xids[pgxactoff] == proc->xidCopy);
...
    Assert(ProcGlobal->subxidStates[pgxactoff].count == proc->subxidStatusCopy.count &&
           ProcGlobal->subxidStates[pgxactoff].overflowed == proc->subxidStatusCopy.overflowed);


> That should be added someplace, either in an appropriate code comment
> or in a README or something. I *think* both are protected by the same
> locks, but there's also some unlocked access to those structure
> members, so it's not entirely a slam dunk.

Hm. I considered is allowed to modify those and when to really be
covered by the existing comments in transam/README. In particular in the
"Interlocking Transaction Begin, Transaction End, and Snapshots"
section.

Do you think that a comment explaining that the *Copy version has to be
kept up2date at all times (except when not yet added with ProcArrayAdd)
would ameliorate that concern? 

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Improving connection scalability: GetSnapshotData()
Next
From: Robert Haas
Date:
Subject: Re: Improving connection scalability: GetSnapshotData()