Re: Improving connection scalability: GetSnapshotData() - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Improving connection scalability: GetSnapshotData()
Date
Msg-id CA+TgmoZYnE6cAjgvzkXdYNu4EPjJm01eQdL=v0mweH132F2_jQ@mail.gmail.com
Whole thread Raw
In response to Re: Improving connection scalability: GetSnapshotData()  (Andres Freund <andres@anarazel.de>)
Responses Re: Improving connection scalability: GetSnapshotData()  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Comments:

In 0002, the comments in SnapshotSet() are virtually incomprehensible.
There's no commit message so the reasons for the changes are unclear.
But mostly looks unproblematic.

0003 looks like a fairly unrelated bug fix that deserves to be
discussed on the thread related to the original patch. Probably should
be an open item.

0004 looks fine.

Regarding 0005:

There's sort of a mix of terminology here: are we pruning tuples or
removing tuples or testing whether things are invisible? It would be
better to be more consistent.

+ * State for testing whether tuple versions may be removed. To improve
+ * GetSnapshotData() performance we don't compute an accurate value whenever
+ * acquiring a snapshot. Instead we compute boundaries above/below which we
+ * know that row versions are [not] needed anymore.  If at test time values
+ * falls in between the two, the boundaries can be recomputed (unless that
+ * just happened).

I don't like the wording here much. Maybe: State for testing whether
an XID is invisible to all current snapshots. If an XID precedes
maybe_needed_bound, it's definitely not visible to any current
snapshot. If it equals or follows definitely_needed_bound, that XID
isn't necessarily invisible to all snapshots. If it falls in between,
we're not sure. If, when testing a particular tuple, we see an XID
somewhere in the middle, we can try recomputing the boundaries to get
a more accurate answer (unless we've just done that). This is cheaper
than maintaining an accurate value all the time.

There's also the problem that this sorta contradicts the comment for
definitely_needed_bound. There it says intermediate values needed to
be tested against the ProcArray, whereas here it says we need to
recompute the bounds. That's kinda confusing.

ComputedHorizons seems like a fairly generic name. I think there's
some relationship between InvisibleToEveryoneState and
ComputedHorizons that should be brought out more clearly by the naming
and the comments.

+ /*
+ * The value of ShmemVariableCache->latestCompletedFullXid when
+ * ComputeTransactionHorizons() held ProcArrayLock.
+ */
+ FullTransactionId latest_completed;
+
+ /*
+ * The same for procArray->replication_slot_xmin and.
+ * procArray->replication_slot_catalog_xmin.
+ */
+ TransactionId slot_xmin;
+ TransactionId slot_catalog_xmin;

Department of randomly inconsistent names. In general I think it's
quite hard to grasp the relationship between the different fields in
ComputedHorizons.

+static inline bool OldSnapshotThresholdActive(void)
+{
+ return old_snapshot_threshold >= 0;
+}

Formatting.

+
+bool
+GinPageIsRecyclable(Page page)

Needs a comment. Or more than one.

- /*
- * If a transaction wrote a commit record in the gap between taking and
- * logging the snapshot then latestCompletedXid may already be higher than
- * the value from the snapshot, so check before we use the incoming value.
- */
- if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid,
-   running->latestCompletedXid))
- ShmemVariableCache->latestCompletedXid = running->latestCompletedXid;
-
- Assert(TransactionIdIsNormal(ShmemVariableCache->latestCompletedXid));
-
- LWLockRelease(ProcArrayLock);

This code got relocated so that the lock is released later, but you
didn't add any comments explaining why. Somebody will move it back and
then you'll yet at them for doing it wrong. :-)

+ /*
+ * Must have called GetOldestVisibleTransactionId() if using SnapshotAny.
+ * Shouldn't have for an MVCC snapshot. (It's especially worth checking
+ * this for parallel builds, since ambuild routines that support parallel
+ * builds must work these details out for themselves.)
+ */
+ Assert(snapshot == SnapshotAny || IsMVCCSnapshot(snapshot));
+ Assert(snapshot == SnapshotAny ? TransactionIdIsValid(OldestXmin) :
+    !TransactionIdIsValid(OldestXmin));
+ Assert(snapshot == SnapshotAny || !anyvisible);

This looks like a gratuitous code relocation.

+HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after)

I don't much like the name dead_after, but I don't have a better
suggestion, either.

- * Deleter committed, but perhaps it was recent enough that some open
- * transactions could still see the tuple.
+ * Deleter committed, allow caller to check if it was recent enough that
+ * some open transactions could still see the tuple.

I think you could drop this change.

+ /*
+ * State related to determining whether a dead tuple is still needed.
+ */
+ InvisibleToEveryoneState *vistest;
+ TimestampTz limited_oldest_ts;
+ TransactionId limited_oldest_xmin;
+ /* have we made removal decision based on old_snapshot_threshold */
+ bool limited_oldest_committed;

Would benefit from more comments.

+ * accuring to prstate->vistest, but that can be removed based on

Typo.

Generally, heap_prune_satisfies_vacuum looks pretty good. The
limited_oldest_committed naming is confusing, but the comments make it
a lot clearer.

+ * If oldest btpo.xact in the deleted pages is invisible, then at

I'd say "invisible to everyone" here for clarity.

-latestCompletedXid variable.  This allows GetSnapshotData to use
-latestCompletedXid + 1 as xmax for its snapshot: there can be no
+latestCompletedFullXid variable.  This allows GetSnapshotData to use
+latestCompletedFullXid + 1 as xmax for its snapshot: there can be no

Is this fixing a preexisting README defect?

It might be useful if this README expanded on the new machinery a bit
instead of just updating the wording to account for it, but I'm not
sure exactly what that would look like or whether it would be too
duplicative of other things.

+void AssertTransactionIdMayBeOnDisk(TransactionId xid)

Formatting.

+ * Assert that xid is one that we could actually see on disk.

I don't know what this means. The whole purpose of this routine is
very unclear to me.

  * the secondary effect that it sets RecentGlobalXmin.  (This is critical
  * for anything that reads heap pages, because HOT may decide to prune
  * them even if the process doesn't attempt to modify any tuples.)
+ *
+ * FIXME: This comment is inaccurate / the code buggy. A snapshot that is
+ * not pushed/active does not reliably prevent HOT pruning (->xmin could
+ * e.g. be cleared when cache invalidations are processed).

Something needs to be done here... and in the other similar case.

Is this kind of review helpful?

...Robert



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: WIP: WAL prefetch (another approach)
Next
From: Asif Rehman
Date:
Subject: Re: WIP/PoC for parallel backup