Re: Improving connection scalability: GetSnapshotData() - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Improving connection scalability: GetSnapshotData()
Date
Msg-id 20200407121503.zltbpqmdesurflnm@alap3.anarazel.de
Whole thread Raw
In response to Re: Improving connection scalability: GetSnapshotData()  (Andres Freund <andres@anarazel.de>)
Responses Re: Improving connection scalability: GetSnapshotData()  (Andres Freund <andres@anarazel.de>)
Re: Improving connection scalability: GetSnapshotData()  ("Jonathan S. Katz" <jkatz@postgresql.org>)
Re: Improving connection scalability: GetSnapshotData()  (Robert Haas <robertmhaas@gmail.com>)
Re: Improving connection scalability: GetSnapshotData()  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hi,

SEE BELOW: What, and what not, to do for v13.


Attached is a substantially polished version of my patches. Note that
the first three patches, as well as the last, are not intended to be
committed at this time / in this form - they're there to make testing
easier.

There is a lot of polish, but also a few substantial changes:

- To be compatible with old_snapshot_threshold I've revised the way
  heap_page_prune_opt() deals with old_snapshot_threshold. Now
  old_snapshot_threshold is only applied when we otherwise would have
  been unable to prune (both at the time of the pd_prune_xid check, and
  on individual tuples). This makes old_snapshot_threshold considerably
  cheaper and cause less conflicts.

  This required adding a version of HeapTupleSatisfiesVacuum that
  returns the horizon, rather than doing the horizon test itself; that
  way we can first test a tuple's horizon against the normal approximate
  threshold (making it an accurate threshold if needed) and only if that
  fails fall back to old_snapshot_threshold.

  The main reason here was not to improve old_snapshot_threshold, but to
  avoid a regression when its being used. Because we need a horizon to
  pass to old_snapshot_threshold, we'd have to fall back to computing an
  accurate horizon too often.


- Previous versions of the patch had a TODO about computing horizons not
  just for one of shared / catalog / data tables, but all of them at
  once. To avoid potentially needing to determine xmin horizons multiple
  times within one transaction.  For that I've renamed GetOldestXmin()
  to ComputeTransactionHorizons() and added wrapper functions instead of
  the different flag combinations we previously had for GetOldestXmin().

  This allows us to get rid of the PROCARRAY_* flags, and PROC_RESERVED.


- To address Thomas' review comment about not accessing nextFullXid
  without xidGenLock, I made latestCompletedXid a FullTransactionId (a
  fxid is needed to be able to infer 64bit xids for the horizons -
  otherwise there is some danger they could wrap).


- Improving the comment around the snapshot caching, I decided that the
  justification for correctness around not taking ProcArrayLock is too
  complicated (in particular around setting MyProc->xmin). While
  avoiding ProcArrayLock alltogether is a substantial gain, the caching
  itself helps a lot already. Seems best to leave that for a later step.

  This means that the numbers for the very high connection counts aren't
  quite as good.


- Plenty of small changes to address issues I found while
  benchmarking. The only one of real note is that I had released
  XidGenLock after ProcArrayLock in ProcArrayAdd/Remove. For 2pc that
  causes noticable unnecessary contention, because we'll wait for
  XidGenLock while holding ProcArrayLock...


I think this is pretty close to being committable.


But: This patch came in very late for v13, and it took me much longer to
polish it up than I had hoped (partially distraction due to various bugs
I found (in particular snapshot_too_old), partially covid19, partially
"hell if I know"). The patchset touches core parts of the system. While
both Thomas and David have done some review, they haven't for the latest
version (mea culpa).

In many other instances I would say that the above suggests slipping to
v14, given the timing.

The main reason I am considering pushing is that I think this patcheset
addresses one of the most common critiques of postgres, as well as very
common, hard to fix, real-world production issues. GetSnapshotData() has
been a major bottleneck for about as long as I have been using postgres,
and this addresses that to a significant degree.

A second reason I am considering it is that, in my opinion, the changes
are not all that complicated and not even that large. At least not for a
change to a problem that we've long tried to improve.


Obviously we all have a tendency to think our own work is important, and
that we deserve a bit more leeway than others. So take the above with a
grain of salt.


Comments?


Greetings,

Andres Freund

Attachment

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Make MemoryContextMemAllocated() more precise
Next
From: Alexander Korotkov
Date:
Subject: Re: [HACKERS] make async slave to wait for lsn to be replayed