Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM? - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?
Date
Msg-id CAEze2Wj_9kvgdYUJazbxyTuVLF_2ibZun9H1xULV5bUBXJUNdg@mail.gmail.com
Whole thread Raw
In response to Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
List pgsql-hackers
On Thu, 24 Apr 2025 at 22:46, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
>
> On Fri, 21 Mar 2025 at 17:14, Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > Attached is v10, which polishes the previous patches, and adds a patch
> > for nbtree to use the new visibility checking strategy so that it too
> > can release its index pages much earlier, and adds a similar
> > visibility check test to nbtree.
>
> And here's v12. v11 (skipped) would've been a rebase, but after
> finishing the rebase I noticed a severe regression in btree's IOS with
> the new code, so v12 here applies some optimizations which reduce the
> overhead of the new code.

Here's v13, which moves the changes around a bit:

v12's 0001 is split into 3 patches (1, 3, and 4), whilst v12's 2-5
were correspondingly renumbered 5-8. Patch 0002 is an otherwise
unrelated change in pg_visibility that updates it to use the new
vectorized API, reducing overhead. So, summary of the patches:

0001: Replaces visibilitymap_get_status with a vectorized variant that
touches each VM page at most once per call; reducing buffer churn and
enabling later patches
0002: update pg_visibility to use this newly vectorized API (instead
of the current model that checks each page at a time)
0003: Adds the table_index_vischeck_tuples API, requiring a table AM
to expose VM checks through an API.
0004: Adjusts Index-Only scan infrastructure to make use of 0004,
instead of using VM_ALL_VISIBLE(). It also adds the relevant
infrastructure for enabling index AMs to provide the pre-checked
visibility status (from table_index_vischeck_tuples) to an index-only
scan.
0005: Implement VM-checks in GIST's IOS code
0006: Same, but for SP-GIST
0007: Same, but for NBTREE
0008: Add tests which validate that we still get correct results from
our queries, even when we use cursors to block results from getting
returned, and cleaning up tuples from those index pages.

A big benefit with this patch is that indexes now have no direct
reason anymore to hold back a VACUUM scan -- the visibility of tuples
can be checked at page scan time, and any shared resources can be
released before returning tuples to higher nodes.

------------------------

Summary of the problem that I'm solving here:

An index that holds dead tuples could return those dead tuples in an
Index-Only Scan (IOS) to the scan node, as the index AM itself doesn't
have any information about the visibility of the tuples that it
contains. The IOS infrastructure prevents these tuples from being
exposed by doing visibility checks against the Visibility Map (VM) and
-if necessary- the underlying heap.

This, however, depends on an invariant: VACUUM MUST NOT remove a TID
that's being returned by an index scan, at least not before before
that tuple has been checked for visibility in the VM; otherwise VACUUM
may get to clean up the dead tuple's page and mark it all-visible
before the visibility check occurs, incorrectly returning an
'all-visible' result for that dead tuple.

Btree indexes interlock with vacuum using a buffer cleanup lock and a
pin on pages it's yet to return results for; and that solution works
quite fine [^1].
This same solution sadly doesn't work for GiST and SP-GiST, as in a
worst case scenario they may have to sort the whole index in memory
before they can return the first index tuple, and holding pins on
those pages would be extremely punishing and might even cause system
crashes due to a lack of available un-pinned shared buffers.
To solve this, we implement a mechanism to allow indexes to do rough
visibility checks on TIDs; the results of which can then be passed
through the IndexScanDesc to indicate what the VM state was when the
tuple was still in the index. This enables them to make the VM check
happen just after they've scanned a page, but before they release
their pin on the page, adding the interlock with VACUUM without
requiring unreasonably large amounts of page pins.

This new mechanism is safe in MVCC snapshots, where we know that
tuples which are all-visible can't be removed while the scan is
ongoing, and where possibly-dead possibly-replaced TIDs are known to
be visibility-checked using transaction IDs, and where any new TID
would have to be inserted in a different transaction and therefore is
definitely invisible to our current transaction.

------------------------

Open items: review that this doesn't have any further issues, and
commits once this has been considered good enough.

Note:
  This patch changes TableAMRoutine and renames/changes exposed
functions, and as a result can't be backpatched as-is. I have a
separate thread over at [0] where I'm keeping track of a patchset that
is derived from this one and is focused on backpatching. That patchset
will contain patches 0004/0005/0006 and a reduced version of 0001+0003
to make it work in older branches without breaking external ABI
compatibility. I intend for the exposed table_index_vischeck_tuples()
API to remain consistent across the two patchsets.


Kind regards,

Matthias van de Meent
Databricks (https://www.databricks.com)

[0]: https://www.postgresql.org/message-id/flat/CAEze2WgH13m=MDST58KLo-NkZpbwBEt4xNWcgtghWBwRj3J0+A@mail.gmail.com

[^1]: Mostly fine, because it still holds VACUUM back when an
index-only scan holds a page pin and VACUUM needs to process that
page. If the index scan doesn't progress, then VACUUM can't progress
either, and that can cause vacuum to get stuck. That issue is solved
(for normal index scans) with patch 0007.

Attachment

pgsql-hackers by date:

Previous
From: Matthias van de Meent
Date:
Subject: Re: Vectorize pg_visibility.pg_visibility_map_summary
Next
From: Masahiko Sawada
Date:
Subject: Re: psql: tab-completion support for COPY ... TO/FROM STDIN, STDOUT, and PROGRAM