pgsql: Don't overlook indexes during parallel VACUUM. - Mailing list pgsql-committers

From Peter Geoghegan
Subject pgsql: Don't overlook indexes during parallel VACUUM.
Msg-id E1mhz8G-0001MD-9S@gemulon.postgresql.org
List pgsql-committers
Don't overlook indexes during parallel VACUUM.

Commit b4af70cb, which simplified state managed by VACUUM, performed
refactoring of parallel VACUUM in passing.  Confusion about the exact
details of the tasks that the leader process is responsible for led to
code that made it possible for parallel VACUUM to miss a subset of the
table's indexes entirely.  Specifically, indexes that fell under the
min_parallel_index_scan_size size cutoff were missed.  These indexes are
supposed to be vacuumed by the leader (alongside any parallel unsafe
indexes), but weren't vacuumed at all.  Affected indexes could easily
end up with duplicate heap TIDs, once heap TIDs were recycled for new
heap tuples.  This had generic symptoms that might be seen with almost
any index corruption involving structural inconsistencies between an
index and its table.

To fix, make sure that the parallel VACUUM leader process performs any
required index vacuuming for indexes that happen to be below the size
cutoff.  Also document the design of parallel VACUUM with these
below-size-cutoff indexes.
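
The division of labor described above can be sketched as a small,
self-contained model.  This is not the code in vacuumlazy.c: every name
below (SketchIndex, SketchOwner, sketch_classify_index,
sketch_count_missed) is hypothetical, sizes are plain block counts, and
treating "size < cutoff" as below the cutoff is an assumption about the
boundary case.  The sketch also assumes that only parallel-safe indexes
below the cutoff were missed; the message does not say how an index that
is both small and parallel unsafe was handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one index; not a PostgreSQL struct. */
typedef struct SketchIndex
{
    long size_blocks;    /* index size in blocks */
    bool parallel_safe;  /* may a worker process vacuum this index? */
} SketchIndex;

typedef enum SketchOwner
{
    SKETCH_LEADER_ONLY,      /* only the leader may vacuum it */
    SKETCH_ANY_PARTICIPANT   /* a worker (or the leader) may vacuum it */
} SketchOwner;

/*
 * Decide who is responsible for an index: parallel unsafe indexes and
 * indexes below the size cutoff both belong to the leader.
 */
static SketchOwner
sketch_classify_index(const SketchIndex *idx, long cutoff_blocks)
{
    assert(idx != NULL && cutoff_blocks >= 0);

    if (!idx->parallel_safe || idx->size_blocks < cutoff_blocks)
        return SKETCH_LEADER_ONLY;
    return SKETCH_ANY_PARTICIPANT;
}

/*
 * Count indexes that no process vacuums in one parallel VACUUM pass.
 * leader_handles_small = false models the bug, true models the fix.
 * Assumption: only parallel-safe indexes below the cutoff are missed.
 */
static size_t
sketch_count_missed(const SketchIndex *indexes, size_t nindexes,
                    long cutoff_blocks, bool leader_handles_small)
{
    size_t missed = 0;

    for (size_t i = 0; i < nindexes; i++)
    {
        SketchOwner owner = sketch_classify_index(&indexes[i], cutoff_blocks);

        /* Leader-only and parallel safe means "leader-only due to size". */
        if (owner == SKETCH_LEADER_ONLY &&
            indexes[i].parallel_safe &&
            !leader_handles_small)
            missed++;
    }
    return missed;
}
```

With one small index and two large ones, the model reports one missed
index when the leader skips below-cutoff indexes and none once the
leader handles them, which is the behavior the fix restores.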

It's unclear how many users might be affected by this bug.  There had to
be at least three indexes on the table to hit the bug: a smaller index,
plus at least two additional indexes that themselves exceed the size
cutoff.  Cases with just one additional index would not run into
trouble, since the parallel VACUUM cost model requires two
larger-than-cutoff indexes on the table to apply any parallel
processing.  Note also that autovacuum was not affected, since it never
uses parallel processing.
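
The conditions in the paragraph above can be written as a predicate.
This is a minimal sketch under stated assumptions, not the actual cost
model: the function names are hypothetical, index sizes and the cutoff
are plain block counts, and an index exactly at the cutoff is assumed to
count as large.  It captures only the necessary conditions named in this
message; whether a given VACUUM actually ran in parallel depends on
factors not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count indexes whose size is at or above the cutoff (hypothetical helper). */
static size_t
sketch_count_large(const long *sizes, size_t nindexes, long cutoff)
{
    size_t nlarge = 0;

    assert(sizes != NULL || nindexes == 0);

    for (size_t i = 0; i < nindexes; i++)
    {
        if (sizes[i] >= cutoff)
            nlarge++;
    }
    return nlarge;
}

/*
 * Per the message, parallel processing applies only when at least two
 * indexes are larger than the cutoff.
 */
static bool
sketch_parallel_applies(const long *sizes, size_t nindexes, long cutoff)
{
    return sketch_count_large(sizes, nindexes, cutoff) >= 2;
}

/*
 * Necessary conditions for a table to be exposed to the bug:
 * not autovacuum, parallel processing applies, and at least one
 * index falls below the cutoff.
 */
static bool
sketch_could_hit_bug(const long *sizes, size_t nindexes, long cutoff,
                     bool is_autovacuum)
{
    if (is_autovacuum)
        return false;           /* autovacuum never runs in parallel */
    if (!sketch_parallel_applies(sizes, nindexes, cutoff))
        return false;           /* fewer than two large indexes */

    /* Some index is below the cutoff iff not every index is large. */
    return sketch_count_large(sizes, nindexes, cutoff) < nindexes;
}
```

A table with one small and two large indexes satisfies the predicate; a
table with one small and one large index does not, because no parallel
processing takes place at all.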

Test case based on tests from a larger patch by Masahiko Sawada that
tests parallel VACUUM.

Many thanks to Kamigishi Rei for her invaluable help with tracking this
problem down.

Author: Peter Geoghegan <pg@bowt.ie>
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Reported-By: Kamigishi Rei <iijima.yun@koumakan.jp>
Reported-By: Andrew Gierth <andrew@tao11.riddles.org.uk>
Diagnosed-By: Andres Freund <andres@anarazel.de>
Bug: #17245
Discussion: https://postgr.es/m/17245-ddf06aaf85735f36@postgresql.org
Discussion: https://postgr.es/m/20211030023740.qbnsl2xaoh2grq3d@alap3.anarazel.de
Backpatch: 14-, where the refactoring commit appears.

Branch
------
REL_14_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/61a86ed55ba169044b9a692542bad1b05341147b

Modified Files
--------------
src/backend/access/heap/vacuumlazy.c          | 60 ++++++++++++++++-----------
src/include/commands/vacuum.h                 |  2 +-
src/test/regress/expected/vacuum_parallel.out | 49 ++++++++++++++++++++++
src/test/regress/parallel_schedule            |  1 +
src/test/regress/sql/vacuum_parallel.sql      | 46 ++++++++++++++++++++
5 files changed, 132 insertions(+), 26 deletions(-)

