Re: Deleting older versions in unique indexes to avoid page splits - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Deleting older versions in unique indexes to avoid page splits
Date
Msg-id CAH2-WzmP5AymEfT_n3wAdvW8D7DduapHPqRzds5kv7VjnXsx6Q@mail.gmail.com
Whole thread Raw
In response to Re: Deleting older versions in unique indexes to avoid page splits  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Deleting older versions in unique indexes to avoid page splits  (Victor Yegorov <vyegorov@gmail.com>)
Re: Deleting older versions in unique indexes to avoid page splits  (Victor Yegorov <vyegorov@gmail.com>)
List pgsql-hackers
On Tue, Nov 3, 2020 at 12:44 PM Peter Geoghegan <pg@bowt.ie> wrote:
> v6 still needs more polishing -- my focus has still been on the
> algorithm itself. But I think I'm almost done with that part -- it
> seems unlikely that I'll be able to make any additional significant
> improvements in that area after v6.

Attached is v7, which tidies everything up. The project is now broken
up into multiple patches, which can be committed separately. Every
patch has a descriptive commit message. This should make it a lot
easier to review.

I've renamed the feature to "bottom-up index deletion" in this latest
revision. This seems like a better name than "dedup deletion". This
name suggests that the feature complements "top-down index deletion"
by VACUUM. This name is descriptive of what the new mechanism is
supposed to do at a high level.

Other changes in v7 include:

* We now fully use the tableam interface -- see the first patch.

The bottom-up index deletion API has been fully worked out. There is
now an optional callback/shim function. The bottom-up index deletion
caller (nbtree) is decoupled from the callee (heapam) by the tableam
shim. This was what allowed me to break the patch into multiple
pieces/patches.

* The executor no longer uses a IndexUniqueCheck-enum-constant as a
hint to nbtree. Rather, we have a new aminsert() bool argument/flag
that hints to the index AM -- see the second patch.

To recap, the hint tells nbtree that the incoming tuple is a duplicate
of an existing tuple caused by an UPDATE, without any logical changes
for the indexed columns. Bottom-up deletion is effective when there is
a local concentration of these index tuples that become garbage
quickly.

A dedicated aminsert() argument seems a lot cleaner. Though I wonder
if this approach can be generalized a bit further, so that we can
support other similar aminsert() hints in the future without adding
even more arguments. Maybe some new enum instead of a boolean?

* Code cleanup for the nbtdedup.c changes. Better explanation of when
and how posting list TIDs are marked promising, and why.

* Streamlined handling of the various strategies that nbtinsert.c uses
to avoid a page split (e.g. traditional LP_DEAD deletion,
deduplication).

A new unified function in nbtinsert.c was added. This organization is
a lot cleaner -- it greatly simplifies _bt_findinsertloc(), which
became more complicated than it really needed to be due to the changes
needed for deduplication in PostgreSQL 13. This change almost seems
like an independently useful piece of work.

--
Peter Geoghegan

Attachment

pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: Disable WAL logging to speed up data loading
Next
From: Stephen Frost
Date:
Subject: Re: Disable WAL logging to speed up data loading