Re: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2) - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)
Date
Msg-id CAH2-Wzm5M+Djb8dJnGdiP8dwN7fZDNR-PSwNekGvKxa9-a_9cQ@mail.gmail.com
Whole thread Raw
In response to Re: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
List pgsql-hackers
On Tue, Aug 6, 2024 at 5:42 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> Attached is version 16 now.

I ran this with my old land registry benchmark, used for validating
the space utilization impact of nbtree deduplication (among other
things). This isn't obviously the best benchmark for this sort of
thing, but I seem to recall that you'd used it yourself at some point.
To validate work in this area, likely including this patch. So I
decided to start there.

To be clear, this test involves bulk loading of an unlogged table (the
land registry table). The following composite index is created on the
table before we insert any rows, so most of the cycles here are in
index maintenance including _bt_search descents:

CREATE INDEX composite ON land2 USING btree (county COLLATE "C", city
COLLATE "C", locality COLLATE "C");

I wasn't able to see much of an improvement with this patch applied.
It went from  ~00:51.598 to ~00:51.053. That's a little disappointing,
given that this is supposed to be a sympathetic case for the patch.
Can you suggest something else? (Granted, I understand that this patch
has some complicated relationship with other patches of yours, which I
don't understand currently.)

I'm a bit worried about side-effects for this assertion:

@@ -485,7 +489,7 @@ _bt_check_unique(Relation rel, BTInsertState
insertstate, Relation heapRel,
                Assert(insertstate->bounds_valid);
                Assert(insertstate->low >= P_FIRSTDATAKEY(opaque));
                Assert(insertstate->low <= insertstate->stricthigh);
-               Assert(_bt_compare(rel, itup_key, page, offset) < 0);
+               Assert(_bt_compare(rel, itup_key, page, offset, &sprefix) < 0);
                break;
            }

More generally, it's not entirely clear how the code in
_bt_check_unique is supposed to work with the patch. Why should it be
safe to do what you're doing with the prefix there? It's not like
we're doing a binary search here -- it's more like a linear search.

> - Move from 1-indexed AttributeNumber to 0-indexed ints for prefixes,
> and use "prefix" as naming scheme, rather than cmpcol. A lack of
> prefix, previously indicated with a cmpcol value of 1, is now a prefix
> value of 0.

Found a likely-related bug in the changes you made to amcheck, which I
was able to fix myself like so:

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index c7dc6725a..15be61777 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -3187,7 +3187,7 @@ bt_rootdescend(BtreeCheckState *state, IndexTuple itup)
         insertstate.buf = lbuf;

         /* Get matching tuple on leaf page */
-        offnum = _bt_binsrch_insert(state->rel, &insertstate, 1);
+        offnum = _bt_binsrch_insert(state->rel, &insertstate, 0);
         /* Compare first >= matching item on leaf page, if any */

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Etsuro Fujita
Date:
Subject: Re: Cross-version Compatibility of postgres_fdw
Next
From: Heikki Linnakangas
Date:
Subject: Re: collect_corrupt_items_vacuum.patch