Split interval (used by nbtree suffix truncation) and posting list tuples - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Split interval (used by nbtree suffix truncation) and posting list tuples
Date
Msg-id CAH2-WznJt5aT2uUB2Bs+JBLdwe0XTX67+xeLFcaNvCKxO=QBVQ@mail.gmail.com
Whole thread Raw
List pgsql-hackers
We choose a split point in nbtsplitloc.c primarily based on evenly
dividing space among left and right halves of the split, while giving
secondary consideration to suffix truncation (there is other logic
that kicks in when there are many duplicates, which isn't related to
what I want to talk about). See my commit fab25024 from Postgres 12
for background information.

The larger the split interval, the more unbalanced page splits are
allowed to be, which is a cost that we may be willing to pay to
truncate more suffix attributes in the new high key. Split interval
represents a trade-off between two competing considerations (a
potential cost versus a potential benefit). Unfortunately, commit
fab25024 defined "split interval" based on logic that makes the
assumption that tuples on the same page are more or less of a uniform
size. That assumption was questionable when suffix truncation went in,
but now that deduplication exists the assumption seems quite risky. I
am concerned that suffix truncation will be far more aggressive than
appropriate when the delta-optimal split point is near large posting
list tuples, that are size outliers on the page. The logic in
nbtsplitloc.c might accept a much more unbalanced page split than is
truly reasonable because the average width of tuples on the page isn't
so large, even though the tuples around where we want to split the
page are very large.

Fortunately it's pretty easy to nail this down. We can determine the
default strategy split interval based on the cost that we actually
care about (not a fuzzy proxy of that cost): a leftfree and rightfree
space tolerance from the space optimal candidate split point,
expressed in bytes (actually, expressed as a proportion of the total
space that is used for data items on the page, which is converted into
bytes to form a tolerance). That's the way it's done in the attached
patch.

I plan to commit this patch next week, barring any objections.

-- 
Peter Geoghegan

Attachment

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Build errors in VS
Next
From: Robert Haas
Date:
Subject: Re: Poll: are people okay with function/operator table redesign?