Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index. - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Date
Msg-id CAH2-WzkWLRDzCaxsGvA_pZoaix_2AC9S6=-D6JMLkQYhqrJuEg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Tue, Dec 17, 2019 at 5:18 PM Bruce Momjian <bruce@momjian.us> wrote:
> On Tue, Dec 17, 2019 at 03:30:33PM -0800, Peter Geoghegan wrote:
> > With many real world unique indexes, the true reason behind most or
> > all B-Tree page splits is "version churn". I view these page splits as
> > a permanent solution to a temporary problem -- we *permanently*
> > degrade the index structure in order to deal with a *temporary* burst
> > in versions that need to be stored. That's really bad.
>
> Yes, I was thinking why do we need to optimize duplicates in a unique
> index but then remembered is a version problem.

The whole idea of deduplication in unique indexes is hard to explain.
It just sounds odd. Also, it works using the same infrastructure as
regular deduplication, while having rather different goals.
Fortunately, it seems like we don't really have to tell users about it
in order for them to see a benefit -- there will be no choice for them
to make there (they just get it).

The regular deduplication stuff isn't confusing at all, though. It has
some noticeable though small downside, so it will be documented and
configurable. (I'm optimistic that it can be enabled by default,
because even with high cardinality non-unique indexes the downside is
rather small -- we waste some CPU cycles just before a page is split.)

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [Proposal] Level4 Warnings show many shadow vars
Next
From: Bruce Momjian
Date:
Subject: Re: [PATCH] Tiny optimization.