Re: Enabling B-Tree deduplication by default - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: Enabling B-Tree deduplication by default
Date	January 29, 2020 04:36:39
Msg-id	CAH2-WzmByJciGE7ZNvL5=c+bU1y44+Aho_a_YrEcz3bnGeU4qQ@mail.gmail.com
In response to	Re: Enabling B-Tree deduplication by default (Peter Geoghegan <pg@bowt.ie>)
List	pgsql-hackers

On Thu, Jan 16, 2020 at 12:05 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > It does seem odd to me to treat them differently, but it's possible
> > that this is a reflection of my own lack of understanding. What do
> > other database systems do?
>
> Other database systems treat unique indexes very differently, albeit
> in a way that we're not really in a position to take too much away
> from -- other than the general fact that unique indexes can be thought
> of as very different things.

I should point out here that I've just posted v31 of the patch, which
changes how deduplication works for unique indexes. Our strategy during
a deduplication pass is now the same for unique indexes as for other
indexes, since the original, super-incremental approach no longer seems
to make sense. Further optimization work in the patch eliminated the
problems that made that approach seem like it might be worthwhile.

Note, however, that v31 changes nothing about how we think about
deduplication in unique indexes in general, nor about how it is
presented to users. There are still special criteria around how
deduplication is *triggered* in unique indexes. We continue to trigger
a deduplication pass only after seeing a duplicate within
_bt_check_unique() and _bt_findinsertloc(); otherwise we never attempt
deduplication in a unique index (same as before). Plus the GUC still
doesn't affect unique indexes, unique index deduplication still isn't
really documented in the user docs (it just gets a passing mention in
the B-Tree internals section), etc. This seems like the right way to
go, since deduplication in unique indexes can only make sense on leaf
pages where most or all new items are duplicates of existing items, a
situation that is already easy to detect.
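To make the triggering rule concrete, here is a minimal C sketch of the decision described above. It is not PostgreSQL code: LeafInsertState, should_attempt_dedup() and every field name are hypothetical. It also assumes, beyond what is stated here, that a deduplication pass is only considered when an incoming tuple would otherwise force a leaf page split. The saw_duplicate flag stands in for whatever the duplicate check in _bt_check_unique() and _bt_findinsertloc() reports.

```c
#include <stdbool.h>

/*
 * Hypothetical summary of what the insertion path knows about the target
 * leaf page. Field names are illustrative only, not PostgreSQL's.
 */
typedef struct LeafInsertState
{
    bool is_unique_index;   /* index enforces uniqueness */
    bool saw_duplicate;     /* uniqueness check / insert-location search found a duplicate */
    bool page_is_full;      /* assumption: incoming tuple does not fit, so a split looms */
    bool dedup_guc_enabled; /* stands in for the GUC; never consulted for unique indexes */
} LeafInsertState;

/*
 * Hypothetical decision function: should we try a deduplication pass
 * before splitting the leaf page?
 */
static bool
should_attempt_dedup(const LeafInsertState *st)
{
    /* Assumed: deduplication is only worth trying to avoid a page split. */
    if (!st->page_is_full)
        return false;

    /*
     * Unique indexes: ignore the GUC, and only deduplicate when a duplicate
     * was actually seen, since that is when a benefit is likely.
     */
    if (st->is_unique_index)
        return st->saw_duplicate;

    /* Other indexes: the GUC decides. */
    return st->dedup_guc_enabled;
}
```

For a unique index the GUC flag is never read, which mirrors the point that the GUC doesn't affect unique indexes; a full unique-index leaf page with no observed duplicate simply goes on to split without a deduplication attempt.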

It wouldn't be that bad if we always attempted deduplication in a
unique index, but it's easy to only do it when we're pretty confident
we'll get a benefit, so why not save a few cycles?

--
Peter Geoghegan
