Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index. - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Date
Msg-id CAH2-Wz=zJa7Biitwz4K=C7UmaYdLWXarQ4uTwwAKSatEnd9TGQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.  (Robert Haas <robertmhaas@gmail.com>)
Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.  (Mark Dilger <hornschnorter@gmail.com>)
List pgsql-hackers
On Wed, Nov 13, 2019 at 11:33 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Nov 12, 2019 at 6:22 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > * Disabled deduplication in system catalog indexes by deeming it
> > generally unsafe.
>
> I (continue to) think that deduplication is a terrible name, because
> you're not getting rid of the duplicates. You are using a compressed
> representation of the duplicates.

"Deduplication" never means that you get rid of duplicates. According
to Wikipedia's deduplication article: "Whereas compression algorithms
identify redundant data inside individual files and encodes this
redundant data more efficiently, the intent of deduplication is to
inspect large volumes of data and identify large sections – such as
entire files or large sections of files – that are identical, and
replace them with a shared copy".

This seemed like it fit what this patch does. We're concerned with a
specific, simple kind of redundancy. Also:

* From the user's point of view, we're merging together what they'd
call duplicates. They don't really think of the heap TID as part of
the key.

* The term "compression" suggests a decompression penalty when
reading, which is not the case here.

* The term "compression" confuses the feature added by the patch with
TOAST compression. Now we may have two very different varieties of
compression in the same index.

Can you suggest an alternative?

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: ssl passphrase callback
Next
From: Tom Lane
Date:
Subject: Re: [PATCH] gcc warning 'expression which evaluates to zero treated as a null pointer'