Home > mailing lists

Prefix compression of B-Tree keys - Mailing list pgsql-hackers

From	Claudio Freire
Subject	Prefix compression of B-Tree keys
Date	January 31, 2014 05:26:23
Msg-id	CAGTBQpYbhy-ahV6qzq3Yqp43uS2UrukNmf-GZxHckJVrSPj8VA@mail.gmail.com Whole thread
Responses	Re: Prefix compression of B-Tree keys
List	pgsql-hackers

Tree view

On Thu, Jan 30, 2014 at 9:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <pg@heroku.com> writes:
>> On more occasions than I care to recall, someone has suggested that it
>> would be valuable to do something with strxfrm() blobs in order to
>> have cheaper locale-aware text comparisons. One obvious place to do so
>> would be in indexes, but in the past that has been dismissed on the
>> following grounds:
>
>> * Index-only scans need fully formed datums to work, and strxfrm() is
>> a one-way function (or so I believe). There is no reason to think that
>> the original string can be reassembled from the blob, so clearly that
>> won't fly.
>
>> * There is a high cost to be paid in storage overhead. Even for
>> collations like "en_US.UTF-8", that can mean that the blob is as much
>> as 3-4 times larger than the original text string. Who is to say that
>> we'll come out ahead even with the savings of just doing a strcmp()
>> rather than a strcoll()?
>
> Quite aside from the index bloat risk, this effect means a 3-4x reduction
> in the maximum string length that can be indexed before getting the
> dreaded "Values larger than 1/3 of a buffer page cannot be indexed" error.
> Worse, a value insertion might well succeed, with the failure happening
> only (much?) later when that entry is chosen as a page split boundary.
>
> It's possible that TOAST compression of the strings would save you, but
> I'm not convinced of that; it certainly doesn't seem like we could
> guarantee no post-insertion failures that way.

Maybe not TOAST compression, but prefix compression.

I've been wondering lately, whether a format change in the B-Tree
could be worth the effort: store values with prefix compression. It
would certainly help indexes of many kinds (text-valued, multi-column,
etc...).

Now, I haven't checked if it's already done. Sorry if it is. I did
mock around btree code a lot and don't remember any of this, but I do
remember stuff that could be used to achieve it (specifically, all the
index-only-scan machinery, which allows for implicit info).

Granted, this isn't strxfrm, but it may provide some of the benefits
(faster comparisons because less bytes are compared?).

On Fri, Jan 31, 2014 at 1:51 AM, Peter Geoghegan <pg@heroku.com> wrote:
> We're already suffering from the fact
> that strcoll() mandates a NULL-terminated string, since we have to
> copy text datums to a buffer to deal with the impedance mismatch.
> Furthermore, multi-column comparisons probably have a lot of overhead
> at present from all the indirection alone.

Which makes the above even more likely to help.

pgsql-hackers by date:

From: Peter Geoghegan
Date: 31 January 2014, 04:51:23
Subject: Re: Making strxfrm() blobs in indexes work

From: Fujii Masao
Date: 31 January 2014, 05:26:48
Subject: Re: Backup throttling

Prefix compression of B-Tree keys - Mailing list pgsql-hackers

Previous

Next