Re: Index AM change proposals, redux - Mailing list pgsql-hackers

From Decibel!
Subject Re: Index AM change proposals, redux
Date
Msg-id 39CFF9DB-E0B5-4B69-B975-94FA120D5EEA@decibel.org
In response to Re: Index AM change proposals, redux  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Apr 24, 2008, at 10:43 AM, Bruce Momjian wrote:

Bruce asked if these should be TODOs...

>> Index compression is possible in many ways, depending upon the
>> situation. All of the following sound similar at a high level, but
>> each covers a different use case.
>>
>> * For Long, Similar data e.g. Text we can use Prefix Compression
>> We still store one pointer per row, but we reduce the size of the
>> index by reducing the size of the key values. This requires us to
>> reach inside datatypes, so isn't a very general solution but is
>> probably an important one in the future for Text.

I think what would be even more useful is doing this within the table  
itself, and then bubbling that up to the index.
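
To make the quoted idea concrete, here is a minimal sketch of prefix (front) compression over a run of sorted text keys. It is only an illustration in Python, not anything from the PostgreSQL source; the function names prefix_compress and prefix_decompress are made up for this example, and it assumes the keys are already in index order.

```python
def prefix_compress(sorted_keys):
    """Front-code sorted keys: store (shared_prefix_len, suffix) per key.

    Hypothetical helper for illustration; assumes sorted_keys is sorted.
    """
    out = []
    prev = ""
    for key in sorted_keys:
        # Count how many leading characters this key shares with the previous one.
        n = 0
        limit = min(len(prev), len(key))
        while n < limit and prev[n] == key[n]:
            n += 1
        out.append((n, key[n:]))
        prev = key
    return out


def prefix_decompress(entries):
    """Rebuild the original keys from (shared_prefix_len, suffix) pairs."""
    keys = []
    prev = ""
    for shared, suffix in entries:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys
```

There is still one entry per row, as the quoted text says; only the key bytes shrink. The catch is visible too: the code has to look inside the value, which is why this is datatype-specific.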

>> * For Unique/nearly-Unique indexes we can use Range Compression
>> We reduce the size of the index by holding one index pointer per
>> range of values, thus removing both keys and pointers. It's more
>> efficient than prefix compression and isn't datatype-dependent.

Definitely.
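
As a rough sketch of what "one index pointer per range" could look like, the Python below keeps a single (lowest key, page number) entry per heap page and scans the page on lookup. This is only my illustration: build_range_index and lookup are hypothetical names, and it assumes the rows are physically ordered by key across pages, which a real implementation could not take for granted.

```python
import bisect


def build_range_index(pages):
    """One index entry per page: (lowest key on the page, page number).

    Assumes each page is sorted and pages are in key order (an assumption
    made for this sketch only).
    """
    return [(page[0], page_no) for page_no, page in enumerate(pages) if page]


def lookup(range_index, pages, key):
    """Return (page_no, offset) for key, or None if it is not present."""
    lows = [low for low, _ in range_index]
    # Find the last range whose lowest key is <= the search key.
    i = bisect.bisect_right(lows, key) - 1
    if i < 0:
        return None
    page_no = range_index[i][1]
    # The index only narrows the search to a page; the page itself is scanned.
    try:
        return (page_no, pages[page_no].index(key))
    except ValueError:
        return None
```

The index holds one entry per page rather than one per row, so both keys and pointers disappear, at the cost of scanning within the page.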

>> * For Highly Non-Unique Data we can use Duplicate Compression
>> The latter is the technique used by Bitmap Indexes. Efficient, but
>> not useful for unique/nearly-unique data.

Also definitely. This would be hugely useful for things like "status"  
or "type" fields.
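
For a "status" style column the idea can be sketched as storing each distinct key once with a list of row pointers. The Python below is just an illustration under that assumption; build_duplicate_compressed is a hypothetical name, and a bitmap index would use a bitmap per key instead of a list.

```python
def build_duplicate_compressed(rows):
    """Group row pointers under each distinct key.

    rows is an iterable of (key, tid) pairs; the key is stored once no
    matter how many rows carry it. Hypothetical helper for illustration.
    """
    index = {}
    for key, tid in rows:
        index.setdefault(key, []).append(tid)
    return index


def stored_keys(index):
    """Number of key values actually stored, versus one per row uncompressed."""
    return len(index)
```

With three distinct status values over millions of rows, the index stores three keys. For a unique column it stores as many keys as before, plus the list overhead, which matches the quoted caveat.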

>> * Multi-Column Leading Value Compression - if you have a multi-column
>> index, then leading columns are usually duplicated between rows
>> inserted at the same time. Using an on-block dictionary we can remove
>> duplicates. Only useful for multi-column indexes, possibly
>> overlapping/contained subset of the GIT use case.


Also useful, though I generally try to put the most diverse values
first in indexes to increase the odds of the index being used. Perhaps
that would change if we had compression.
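
A minimal sketch of the on-block dictionary, assuming each index entry is a (leading value, trailing value) pair and the dictionary lives on the same block as the entries. The names compress_block and decompress_block are invented for this example.

```python
def compress_block(entries):
    """Replace repeated leading values with small codes into a per-block dictionary.

    entries is a list of (leading, trailing) pairs for one block.
    Returns (dictionary, packed), where packed holds (code, trailing) pairs.
    Hypothetical helper for illustration.
    """
    dictionary = []   # code -> leading value, stored once per block
    codes = {}        # leading value -> code, build-time lookup only
    packed = []
    for leading, trailing in entries:
        code = codes.get(leading)
        if code is None:
            code = len(dictionary)
            codes[leading] = code
            dictionary.append(leading)
        packed.append((code, trailing))
    return dictionary, packed


def decompress_block(dictionary, packed):
    """Expand (code, trailing) pairs back into (leading, trailing) pairs."""
    return [(dictionary[code], trailing) for code, trailing in packed]
```

The saving depends on how few distinct leading values land on a block, so it favors a low-cardinality leading column, the opposite of the ordering I usually pick.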
-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828