On Apr 24, 2008, at 10:43 AM, Bruce Momjian wrote:
Bruce asked if these should be TODOs...
>> Index compression is possible in many ways, depending upon the
>> situation. All of the following sound similar at a high level, but
>> each covers a different use case.
>>
>> * For long, similar data, e.g. text, we can use Prefix Compression.
>> We still store one pointer per row, but we reduce the size of the
>> index by reducing the size of the key values. This requires us to
>> reach inside datatypes, so it isn't a very general solution, but it
>> is probably an important one in the future for text.
I think what would be even more useful is doing this within the table
itself, and then bubbling that up to the index.
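Prefix compression as described above is often done by front-coding: each sorted key is stored as (shared-prefix length, suffix) relative to its predecessor. A minimal sketch of the idea, nothing like the actual on-disk format PostgreSQL would use, and all names illustrative:

```python
# Hypothetical front-coding sketch: each sorted key is stored as
# (length of prefix shared with the previous key, remaining suffix).
def compress(keys):
    out, prev = [], ""
    for k in keys:
        n = 0
        while n < min(len(prev), len(k)) and prev[n] == k[n]:
            n += 1
        out.append((n, k[n:]))
        prev = k
    return out

def decompress(entries):
    keys, prev = [], ""
    for n, suffix in entries:
        k = prev[:n] + suffix
        keys.append(k)
        prev = k
    return keys

keys = ["application", "applied", "apply", "banana"]
packed = compress(keys)
# packed == [(0, "application"), (5, "ed"), (4, "y"), (0, "banana")]
assert decompress(packed) == keys
```

The win comes from long runs of similar keys, which is exactly the text use case; note that it only works because the keys are already in index order.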
>> * For unique/nearly-unique indexes we can use Range Compression.
>> We reduce the size of the index by holding one index pointer per
>> range of values, thus removing both keys and pointers. It's more
>> efficient than prefix compression and isn't datatype-dependent.
Definitely.
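The range-compression idea above amounts to a sparse index: keep one (key, position) entry per range of rows, binary-search to the covering range, then scan within it. A toy sketch under that assumption (the stride and names are made up for illustration):

```python
import bisect

def build_sparse_index(sorted_rows, stride=4):
    # One (key, position) entry per `stride` rows instead of one per row.
    return [(sorted_rows[i], i) for i in range(0, len(sorted_rows), stride)]

def lookup(index, sorted_rows, key):
    # Binary-search the sparse entries for the covering range,
    # then scan only that range of the underlying rows.
    keys = [k for k, _ in index]
    i = bisect.bisect_right(keys, key) - 1
    if i < 0:
        return None
    start = index[i][1]
    stop = index[i + 1][1] if i + 1 < len(index) else len(sorted_rows)
    for pos in range(start, stop):
        if sorted_rows[pos] == key:
            return pos
    return None

rows = list(range(0, 40, 2))        # 20 unique, sorted values
idx = build_sparse_index(rows)      # only 5 index entries for 20 rows
assert lookup(idx, rows, 18) == 9
assert lookup(idx, rows, 19) is None
```

Because it only compares keys through ordering, the scheme works for any ordered datatype, which is the "isn't datatype-dependent" point above.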
>> * For highly non-unique data we can use Duplicate Compression.
>> This is the technique used by bitmap indexes. It is efficient, but
>> not useful for unique/nearly-unique data.
Also definitely. This would be hugely useful for things like "status"
or "type" fields.
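For a low-cardinality column like "status", duplicate compression stores one bitmap per distinct value rather than one index entry per row. A minimal sketch using Python integers as bitmaps (illustrative only, not a real bitmap-index implementation):

```python
def build_bitmap_index(values):
    # One bitmap per distinct value; bit i is set iff row i holds that value.
    bitmaps = {}
    for i, v in enumerate(values):
        bitmaps[v] = bitmaps.get(v, 0) | (1 << i)
    return bitmaps

def rows_matching(bitmaps, value):
    # Decode the set bits back into row positions.
    bm = bitmaps.get(value, 0)
    return [i for i in range(bm.bit_length()) if bm >> i & 1]

status = ["active", "done", "active", "active", "done"]
bm = build_bitmap_index(status)
assert rows_matching(bm, "active") == [0, 2, 3]
assert rows_matching(bm, "done") == [1, 4]
```

The index size scales with the number of distinct values times the bitmap length, which is why it shines for "status"/"type" fields and collapses for nearly-unique data.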
>> * Multi-Column Leading Value Compression - if you have a multi-column
>> index, then leading columns are usually duplicated between rows
>> inserted at the same time. Using an on-block dictionary we can remove
>> the duplicates. Only useful for multi-column indexes; possibly an
>> overlapping/contained subset of the GIT use case.
Also useful, though I generally try to put the most diverse values
first in indexes to increase the odds of them being used. Perhaps if
we had compression this would change.
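The on-block dictionary idea above can be sketched as: within one page, replace each repeated leading-column value with a small dictionary id and store only (id, trailing columns) per entry. A hypothetical sketch, with made-up names and a plain list standing in for an index page:

```python
def compress_block(entries):
    # entries: list of (leading_cols, trailing_value) for one index page.
    dictionary = {}            # leading value -> small dictionary id
    coded = []
    for lead, rest in entries:
        if lead not in dictionary:
            dictionary[lead] = len(dictionary)
        coded.append((dictionary[lead], rest))
    # Invert the dictionary so id -> leading value for decompression.
    lookup = [lead for lead, _ in sorted(dictionary.items(), key=lambda kv: kv[1])]
    return lookup, coded

def decompress_block(lookup, coded):
    return [(lookup[i], rest) for i, rest in coded]

entries = [(("2008-04-24",), 101), (("2008-04-24",), 102),
           (("2008-04-24",), 103), (("2008-04-25",), 201)]
lookup, coded = compress_block(entries)
assert len(lookup) == 2                       # two distinct leading values
assert decompress_block(lookup, coded) == entries
```

Note the trade-off raised in the reply: this pays off when the leading column repeats, which is the opposite of the "most diverse column first" rule of thumb.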
--
Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828