OK, TODO updated:
* Consider compressing indexes by storing key values duplicated in several rows as a single index entry
This is difficult because it requires datatype-specific knowledge.
---------------------------------------------------------------------------
Simon Riggs wrote:
> On Thu, 2005-10-06 at 22:43 -0400, Bruce Momjian wrote:
> > Jim C. Nasby wrote:
> > > On Wed, Oct 05, 2005 at 03:40:43PM -0700, Qingqing Zhou wrote:
> > > > We do the prefix sharing when we build up index only, never on the fly.
> > >
> > > So are you saying that inserts of new data wouldn't make any use of
> > > this? ISTM that greatly reduces the usefulness, though I'm not objecting
> > > because compression during build is probably better than none at all. Is
> > > there a technical reason compression can't be used during normal
> > > operations?
> >
> > Added to TODO:
> >
> > * Consider compressing indexes by storing key prefix values shared by
> > several rows as a single index entry
>
> Just to re-iterate Tom's point. There isn't any easy way of doing this
> in an object-relational database where we can make almost no assumptions
> about particular datatypes. There is no definition of datatype prefix...
> The best you could do would be to introduce a new API that allows a
> datatype to provide a shorter prefix value when given a starting data
> value, but then you'd need to write a whole set of prefixing functions.
> But that is almost identical to the idea of functional indexes anyhow,
> so I see no value in providing a second way of doing this when the
> existing way can be more easily modified to do this.
>
> I do not think key prefixing should be on the TODO for PostgreSQL,
> unless we also agree that datatype independence should not be maintained
> in all cases and can be relaxed for built-in datatypes.
>
> I suggest we reword that to "Investigate techniques for compressing
> indexes that will work successfully with datatype independence".
>
> What we might consider is having the index store a chain of pointers as
> an array on the index row, rather than having each row map to one index
> row. That technique would considerably reduce index volume for non-
> unique indexes by removing lots of index tuple overhead. But you would
> need to keep at most N pointers on a row. This technique is used by
> Teradata's Non Unique Secondary Index design.
>
> Best Regards, Simon Riggs
>
-- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610)
359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square,
Pennsylvania19073