Re: prefix btree implementation - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: prefix btree implementation
Date
Msg-id 1128674694.8300.30.camel@localhost.localdomain
Whole thread Raw
In response to Re: prefix btree implementation  (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses Re: prefix btree implementation
List pgsql-hackers
On Thu, 2005-10-06 at 22:43 -0400, Bruce Momjian wrote:
> Jim C. Nasby wrote:
> > On Wed, Oct 05, 2005 at 03:40:43PM -0700, Qingqing Zhou wrote:
> > > We do the prefix sharing when we build up index only, never on the fly.
> > 
> > So are you saying that inserts of new data wouldn't make any use of
> > this? ISTM that greatly reduces the usefulness, though I'm not objecting
> > because compression during build is probably better than none at all. Is
> > there a technical reason compression can't be used during normal
> > operations?
> 
> Added to TODO:
> 
> * Consider compressing indexes by storing key prefix values shared by
>   several rows as a single index entry

Just to re-iterate Tom's point. There isn't any easy way of doing this
in an object-relational database where we can make almost no assumptions
about particular datatypes. There is no definition of datatype prefix...
The best you could do would be to introduce a new API that allows a
datatype to provide a shorter prefix value when given a starting data
value, but then you'd need to write a whole set of prefixing functions.
But that is almost identical to the idea of functional indexes anyhow,
so I see no value in providing a second way of doing this when the
existing way can be more easily modified to do this. 

I do not think key prefixing should be on the TODO for PostgreSQL,
unless we also agree that datatype independence should not be maintained
in all cases and can be relaxed for built-in datatypes.

I suggest we reword that to "Investigate techniques for compressing
indexes that will work successfully with datatype independence".

What we might consider is having the index store a chain of pointers as
an array on the index row, rather than having each row map to one index
row. That technique would considerably reduce index volume for non-
unique indexes by removing lots of index tuple overhead. But you would
need to keep at most N pointers on a row. This technique is used by
Teradata's Non Unique Secondary Index design.

Best Regards, Simon Riggs



pgsql-hackers by date:

Previous
From: Christopher Kings-Lynne
Date:
Subject: Shell script to extract a table from a plain text dump
Next
From: Martijn van Oosterhout
Date:
Subject: Re: [GENERAL] Shell script to extract a table from a plain text dump