
Re: prefix btree implementation - Mailing list pgsql-hackers

From	Bruce Momjian
Subject	Re: prefix btree implementation
Date	October 7, 2005 11:04:22
Msg-id	200510071404.j97E4Fu20249@candle.pha.pa.us
In response to	Re: prefix btree implementation (Simon Riggs <simon@2ndquadrant.com>)
List	pgsql-hackers


OK, TODO updated:

* Consider compressing indexes by storing key values duplicated in several rows as a single index entry
 This is difficult because it requires datatype-specific knowledge.


---------------------------------------------------------------------------

Simon Riggs wrote:
> On Thu, 2005-10-06 at 22:43 -0400, Bruce Momjian wrote:
> > Jim C. Nasby wrote:
> > > On Wed, Oct 05, 2005 at 03:40:43PM -0700, Qingqing Zhou wrote:
> > > > We do the prefix sharing when we build up index only, never on the fly.
> > > 
> > > So are you saying that inserts of new data wouldn't make any use of
> > > this? ISTM that greatly reduces the usefulness, though I'm not objecting
> > > because compression during build is probably better than none at all. Is
> > > there a technical reason compression can't be used during normal
> > > operations?
> > 
> > Added to TODO:
> > 
> > * Consider compressing indexes by storing key prefix values shared by
> >   several rows as a single index entry
> 
> Just to re-iterate Tom's point. There isn't any easy way of doing this
> in an object-relational database where we can make almost no assumptions
> about particular datatypes. There is no definition of datatype prefix...
> The best you could do would be to introduce a new API that allows a
> datatype to provide a shorter prefix value when given a starting data
> value, but then you'd need to write a whole set of prefixing functions.
> But that is almost identical to the idea of functional indexes anyhow,
> so I see no value in providing a second way of doing this when the
> existing way can be more easily modified to do this. 
> 
> I do not think key prefixing should be on the TODO for PostgreSQL,
> unless we also agree that datatype independence should not be maintained
> in all cases and can be relaxed for built-in datatypes.
> 
> I suggest we reword that to "Investigate techniques for compressing
> indexes that will work successfully with datatype independence".
> 
> What we might consider is having the index store a chain of pointers as
> an array on the index row, rather than having each row map to one index
> row. That technique would considerably reduce index volume for non-
> unique indexes by removing lots of index tuple overhead. But you would
> need to keep at most N pointers on a row. This technique is used by
> Teradata's Non Unique Secondary Index design.
> 
> Best Regards, Simon Riggs
> 
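
To make the datatype-specific API that Simon describes concrete, here is a
rough sketch in Python. It is an illustration only, not PostgreSQL code: the
registry PREFIX_SUPPORT, the function names, and the use of raw bytes as keys
are all assumptions made for the example. The point it shows is that only
types that register a prefix function can be compressed; every other type
falls back to storing full keys.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical per-datatype support function: given the keys on one
# index page, return the prefix that all of them share.
PrefixFn = Callable[[List[bytes]], bytes]


def text_common_prefix(keys: List[bytes]) -> bytes:
    """Longest common byte prefix of the keys (meaningful for text-like types)."""
    if not keys:
        return b""
    # The smallest and largest keys bound every other key, so their
    # common prefix is shared by the whole set.
    first, last = min(keys), max(keys)
    n = 0
    for a, b in zip(first, last):
        if a != b:
            break
        n += 1
    return first[:n]


# Hypothetical registry: only types listed here know how to produce a prefix.
PREFIX_SUPPORT: Dict[str, PrefixFn] = {"text": text_common_prefix}


def compress_page(type_name: str, keys: List[bytes]) -> Tuple[bytes, List[bytes]]:
    """Return (shared_prefix, suffixes); types without support are stored in full."""
    prefix_fn = PREFIX_SUPPORT.get(type_name)
    if prefix_fn is None:
        return b"", list(keys)
    prefix = prefix_fn(keys)
    return prefix, [k[len(prefix):] for k in keys]


def decompress_page(prefix: bytes, suffixes: List[bytes]) -> List[bytes]:
    """Rebuild the original keys from the shared prefix and the suffixes."""
    return [prefix + s for s in suffixes]
```

Every datatype that wants compression needs its own entry in the registry,
which is the "whole set of prefixing functions" cost mentioned above.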

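The pointer-array idea in Simon's last paragraph can be sketched as follows.
This is a toy model in Python, not the PostgreSQL btree and not Teradata's
actual layout: the class name ChainedIndex, the (block, offset) row pointer,
and the limit MAX_POINTERS are assumptions chosen for the example. Each index
row holds one key plus up to N row pointers, and a new index row is started
only when the current one is full.

```python
from typing import Dict, Hashable, List, Tuple

TID = Tuple[int, int]  # hypothetical row pointer: (block number, offset)
MAX_POINTERS = 3       # the "at most N pointers on a row" limit, chosen arbitrarily


class ChainedIndex:
    """Toy non-unique index: each index row is one key plus up to N row pointers."""

    def __init__(self, max_pointers: int = MAX_POINTERS) -> None:
        self.max_pointers = max_pointers
        # key -> list of index rows, each index row being a list of TIDs
        self.entries: Dict[Hashable, List[List[TID]]] = {}

    def insert(self, key: Hashable, tid: TID) -> None:
        rows = self.entries.setdefault(key, [])
        # Start a new index row only when there is none or the last one is full.
        if not rows or len(rows[-1]) >= self.max_pointers:
            rows.append([])
        rows[-1].append(tid)

    def lookup(self, key: Hashable) -> List[TID]:
        """Return all row pointers for the key, in insertion order."""
        return [tid for row in self.entries.get(key, []) for tid in row]

    def index_row_count(self) -> int:
        """Number of index rows stored, as opposed to the number of table rows indexed."""
        return sum(len(rows) for rows in self.entries.values())
```

With seven table rows sharing one key and N = 3, this stores three index rows
instead of seven. The sketch only tests keys for equality, so it needs no
knowledge of what a "prefix" means for the datatype.
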
-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
