Re: Index Tuple Compression Approach? - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Index Tuple Compression Approach?
Date
Msg-id 46C29460.8070002@enterprisedb.com
Whole thread Raw
In response to Re: Index Tuple Compression Approach?  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Index Tuple Compression Approach?  ("Simon Riggs" <simon@2ndquadrant.com>)
Re: Index Tuple Compression Approach?  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
Jeff Davis wrote:
> On Tue, 2007-08-14 at 16:27 -0500, Decibel! wrote:
>> Isn't this what Grouped Index Tuples is?
> 
> http://community.enterprisedb.com/git/git-readme.txt
> 
> It looks like GIT is a little different. 
> 
> GIT actually stores a lower-bound key of a contiguous* range of keys
> that all point to the same page, and for each of those ranges stores a
> bitmap of page offsets. A search searches first for an exact match in
> the index, and failing that, looks to see if the previous index tuple
> happens to be one of these ranges.
> 
> The algorithm Chris is talking about stores a set of tuple ids (which
> include page and offset) for each distinct key.

Right.

> Both could be helpful, although I don't think they can work together
> very well.

What Chris is suggesting is basically a special case of GIT, where all
the heap tuples represented by an index tuple have the same key. I was
actually thinking of adding a flag to index tuples to indicate that
special case in GIT. We could effectively do both.

> GIT has the disadvantage that it's "lossy". It doesn't even store every
> key in the index, so it can't be sure that the match actually is a
> match. Thus, it returns candidate matches. That also has implications
> for enforcing UNIQUE (although it's not impossible, according to the
> readme). However, GIT can be used effectively on an index that happens
> to be unique. GIT also assumes a tree structure, and makes no sense for
> a hash index, and makes no sense for a types without ordering. GIT's
> space savings is dependent on the clustering of the table.
> 
> Chris's suggestion would work on a UNIQUE index, but would be no help at
> all, because there would be no duplicate keys to collapse. However, it
> could be used for non-tree indexes. The space savings for this strategy
> is dependent on how repetitive the keys are.

Right. I wasn't concerned about the case of a lot of duplicate keys,
because the bitmap index is more efficient for that. And also because
HOT should reduce the number of index tuples with duplicate keys
pointing to different versions of the same row.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: CVS corruption/mistagging?
Next
From: Magnus Hagander
Date:
Subject: Re: CVS corruption/mistagging?