Home > mailing lists

Re: Index Tuple Compression Approach? - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: Index Tuple Compression Approach?
Date	August 15, 2007 05:51:55
Msg-id	46C29460.8070002@enterprisedb.com Whole thread Raw
In response to	Re: Index Tuple Compression Approach? (Jeff Davis <pgsql@j-davis.com>)
Responses	Re: Index Tuple Compression Approach? ("Simon Riggs" <simon@2ndquadrant.com>) Re: Index Tuple Compression Approach? (Jeff Davis <pgsql@j-davis.com>)
List	pgsql-hackers

Tree view

Jeff Davis wrote:
> On Tue, 2007-08-14 at 16:27 -0500, Decibel! wrote:
>> Isn't this what Grouped Index Tuples is?
> 
> http://community.enterprisedb.com/git/git-readme.txt
> 
> It looks like GIT is a little different. 
> 
> GIT actually stores a lower-bound key of a contiguous* range of keys
> that all point to the same page, and for each of those ranges stores a
> bitmap of page offsets. A search searches first for an exact match in
> the index, and failing that, looks to see if the previous index tuple
> happens to be one of these ranges.
> 
> The algorithm Chris is talking about stores a set of tuple ids (which
> include page and offset) for each distinct key.

Right.

> Both could be helpful, although I don't think they can work together
> very well.

What Chris is suggesting is basically a special case of GIT, where all
the heap tuples represented by an index tuple have the same key. I was
actually thinking of adding a flag to index tuples to indicate that
special case in GIT. We could effectively do both.

> GIT has the disadvantage that it's "lossy". It doesn't even store every
> key in the index, so it can't be sure that the match actually is a
> match. Thus, it returns candidate matches. That also has implications
> for enforcing UNIQUE (although it's not impossible, according to the
> readme). However, GIT can be used effectively on an index that happens
> to be unique. GIT also assumes a tree structure, and makes no sense for
> a hash index, and makes no sense for a types without ordering. GIT's
> space savings is dependent on the clustering of the table.
> 
> Chris's suggestion would work on a UNIQUE index, but would be no help at
> all, because there would be no duplicate keys to collapse. However, it
> could be used for non-tree indexes. The space savings for this strategy
> is dependent on how repetitive the keys are.

Right. I wasn't concerned about the case of a lot of duplicate keys,
because the bitmap index is more efficient for that. And also because
HOT should reduce the number of index tuples with duplicate keys
pointing to different versions of the same row.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com

pgsql-hackers by date:

From: Tom Lane
Date: 15 August 2007, 05:18:31
Subject: Re: CVS corruption/mistagging?

From: Magnus Hagander
Date: 15 August 2007, 05:56:52
Subject: Re: CVS corruption/mistagging?

Re: Index Tuple Compression Approach? - Mailing list pgsql-hackers

Previous

Next