Re: Bitmap index thoughts - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Bitmap index thoughts
Date
Msg-id 45C85C39.5070401@enterprisedb.com
Whole thread Raw
In response to Re: Bitmap index thoughts  (Gavin Sherry <swm@alcove.com.au>)
List pgsql-hackers
Gavin Sherry wrote:
> On Thu, 1 Feb 2007, Bruce Momjian wrote:
> 
>> Where are we on this patch?  Does it have performance tests to show
>> where it is beneificial?  Is it ready to be reviewed?
> 
> Here's an updated patch:
> 
> http://www.alcove.com.au/~swm/bitmap-2007-02-02.patch
> 
> In this patch, I rewrote the index build system. It was fast before for
> well clustered data but for poorly clustered data, it was very slow. Now,
> it is pretty good for each distribution type.
> 
> I have various test cases but the one which showed bitmap a poor light was
> a table of 600M rows. The key to the table had a cardinality of 100,000.
> When the table was loaded with keys clustered, the build time was 1000
> seconds with bitmap (2200 with btree). With poorly clustered data (e.g.,
> the index key was (1, 2, 3, ..., 6000, 1, 2, 3, ...)), the build time for
> bitmap was 14000 seconds!
> 
> So, I rewrote this to compress data using HRL encoding (the same scheme we
> use in the bitmap AM itself). Now, clustered data is just as fast and
> unclustered data is 2000 seconds.
> 
> The select performance at a cardinality of 100,000 is similar to btree but
> faster with lower cardinalities.
> 
> Jie also contributed a rewrite of the WAL code to this patch. Not only is
> the code faster now, but it handles the notion of incomplete actions --
> like btree and friends do. The executor code still needs some work from me
> -- Jie and I have dirtied things up while experimenting -- but we would
> really like some review of the code so that this can get squared away
> well before the approach of 8.3 feature freeze.
> 
> One of the major deficiencies remaining is the lack of VACUUM support.
> Heikki put his hand up for this and I'm holding him to it! ;-)

Thanks :). I'll take a look at it.

I'm a bit worried that vacuuming can get complicated if an index is in 
fact an index + a heap + a btree. To remove empty lov items and the 
entries in the auxiliary heap and b-tree, you need to:

1. Memorize empty lov-items
2. Scan the heap, and mark the heap tuples corresponding the empty 
lov-items as dead
3. Scan the b-tree, removing pointers to dead heap tuples
4. Remove dead heap tuples
5. Remove empty lov-items

Maybe it's possible to call the existing vacuuming code recursively, but 
it feels quite horrible.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: "Marko Kreen"
Date:
Subject: Re: Pl/pgsql functions causing crashes in 8.2.2
Next
From: "Marko Kreen"
Date:
Subject: Re: Pl/pgsql functions causing crashes in 8.2.2