Re: Bitmap index stuff - Mailing list pgsql-hackers
From | Gavin Sherry |
---|---|
Subject | Re: Bitmap index stuff |
Date | |
Msg-id | Pine.LNX.4.58.0702270957200.27992@linuxworld.com.au |
In response to | Bitmap index stuff (Heikki Linnakangas <heikki@enterprisedb.com>) |
List | pgsql-hackers |
On Mon, 26 Feb 2007, Heikki Linnakangas wrote:

> Hi,
>
> How are you doing with the bitmap indexes?

I need to send off a patch fixing the last bug you pointed out. The code
needs a merge with HEAD.

> I've been trying to get my head around the patch a couple of times to
> add the vacuum support, but no matter how simple I try to keep it, I
> just always seem to get stuck.
>
> It looks like vacuum support would need:
>
> - something similar to read_words, let's call it iterate_set_bits, that
>   returns each set bit from a bitmap vector, keeping the buffer locked
>   over calls.
> - ability to clear the bit returned by iterate_set_bits. If normal index
>   scans also used this, the same functions could be used to support the
>   kill_prior_tuple thingie.

Okay.

> The above also needs to be able to recompress a page if it gets
> fragmented by repeated setting and clearing of bits.

Yes.

> I still feel that the data structures are unnecessarily complex. In
> particular, I'd like to get rid of the special-cased last_word and
> last_comp_word in the lov item. Perhaps we could instead embed a normal,
> but smaller, BMBitmapData structure in the lov item, and just add a
> length field to that?

I'm not sure that this really simplifies the code. I agree things could be
simpler though.

> You have a lot of code to support efficient building of a bitmap index.
> I know you've worked hard on that, but do we really need all that? How
> did the bitmap build work in the previous versions of the patch, and how
> much faster is the current approach?

I included details in a previous email, I thought. Basically, in cases
where the data is distributed as follows:

1 1 1 1 1 1 1 .... 2 2 2 2 2 2 2 .... 3 3 3 3 3 3 3 3 ...

We're very fast in both versions. If the data is distributed as:

1 2 3 4 5 6 .... 1 2 3 4 5 6

In the original version(s), we were terribly slow (in my test, 7 times
slower than btree). Considering the kind of data sets bitmap suits, this
made bitmap unusable.

With the rewrite, we're much faster (in my test, faster than btree). The
test case was: a table with 600M rows with 100,000 distinct keys to be
indexed.

> BTW: It occurred to me that since we're piggybacking on b-tree's strategy
> numbers, comparison operators etc, conceivably we could also use any
> other indexam. For example, a bitmap GiST would be pretty funky. We'll
> probably leave that for future versions, but have you given that any
> thought?

True. I haven't given it any thought though. Interesting... I'd have to
think of some interesting data sets which would suit the capabilities
(operators) we have with GiST.

Thanks,

Gavin
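
For readers following the vacuum discussion above, here is a minimal sketch of the two primitives Heikki describes: walking every set bit of a bitmap vector, and clearing a bit that was returned. It is not code from the patch. It assumes an uncompressed vector held as a plain list of fixed-width words, and it ignores buffer locking and page recompression entirely. `WORD_BITS`, `clear_bit`, `vacuum` and the `is_dead` callback are hypothetical names chosen for illustration; only the name `iterate_set_bits` comes from the thread.

```python
from typing import Callable, Iterator, List

WORD_BITS = 16  # assumed word width, for this sketch only


def iterate_set_bits(words: List[int]) -> Iterator[int]:
    """Yield the position of every set bit, lowest position first."""
    for word_no, word in enumerate(words):
        # 'word' is a local copy, so clearing bits in 'words' while
        # iterating does not disturb the walk over the current word.
        while word:
            low = word & -word  # isolate the lowest set bit
            yield word_no * WORD_BITS + low.bit_length() - 1
            word ^= low  # drop that bit from the local copy


def clear_bit(words: List[int], pos: int) -> None:
    """Clear the bit at position 'pos' in place."""
    words[pos // WORD_BITS] &= ~(1 << (pos % WORD_BITS))


def vacuum(words: List[int], is_dead: Callable[[int], bool]) -> int:
    """Hypothetical vacuum pass: clear each set bit reported as dead."""
    removed = 0
    for pos in iterate_set_bits(words):
        if is_dead(pos):
            clear_bit(words, pos)
            removed += 1
    return removed
```

With compressed words, clearing a bit can split a run into several words, which is presumably why the thread also raises recompressing a fragmented page; the sketch above sidesteps that by staying uncompressed.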