Re: Bitmap index stuff - Mailing list pgsql-hackers

From Gavin Sherry
Subject Re: Bitmap index stuff
Msg-id Pine.LNX.4.58.0702270957200.27992@linuxworld.com.au
In response to Bitmap index stuff  (Heikki Linnakangas <heikki@enterprisedb.com>)
List pgsql-hackers
On Mon, 26 Feb 2007, Heikki Linnakangas wrote:

> Hi,
>
> How are you doing with the bitmap indexes?

I need to send off a patch fixing the last bug you pointed out. The code
also needs to be merged with HEAD.

>
> I've been trying to get my head around the patch a couple of times to
> add the vacuum support, but no matter how simple I try to keep it, I
> just always seem to get stuck.
>
> It looks like vacuum support would need:
>
> - something similar to read_words, let's call it iterate_set_bits, that
> returns each set bit from a bitmap vector, keeping the buffer locked
> over calls.
> - ability to clear the bit returned by iterate_set_bits. If normal index
> scans also used this, the same functions could be used to support the
> kill_prior_tuple thingie.

Okay.
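
As a rough sketch of the two primitives described above, assuming a plain
uncompressed array of 64-bit words and ignoring buffer locking entirely:
the BitIter struct, bit_iter_init and clear_current_bit are hypothetical
names for illustration, and only iterate_set_bits is a name taken from
the proposal. None of this is the patch's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator state; not the patch's actual structure. */
typedef struct BitIter
{
    uint64_t   *words;      /* uncompressed bitmap words */
    size_t      nwords;     /* number of words in the vector */
    size_t      next;       /* next bit position to examine */
    size_t      last;       /* position most recently returned */
} BitIter;

static void
bit_iter_init(BitIter *it, uint64_t *words, size_t nwords)
{
    it->words = words;
    it->nwords = nwords;
    it->next = 0;
    it->last = 0;
}

/* Return 1 and store the next set bit's position in *pos, or 0 when done. */
static int
iterate_set_bits(BitIter *it, size_t *pos)
{
    size_t      total = it->nwords * 64;

    while (it->next < total)
    {
        size_t      w = it->next / 64;
        unsigned    b = (unsigned) (it->next % 64);
        uint64_t    rest = it->words[w] >> b;   /* bits at or above cursor */

        if (rest == 0)
        {
            it->next = (w + 1) * 64;            /* skip to the next word */
            continue;
        }
        while ((rest & 1) == 0)
        {
            rest >>= 1;
            b++;
        }
        it->last = w * 64 + b;
        it->next = it->last + 1;
        *pos = it->last;
        return 1;
    }
    return 0;
}

/* Clear the bit most recently returned by iterate_set_bits. */
static void
clear_current_bit(BitIter *it)
{
    it->words[it->last / 64] &= ~(UINT64_C(1) << (it->last % 64));
}
```

A vacuum pass would call iterate_set_bits in a loop and clear the bits
whose heap tuples are dead; in principle an index scan could use the same
pair of calls for kill_prior_tuple.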

>
> The above also needs to be able to recompress a page if it gets
> fragmented by repeated setting and clearing of bits.

Yes.
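
To illustrate what recompression buys, here is a minimal sketch assuming
a simplified scheme in which a run of all-zero words is stored as a
single counted entry. The CompWord layout and the recompress function
are hypothetical and are not the on-disk format used by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compressed entry: a literal word or a run of zero words. */
typedef struct CompWord
{
    int         is_fill;    /* 1 = run of all-zero words, 0 = literal */
    uint64_t    value;      /* run length if is_fill, else the bits */
} CompWord;

/*
 * Compress nwords raw words into out, which must have room for nwords
 * entries. Returns the number of entries written.
 */
static size_t
recompress(const uint64_t *raw, size_t nwords, CompWord *out)
{
    size_t      n = 0;

    for (size_t i = 0; i < nwords; i++)
    {
        if (raw[i] == 0)
        {
            if (n > 0 && out[n - 1].is_fill)
                out[n - 1].value++;     /* extend the current zero run */
            else
            {
                out[n].is_fill = 1;     /* start a new zero run */
                out[n].value = 1;
                n++;
            }
        }
        else
        {
            out[n].is_fill = 0;         /* keep non-zero words verbatim */
            out[n].value = raw[i];
            n++;
        }
    }
    return n;
}
```

Clearing the last set bit of a literal word leaves a zero word sitting
between two zero runs; rerunning the compression merges all three into
one entry, which is the space a fragmented page would get back.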

> I still feel that the data structures are unnecessarily complex. In
> particular, I'd like to get rid of the special-cased last_word and
> last_comp_word in the lov item. Perhaps we could instead embed a normal,
> but smaller, BMBitmapData structure in the lov item, and just add a
> length field to that?

I'm not sure that this really simplifies the code. I agree things could be
simpler though.

>
> You have a lot of code to support efficient building of a bitmap index.
> I know you've worked hard on that, but do we really need all that? How
> did the bitmap build work in the previous versions of the patch, and how
> much faster is the current approach?

I included details in a previous email, I thought. Basically, in cases
where the data is distributed as follows:

1 1 1 1 1 1 1 .... 2 2 2 2 2 2 2 .... 3 3 3 3 3 3 3 3 ...

We're very fast in both versions. If the data is distributed as:

1 2 3 4 5 6 .... 1 2 3 4 5 6

In the original version(s), we were terribly slow (in my test, 7 times
slower than btree). Considering the kind of data sets bitmap suits, this
made bitmap unusable. With the rewrite, we're much faster (in my test,
faster than btree).

The test case was: a table with 600M rows with 100,000 distinct keys to be
indexed.
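
For anyone wanting to reproduce the two layouts at small scale, a sketch
follows. The function names are hypothetical, keys are numbered from 0,
and nrows is assumed to be a multiple of nkeys. The "switch" count (how
often consecutive rows belong to different keys, and hence to different
bitmap vectors) is only one plausible proxy for build cost; it is an
assumption of this sketch, not a measurement from the tests above.

```c
#include <assert.h>
#include <stddef.h>

/* Clustered layout: 0 0 0 ... 1 1 1 ... 2 2 2 ... */
static size_t
key_clustered(size_t row, size_t nrows, size_t nkeys)
{
    return row / (nrows / nkeys);
}

/* Interleaved layout: 0 1 2 ... 0 1 2 ... */
static size_t
key_interleaved(size_t row, size_t nkeys)
{
    return row % nkeys;
}

/* Count how often the key changes between consecutive rows. */
static size_t
count_switches(int interleaved, size_t nrows, size_t nkeys)
{
    size_t      switches = 0;

    for (size_t row = 1; row < nrows; row++)
    {
        size_t      prev;
        size_t      cur;

        if (interleaved)
        {
            prev = key_interleaved(row - 1, nkeys);
            cur = key_interleaved(row, nkeys);
        }
        else
        {
            prev = key_clustered(row - 1, nrows, nkeys);
            cur = key_clustered(row, nrows, nkeys);
        }
        if (prev != cur)
            switches++;
    }
    return switches;
}
```

With 12 rows and 3 keys the clustered layout changes key twice, while the
interleaved layout changes key on every row after the first.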

> BTW: It occurred to me that since we're piggybacking on b-tree's strategy
> numbers, comparison operators etc, conceivably we could also use any
> other indexam. For example, a bitmap GiST would be pretty funky. We'll
> probably leave that for future versions, but have you given that any
> thought?

True. I haven't given it any thought though. Interesting... I'd have to
think of some interesting data sets which would suit the capabilities
(operators) we have with GiST.

Thanks,

Gavin