Re: [PATCHES] WIP: bitmap indexes - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [PATCHES] WIP: bitmap indexes
Date
Msg-id 13399.1155734141@sss.pgh.pa.us
Whole thread Raw
In response to Re: [PATCHES] WIP: bitmap indexes  ("Jie Zhang" <jzhang@greenplum.com>)
Responses Re: [PATCHES] WIP: bitmap indexes
List pgsql-hackers
"Jie Zhang" <jzhang@greenplum.com> writes:
> On 8/15/06 6:18 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>> Well, as I said, I don't think there's justification for exposing a
>> bitmap index's internal data formats to the rest of the system like
>> that: it's not very future-proof and I don't see that it's buying any
>> significant performance gain.

> The bitmap words in the bitmap index are very simple and can be very
> generic.

They're not generic in the least: there's a compression scheme involved
that you might want to whack around at any time.  So I disagree with the
idea that it's OK to expose the format outside the access/bitmap/ module.

> I like this idea. I think that we can define a new TBMStatus to be
> TBM_STREAM in TIDBitmap.

It occurs to me that what tbm_begin_iterate really is is a constructor
for a stream bitmap object that reads out the contents of a tbm bitmap
(we need a decent name for the non-stream data structure ... maybe
hash bitmap?).  If we think of it like that then we can unify the
ideas some more.

My proposal at this point would be to invent two different Node types,
one for stream bitmaps and one for hash bitmaps.  The initial input to
nodeBitmapHeapscan can be either, but if it's given a hash bitmap then
it stream-ifies it for use.  amgetmulti can return either kind, and
nodeBitmapAnd and nodeBitmapOr can use IsA tests to decide what to do.
Preserving the existing optimization for ORing hash bitmaps is a bit
tricky but I think it's doable.  Consider this API for amgetmulti:

amgetmulti takes an argument which can be either a hash bitmap or NULL.
It returns an object that must be either a hash or stream bitmap.
If it wants to return a stream bitmap, it simply disregards the argument
and returns a constructed stream-bitmap object.  If it wants to return
a hash bitmap, then if the argument is not NULL, OR the additional bits
into the argument object and return it; if the argument is NULL,
construct a fresh hash-bitmap object, set bits in it, and return it.

Assume that we have the existing hash-bitmap AND/OR functions as well as
constructors for AND and OR stream bitmaps that take lists of input
stream objects.  Then the algorithm for nodeBitmapOr looks like this:
HashBitmap *hashBitmap = NULL;List *streamBitmaps = NIL;
foreach(input plan){    Node *newBitmap = amgetmulti(hashBitmap);
    if (IsA(newBitmap, HashBitmap))    {        // any OR-ing required was done implicitly        hashBitmap =
newBitmap;   }    else    {        Assert(IsA(newBitmap, StreamBitmap));        streamBitmaps = lappend(streamBitmaps,
newBitmap);   }}
 
if (streamBitmaps == NIL){    // all inputs returned hash, so we're done    return hashBitmap;}else{    // need a
streamOR operation atop the inputs    if (hashBitmap)        streamBitmaps = lappend(streamBitmaps,
HashToStreamBitmap(hashBitmap));   return ConstructStreamOr(streamBitmaps);}
 

nodeBitmapAnd is a bit different but not any harder.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: BugTracker (Was: Re: 8.2 features status)
Next
From: Volkan YAZICI
Date:
Subject: Re: "cache reference leak" and "problem in alloc set" warnings