Re: [PATCHES] WIP: bitmap indexes - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [PATCHES] WIP: bitmap indexes
Date
Msg-id 2273.1155647929@sss.pgh.pa.us
Whole thread Raw
In response to Re: [PATCHES] WIP: bitmap indexes  (Gavin Sherry <swm@linuxworld.com.au>)
Responses Re: [PATCHES] WIP: bitmap indexes
List pgsql-hackers
Gavin Sherry <swm@linuxworld.com.au> writes:
> On Mon, 14 Aug 2006, Tom Lane wrote:
>> Correct me if I'm wrong, but isn't the patch's present hacking on the
>> executor intended to make it happen like this?

> Not really. It reads ahead on the bitmap index and passes back the bitmap
> words. The other executor routines are hacked up to process the data in
> this format.

Well, as I said, I don't think there's justification for exposing a
bitmap index's internal data formats to the rest of the system like
that: it's not very future-proof and I don't see that it's buying any
significant performance gain.  At some point you have to convert to TIDs
anyway, at least in the sense of knowing what page and line number you
are at, so passing the data as an array of TIDs really isn't going to
hurt much.  So my advice is to rip out all those changes and go back to
the existing tidbitmap.c readout API.  There's nothing wrong with
the TBMIterateResult data structure.

What I do find interesting to think about is whether, strictly within
tidbitmap.c, there could be an alternate kind of bitmap object that
doesn't have to materialize the whole bitmap for an indexscan in memory
because it knows it can fetch the data on-demand, ie, build the next
page TBMIterateResult data structure on-the-fly from the index when it's
requested.  Call it a "stream bitmap" in contrast to the present
"materialized bitmaps".  The trick here is to be able to AND and OR a
stream bitmap with another stream bitmap or a materialized bitmap.
I don't see any reason in principle why that couldn't be done: the
output of the AND/OR would be a stream of TBMIterateResults just like
the inputs.  That is, it's another stream bitmap, but with a different
generating function and some internal state that includes its source
bitmaps.  You'd have to sort a materialized bitmap into order before
starting to AND/OR it with a stream bitmap, but that code is there
already.

You'd probably need to break the abstraction enough for nodeBitmapAnd
and nodeBitmapOr to know which kinds of bitmaps they are dealing with
and do things a little differently in different cases --- in particular,
the optimization nodeBitmapOr understands about how to "OR" things
together probably doesn't work for stream bitmaps.  But I see no reason
to be changing nodeBitmapHeapscan.

> ... It seems to me that part of this
> which will be a litle ugly is dealing with "key in (1,2,3,...)"
> transformation. Let me think about it...

You're worrying about the wrong problem if you focus solely on
ScalarArrayOpExpr.  The general problem you need to be solving is
"how do I AND and OR these bitmaps efficiently"?  Once you've fixed
that, the ScalarArrayOpExpr code is just one user of that behavior.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: New beginings
Next
From: Tom Lane
Date:
Subject: Re: [PATCHES] [Patch] - Fix for bug #2558, InitDB failed to run