Re: [PATCHES] WIP: bitmap indexes - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [PATCHES] WIP: bitmap indexes
Date
Msg-id 200609052230.k85MU6P03691@momjian.us
Whole thread Raw
In response to Re: [PATCHES] WIP: bitmap indexes  ("Jie Zhang" <jzhang@greenplum.com>)
List pgsql-hackers
This has been saved for the 8.3 release:
http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------
Jie Zhang wrote:
> 
> 
> On 8/15/06 6:18 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> 
> > Gavin Sherry <swm@linuxworld.com.au> writes:
> >> On Mon, 14 Aug 2006, Tom Lane wrote:
> >>> Correct me if I'm wrong, but isn't the patch's present hacking on the
> >>> executor intended to make it happen like this?
> > 
> >> Not really. It reads ahead on the bitmap index and passes back the bitmap
> >> words. The other executor routines are hacked up to process the data in
> >> this format.
> > 
> > Well, as I said, I don't think there's justification for exposing a
> > bitmap index's internal data formats to the rest of the system like
> > that: it's not very future-proof and I don't see that it's buying any
> > significant performance gain.  At some point you have to convert to TIDs
> > anyway, at least in the sense of knowing what page and line number you
> > are at, so passing the data as an array of TIDs really isn't going to
> > hurt much.  So my advice is to rip out all those changes and go back to
> > the existing tidbitmap.c readout API.  There's nothing wrong with
> > the TBMIterateResult data structure.
> >
> 
> The bitmap words in the bitmap index are very simple and can be very
> generic. You can think about them as one bit per tuple along with some
> padding bits between heap pages. The problem I have is that I do not know a
> good way to construct an in-memory version of this for other index
> structures, like b-tree. To be able to handle both cases nicely, you are
> right -- TBMIterateResult is better. Or, PagetableEntry may be better since
> it will make AND/OR easier.
> 
> > What I do find interesting to think about is whether, strictly within
> > tidbitmap.c, there could be an alternate kind of bitmap object that
> > doesn't have to materialize the whole bitmap for an indexscan in memory
> > because it knows it can fetch the data on-demand, ie, build the next
> > page TBMIterateResult data structure on-the-fly from the index when it's
> > requested.  Call it a "stream bitmap" in contrast to the present
> > "materialized bitmaps".  The trick here is to be able to AND and OR a
> > stream bitmap with another stream bitmap or a materialized bitmap.
> > I don't see any reason in principle why that couldn't be done: the
> > output of the AND/OR would be a stream of TBMIterateResults just like
> > the inputs.  That is, it's another stream bitmap, but with a different
> > generating function and some internal state that includes its source
> > bitmaps.  You'd have to sort a materialized bitmap into order before
> > starting to AND/OR it with a stream bitmap, but that code is there
> > already.
> 
> I like this idea. I think that we can define a new TBMStatus to be
> TBM_STREAM in TIDBitmap. *getmulti functions will remain the same, except
> that we add a new returning bool argument, stating if this is a stream
> bitmap. If this is a stream bitmap, nodeBitmapIndexScan simply fills spages,
> and passes it upstream. When nodeBitmapAnd or nodeBitmapOr ANDs/ORs several
> bitmaps, the result bitmap is a stream bitmap if there is at least one
> bitmap is a stream bitmap. Then we add another loop in nodeBitmapHeapscan to
> be able to pull more data from its subnode.
> 
> Thanks,
> Jie
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


pgsql-hackers by date:

Previous
From: "Joshua D. Drake"
Date:
Subject: Re: Win32 hard crash problem
Next
From: Bruce Momjian
Date:
Subject: Re: ISBN/ISSN/ISMN/EAN13 module