Re: plans for bitmap indexes? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: plans for bitmap indexes?
Date
Msg-id 27879.1098227105@sss.pgh.pa.us
Whole thread Raw
In response to Re: plans for bitmap indexes?  ("Simon Riggs" <simon@2ndquadrant.com>)
Responses Re: plans for bitmap indexes?
Re: plans for bitmap indexes?
List pgsql-hackers
"Simon Riggs" <simon@2ndquadrant.com> writes:
> I was thinking about this recently, then realised that building the bitmap
> would not be as easily, since PostgreSQL doesn't index null values.

As Alvaro already pointed out, this statement is bogus; and I'm not sure
what it has to do with the topic anyway.  All you care about is the rows
that the index fingers as matching your scan condition.  If the scan
condition is strict (which it usually is) it does not matter whether the
index stores entries for nulls or not.

> How would you dynamically build the bit maps from the indexes?

> Or would you:
> - copy aside and sort the indexes on CTID
> - merge join them all to find matching CTIDs
> - probe into the main table

I've been taking "bitmap" to be a rather handwavy way of saying "a
compact representation of sets of CTIDs that is readily amenable to
being ANDed and ORed with other sets".  I don't think it'll be a pure
bitmap with no other superstructure; at the very least you'd want to
apply some sort of sparse-bitmap and/or compression techniques.  I do
suspect a bitmappy kind of representation will be more effective than
sorting arrays of CTIDs per se, although in principle you could do it
that way too.

But yeah, the basic idea is to scan an index and build some sort of
in-memory set of CTIDs of selected rows; possibly AND or OR this with
other sets built from other indexes; and then scan the set and probe
into the heap at the indicated places.  One huge advantage is that the
actual heap visiting becomes efficient, eg you never visit the same page
more than once.  (What you lose is the ability to retrieve data in
index order, so this isn't a replacement for existing indexscan methods,
just another plan type to consider.)

One interesting thought is that the bitmappy representation could be
lossy.  For instance, once you get to the point of needing to examine
most of the rows on a particular page, it's probably not worth
remembering exactly which rows; you could just remember that that whole
page is a target, and sequentially scan all the rows on it when you do
visit the heap.  (ANDing and ORing still works.)  This can scale up to
visiting consecutive ranges of pages; in the limit the operation
degenerates to a seqscan.  With this idea you can guarantee that the
in-memory bitmaps never get impracticably large.  (Obviously if they get
so large as to push the system into swapping, or even run the backend
out of memory completely, you lose, so this is a real nice guarantee to
be able to make.)  The whole thing starts to look like a self-adaptive
interpolation between our present indexscan and seqscan techniques,
which takes a lot of pressure off the planner to correctly guess the
number of matching rows in advance.

I remember batting these ideas around with people at the 2001 "OSDB
summit" conference ... I didn't think it would take us this long to get
around to doing it ...
        regards, tom lane


pgsql-hackers by date:

Previous
From: Mark Kirkwood
Date:
Subject: Re: plans for bitmap indexes?
Next
From: Josh Berkus
Date:
Subject: Re: plans for bitmap indexes?