Re: plans for bitmap indexes? - Mailing list pgsql-hackers
From: Tom Lane
Subject: Re: plans for bitmap indexes?
Date:
Msg-id: 27879.1098227105@sss.pgh.pa.us
In response to: Re: plans for bitmap indexes? ("Simon Riggs" <simon@2ndquadrant.com>)
Responses: Re: plans for bitmap indexes?; Re: plans for bitmap indexes?
List: pgsql-hackers
"Simon Riggs" <simon@2ndquadrant.com> writes:
> I was thinking about this recently, then realised that building the bitmap
> would not be as easy, since PostgreSQL doesn't index null values.

As Alvaro already pointed out, this statement is bogus; and I'm not sure what it has to do with the topic anyway. All you care about is the rows that the index fingers as matching your scan condition. If the scan condition is strict (which it usually is) it does not matter whether the index stores entries for nulls or not.

> How would you dynamically build the bit maps from the indexes?
> Or would you:
> - copy aside and sort the indexes on CTID
> - merge join them all to find matching CTIDs
> - probe into the main table

I've been taking "bitmap" to be a rather handwavy way of saying "a compact representation of sets of CTIDs that is readily amenable to being ANDed and ORed with other sets". I don't think it'll be a pure bitmap with no other superstructure; at the very least you'd want to apply some sort of sparse-bitmap and/or compression techniques. I do suspect a bitmappy kind of representation will be more effective than sorting arrays of CTIDs per se, although in principle you could do it that way too.

But yeah, the basic idea is to scan an index and build some sort of in-memory set of CTIDs of selected rows; possibly AND or OR this with other sets built from other indexes; and then scan the set and probe into the heap at the indicated places. One huge advantage is that the actual heap visiting becomes efficient, e.g. you never visit the same page more than once. (What you lose is the ability to retrieve data in index order, so this isn't a replacement for existing indexscan methods, just another plan type to consider.)

One interesting thought is that the bitmappy representation could be lossy.
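[Editorial note: the exact (non-lossy) machinery described above can be sketched as follows. This is an illustrative toy in Python, not PostgreSQL's actual C data structures; all function names, and the {page: bitmap} representation, are invented for the example. It shows why a per-page bitmap makes AND/OR cheap and lets the heap be visited one page at a time, in page order.]

```python
# Toy sketch: represent the set of matching CTIDs from one index scan as a
# dict mapping heap page number -> bitmap of line-pointer offsets on that
# page. AND/OR then reduce to per-page integer bit operations.

def make_tid_set(ctids):
    """Build {page: bitmap} from an iterable of (page, offset) CTIDs."""
    pages = {}
    for page, offset in ctids:
        pages[page] = pages.get(page, 0) | (1 << offset)
    return pages

def tid_and(a, b):
    """Rows matching both scan conditions: intersect per-page bitmaps."""
    return {p: a[p] & b[p] for p in a.keys() & b.keys() if a[p] & b[p]}

def tid_or(a, b):
    """Rows matching either scan condition: union per-page bitmaps."""
    out = dict(a)
    for p, bits in b.items():
        out[p] = out.get(p, 0) | bits
    return out

def heap_visit_order(tid_set):
    """Yield (page, offsets) in ascending page order: each heap page is
    visited exactly once, and visits proceed in physical order."""
    for page in sorted(tid_set):
        bits = tid_set[page]
        yield page, [i for i in range(bits.bit_length()) if bits >> i & 1]
```

Note how the ordering guarantee falls out for free: because the set is keyed by page number, the heap probe phase is a single forward sweep, which is what makes the heap visiting efficient compared with probing in index order.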
For instance, once you get to the point of needing to examine most of the rows on a particular page, it's probably not worth remembering exactly which rows; you could just remember that that whole page is a target, and sequentially scan all the rows on it when you do visit the heap. (ANDing and ORing still works.) This can scale up to visiting consecutive ranges of pages; in the limit the operation degenerates to a seqscan. With this idea you can guarantee that the in-memory bitmaps never get impracticably large. (Obviously if they get so large as to push the system into swapping, or even run the backend out of memory completely, you lose, so this is a real nice guarantee to be able to make.)

The whole thing starts to look like a self-adaptive interpolation between our present indexscan and seqscan techniques, which takes a lot of pressure off the planner to correctly guess the number of matching rows in advance.

I remember batting these ideas around with people at the 2001 "OSDB summit" conference ... I didn't think it would take us this long to get around to doing it ...

			regards, tom lane
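[Editorial note: the lossy-page idea can be sketched the same way. Again this is an illustrative Python toy, not PostgreSQL code; the LOSSY sentinel, the threshold constant, and all names are invented. The point it demonstrates is that degrading a dense page to "whole page is a target" keeps memory bounded while AND/OR remain conservative, i.e. they can over-include rows but never drop a match.]

```python
# Toy sketch of a lossy TID set: once a page's per-row bitmap gets dense,
# forget the detail and remember only that the whole page is a target.

LOSSY = None           # sentinel: per-row detail discarded for this page
MAX_BITS_PER_PAGE = 4  # invented threshold before a page degrades to lossy

def add_ctid(pages, page, offset):
    """Add one CTID, degrading the page to LOSSY past the threshold."""
    bits = pages.get(page, 0)
    if bits is LOSSY:
        return  # already page-level; nothing more to remember
    bits |= 1 << offset
    pages[page] = LOSSY if bin(bits).count("1") > MAX_BITS_PER_PAGE else bits

def lossy_and(a, b):
    """AND two possibly-lossy sets. A lossy page contributes no per-row
    detail, so the other side's entry (exact or lossy) bounds the result;
    matches are never dropped, only over-approximated."""
    out = {}
    for p in a.keys() & b.keys():
        if a[p] is LOSSY:
            out[p] = b[p]
        elif b[p] is LOSSY:
            out[p] = a[p]
        elif a[p] & b[p]:
            out[p] = a[p] & b[p]
    return out
```

In the limit, every entry is a bare page number and the probe phase is a sequential scan of the listed pages, which is the "self-adaptive interpolation between indexscan and seqscan" described above; the only extra cost of lossiness is rechecking the scan condition on rows of lossy pages.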