Re: plans for bitmap indexes? - Mailing list pgsql-hackers
From: Simon Riggs
Subject: Re: plans for bitmap indexes?
Date:
Msg-id: 01ba01c4b638$32460550$6400a8c0@Nightingale
In response to: Re: plans for bitmap indexes? (Josh Berkus <josh@agliodbs.com>)
Responses: Re: plans for bitmap indexes?; Re: plans for bitmap indexes?
List: pgsql-hackers
Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > How would you dynamically build the bit maps from the indexes?
> >
> > Or would you:
> > - copy aside and sort the indexes on CTID
> > - merge join them all to find matching CTIDs
> > - probe into the main table
>
> I've been taking "bitmap" to be a rather handwavy way of saying "a
> compact representation of sets of CTIDs that is readily amenable to
> being ANDed and ORed with other sets". I don't think it'll be a pure
> bitmap with no other superstructure; at the very least you'd want to
> apply some sort of sparse-bitmap and/or compression techniques. I do
> suspect a bitmappy kind of representation will be more effective than
> sorting arrays of CTIDs per se, although in principle you could do it
> that way too.

OK. You seemed to be implying that.

> (What you lose is the ability to retrieve data in index order, so this
> isn't a replacement for existing indexscan methods, just another plan
> type to consider.)

I've never seen an application that required a bitmap plan and sorted
output. Have you? Mostly count(*), often sum() or avg(), but never
sorted, surely. And considering there would always be more than one
index involved, which index's order would we want anyway?

> One interesting thought is that the bitmappy representation could be
> lossy. For instance, once you get to the point of needing to examine
> most of the rows on a particular page, it's probably not worth
> remembering exactly which rows; you could just remember that that whole
> page is a target, and sequentially scan all the rows on it when you do
> visit the heap. (ANDing and ORing still works.) This can scale up to
> visiting consecutive ranges of pages; in the limit the operation
> degenerates to a seqscan. With this idea you can guarantee that the
> in-memory bitmaps never get impracticably large.
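The "compact, AND/OR-able set of CTIDs" Tom describes could be sketched roughly as follows. This is an illustrative toy only, not PostgreSQL code: the names (`tbm_and`, `tbm_or`) and the per-block-dictionary representation are invented for the example, and a real implementation would use compressed per-page bitmaps rather than Python sets.

```python
# Toy sketch: represent the result of one index scan as a mapping from
# heap block number to the set of tuple offsets matched on that block.
# A CTID is then a (block, offset) pair. All names here are invented.

def tbm_and(a, b):
    """Intersect two bitmaps: keep only CTIDs present in both."""
    out = {}
    for block, bits in a.items():
        if block in b:
            common = bits & b[block]
            if common:
                out[block] = common
    return out

def tbm_or(a, b):
    """Union two bitmaps: keep CTIDs present in either."""
    out = {block: set(bits) for block, bits in a.items()}
    for block, bits in b.items():
        out.setdefault(block, set()).update(bits)
    return out

# Two hypothetical "index scans", each yielding matching CTIDs:
scan1 = {0: {1, 2, 5}, 3: {7}}
scan2 = {0: {2, 5, 9}, 4: {1}}

print(tbm_and(scan1, scan2))  # → {0: {2, 5}}
```

Only the surviving (block, offset) pairs then need heap probes, and blocks can be visited in physical order, which is where the sequential-I/O win comes from.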
> (Obviously if they get so large as to push the system into swapping,
> or even run the backend out of memory completely, you lose, so this is
> a real nice guarantee to be able to make.) The whole thing starts to
> look like a self-adaptive interpolation between our present indexscan
> and seqscan techniques, which takes a lot of pressure off the planner
> to correctly guess the number of matching rows in advance.

Well, that's the best one yet. That's the solution, if ever I heard it.
The reduction in bitmap size makes the technique much safer to use.
Size matters, since we're likely to apply these techniques to very
large databases, which obviously produce very large CTID lists.

The problem with guessing the number of rows is that you're never quite
sure whether it's worth the startup overhead of using the bitmap
technique. My next question was going to be: so how will you know when
to use it?

Hmmm... think... you'd need to be sure that the cost of scanning a
whole block didn't make the technique impractical. Generally, since
we're using this technique to access infrequent row combinations, we'd
usually be looking at no more than one row per block anyway. So the
technique is still I/O bound; a bit of extra post-I/O CPU work won't
hurt much. OK, cool.

> I remember batting these ideas around with people at the 2001 "OSDB
> summit" conference ... I didn't think it would take us this long to get
> around to doing it ...

...as if you haven't been busy... ;-)

Best Regards, Simon Riggs