Re: plans for bitmap indexes? - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: plans for bitmap indexes?
Date
Msg-id 200411040357.iA43vf925972@candle.pha.pa.us
Whole thread Raw
In response to Re: plans for bitmap indexes?  ("Simon Riggs" <simon@2ndquadrant.com>)
Responses Re: plans for bitmap indexes?
List pgsql-hackers
Updated TODO:

* Allow the creation of bitmap indexes which can be quickly combined with other bitmap indexes
 Bitmap indexes index single columns that can be combined with other bitmap indexes to dynamically create a composite
indexto match a specific query. Each index is a bitmap, and the bitmaps are bitwise AND'ed or OR'ed to be combined.
Suchindexes could be more compact if there are few unique value.  Also, perhaps they can be lossy requiring a scan of
theheap page to find matching rows.
 

* Allow non-bitmap indexes to be combined
 Do lookups on non-bitmap indexes and create bitmaps in memory that can be combined with other indexes.


---------------------------------------------------------------------------

Simon Riggs wrote:
> > Tom Lane
> > "Simon Riggs" <simon@2ndquadrant.com> writes:
> 
> > > How would you dynamically build the bit maps from the indexes?
> >
> > > Or would you:
> > > - copy aside and sort the indexes on CTID
> > > - merge join them all to find matching CTIDs
> > > - probe into the main table
> >
> > I've been taking "bitmap" to be a rather handwavy way of saying "a
> > compact representation of sets of CTIDs that is readily amenable to
> > being ANDed and ORed with other sets".  I don't think it'll be a pure
> > bitmap with no other superstructure; at the very least you'd want to
> > apply some sort of sparse-bitmap and/or compression techniques.  I do
> > suspect a bitmappy kind of representation will be more effective than
> > sorting arrays of CTIDs per se, although in principle you could do it
> > that way too.
> >
> 
> OK. You seemed to be implying that.
> 
> > (What you lose is the ability to retrieve data in
> > index order, so this isn't a replacement for existing indexscan methods,
> > just another plan type to consider.)
> 
> Never seen an application that required a bitmap plan and sorted output.
> Have you? Mostly count(*), often sum() or avg(), but never sorted, surely.
> 
> Considering there would always be >1 index, which index order did we want
> anyhow?
> 
> > One interesting thought is that the bitmappy representation could be
> > lossy.  For instance, once you get to the point of needing to examine
> > most of the rows on a particular page, it's probably not worth
> > remembering exactly which rows; you could just remember that that whole
> > page is a target, and sequentially scan all the rows on it when you do
> > visit the heap.  (ANDing and ORing still works.)  This can scale up to
> > visiting consecutive ranges of pages; in the limit the operation
> > degenerates to a seqscan.  With this idea you can guarantee that the
> > in-memory bitmaps never get impracticably large.  (Obviously if they get
> > so large as to push the system into swapping, or even run the backend
> > out of memory completely, you lose, so this is a real nice guarantee to
> > be able to make.)  The whole thing starts to look like a self-adaptive
> > interpolation between our present indexscan and seqscan techniques,
> > which takes a lot of pressure off the planner to correctly guess the
> > number of matching rows in advance.
> 
> Well, thats the best one yet. That's the solution, if ever I heard it.
> 
> The reduction in bitmap size makes their use much safer. Size matters, since
> we're likely to start using these techniques on very large databases, which
> imply obviously have very large CTID lists. The problem with guessing the
> number of rows is you're never too sure whether its worth the startup
> overhead of using the bitmap technique. ....my next question was going to
> be, so how will you know when to use the technique?
> 
> Hmmm....think....you'd need to be clear that the cost of scanning a block
> didn't make the whole thing impractical. Generally, since we're using this
> technique to access infrequent row combinations, we'd be looking at no more
> than one row per block usually anyway. So the technique is still I/O bound -
> a bit extra post I/O cpu work won't hurt much. OK, cool.
> 
> > I remember batting these ideas around with people at the 2001 "OSDB
> > summit" conference ... I didn't think it would take us this long to get
> > around to doing it ...
> 
> ...as if you haven't been busy... ;-)
> 
> Best Regards, Simon Riggs
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


pgsql-hackers by date:

Previous
From: Robert Treat
Date:
Subject: Re: UPDATE is not allowed in a non-volatile function
Next
From: "Roland Volkmann"
Date:
Subject: Charset WIN1252