Re: State of the on-disk bitmap index - Mailing list pgsql-hackers
From | Gianni Ciolli |
---|---|
Subject | Re: State of the on-disk bitmap index |
Date | |
Msg-id | 20120905104222.GA5886@leggeri.gi.lan Whole thread Raw |
In response to | Re: State of the on-disk bitmap index (Daniel Bausch <bausch@dvs.tu-darmstadt.de>) |
Responses |
Re: State of the on-disk bitmap index
|
List | pgsql-hackers |
Dear Albe and Daniel, On Wed, Sep 05, 2012 at 11:28:18AM +0200, Daniel Bausch wrote: > Hi Albe and the list, > > >> I am going to implement a simple kind of "encoded bitmap indexes" (EBI). > >> > >> I thought, it could be a good idea to base my work on the long proposed > >> on-disk bitmap index implementation. Regarding to the wiki, you, > >> Jonah and Simon, were the last devs that touched this thing. Unfortunately > >> I could not find the patch representing your state of that work. I > >> could only capture the development history up to Gianni Ciolli & Gabriele > >> Bartolini from the old pgsql-patches archives. Other people involved > >> were Jie Zhang, Gavin Sherry, Heikki Linnakangas, and Leonardo F. Are > >> you and the others still interested in getting this into PG? A rebase > >> of the most current bitmap index implementation onto master HEAD will > >> be the first 'byproduct' that I am going to deliver back to you. > >> > >> 1. Is anyone working on this currently? > >> 2. Who has got the most current source code? > >> 3. Is there a git of that or will I need to reconstruct the history > >> from > >> the patches I collected? > > > > It seems like you did not get any answers from any of the > > people you mentioned ... My fault: I missed the questions in August, but today my colleague Gabriele drew my attention to them. I apologise. > I used the (more recent) patches posted by Gianni Ciolli in 2008 and > currently am in the process of porting those to master HEAD -- like I > promised. Back in 2008 the PostgreSQL project wasn't using git, and I wasn't either; hence that patch is the best starting point I can find. > > Another criticism I can imagine is that PostgreSQL already > > supports a bitmap index scan of b-tree indexes, so you would > > have to show that on-disk bitmap indexes outperform that > > in realistic scenarios. This has probably become more > > difficult with the recently introduced index-only scan > > for b-tree indexes, which is particularly helpful in > > data warehouse scenarios. > > IIRC, it was already shown that bitmap indexes can speed up TPC-H > queries. I will compare B+-tree, bitmap, and encoded bitmap indexes. I think what Albe meant (also what we attempted back then, if memory serves me, but without reaching completion) is a set of tests which show a significant benefit of your implementation over the existing index type implementations in PostgreSQL, to justify the increased complexity of the source code. The kind of test I have in mind is: a big table T with a low-cardinality column C, such that a btree index on C is significantly larger than the corresponding bitmap index on the same column. Create the bitmap index, and run a query like SELECT ... FROM T WHERE C = ... more than once; then you should notice that subsequent scans are much faster than the first run, because the index is small enough to fit the shared memory and will not need to be reloaded from disk at every scan. Then drop the bitmap index, and create a btree index on the same column; this time the index will be too large and subsequent scans will be slow, because the index blocks must be reloaded from disk at every scan. Hope that helps; best regards, Dr. Gianni Ciolli - 2ndQuadrant Italia PostgreSQL Training, Services and Support gianni.ciolli@2ndquadrant.it | www.2ndquadrant.it
pgsql-hackers by date: