Re: State of the on-disk bitmap index - Mailing list pgsql-hackers

From Gianni Ciolli
Subject Re: State of the on-disk bitmap index
Date
Msg-id 20120905104222.GA5886@leggeri.gi.lan
Whole thread Raw
In response to Re: State of the on-disk bitmap index  (Daniel Bausch <bausch@dvs.tu-darmstadt.de>)
Responses Re: State of the on-disk bitmap index  (Daniel Bausch <bausch@dvs.tu-darmstadt.de>)
List pgsql-hackers
Dear Albe and Daniel,

On Wed, Sep 05, 2012 at 11:28:18AM +0200, Daniel Bausch wrote:
> Hi Albe and the list,
> 
> >> I am going to implement a simple kind of "encoded bitmap indexes" (EBI).
> >> 
> >> I thought, it could be a good idea to base my work on the long proposed
> >> on-disk bitmap index implementation.  Regarding to the wiki, you,
> >> Jonah and Simon, were the last devs that touched this thing.  Unfortunately
> >> I could not find the patch representing your state of that work.  I
> >> could only capture the development history up to Gianni Ciolli & Gabriele
> >> Bartolini from the old pgsql-patches archives.  Other people involved
> >> were Jie Zhang, Gavin Sherry, Heikki Linnakangas, and Leonardo F.  Are
> >> you and the others still interested in getting this into PG?  A rebase
> >> of the most current bitmap index implementation onto master HEAD will
> >> be the first 'byproduct' that I am going to deliver back to you.
> >>
> >> 1. Is anyone working on this currently?
> >> 2. Who has got the most current source code?
> >> 3. Is there a git of that or will I need to reconstruct the history
> >> from
> >> the patches I collected?
> > 
> > It seems like you did not get any answers from any of the
> > people you mentioned ...

My fault: I missed the questions in August, but today my colleague
Gabriele drew my attention to them. I apologise.

> I used the (more recent) patches posted by Gianni Ciolli in 2008 and
> currently am in the process of porting those to master HEAD -- like I
> promised.

Back in 2008 the PostgreSQL project wasn't using git, and I wasn't
either; hence that patch is the best starting point I can find.

> > Another criticism I can imagine is that PostgreSQL already
> > supports a bitmap index scan of b-tree indexes, so you would
> > have to show that on-disk bitmap indexes outperform that
> > in realistic scenarios.  This has probably become more
> > difficult with the recently introduced index-only scan
> > for b-tree indexes, which is particularly helpful in
> > data warehouse scenarios.
> 
> IIRC, it was already shown that bitmap indexes can speed up TPC-H
> queries.  I will compare B+-tree, bitmap, and encoded bitmap indexes.

I think what Albe meant (also what we attempted back then, if memory
serves me, but without reaching completion) is a set of tests which
show a significant benefit of your implementation over the existing
index type implementations in PostgreSQL, to justify the increased
complexity of the source code.

The kind of test I have in mind is: a big table T with a
low-cardinality column C, such that a btree index on C is
significantly larger than the corresponding bitmap index on the same
column.

Create the bitmap index, and run a query like
 SELECT ... FROM T WHERE C = ... 

more than once; then you should notice that subsequent scans are much
faster than the first run, because the index is small enough to fit
the shared memory and will not need to be reloaded from disk at every
scan.

Then drop the bitmap index, and create a btree index on the same
column; this time the index will be too large and subsequent scans
will be slow, because the index blocks must be reloaded from disk at
every scan.

Hope that helps;
best regards,
Dr. Gianni Ciolli - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gianni.ciolli@2ndquadrant.it | www.2ndquadrant.it



pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: Proof of concept: standalone backend with full FE/BE protocol
Next
From: Kohei KaiGai
Date:
Subject: Re: [bugfix] sepgsql didn't follow the latest core API changes