Re: State of the on-disk bitmap index - Mailing list pgsql-hackers

From Albe Laurenz
Subject Re: State of the on-disk bitmap index
Date
Msg-id D960CB61B694CF459DCFB4B0128514C208589813@exadv11.host.magwien.gv.at
Whole thread Raw
In response to State of the on-disk bitmap index  (Daniel Bausch <bausch@dvs.tu-darmstadt.de>)
Responses Re: State of the on-disk bitmap index  (Daniel Bausch <bausch@dvs.tu-darmstadt.de>)
List pgsql-hackers
Daniel Bausch wrote:
> I am going to implement a simple kind of "encoded bitmap indexes"
(EBI).

> I thought, it could be a good idea to base my work on the long
proposed
> on-disk bitmap index implementation.  Regarding to the wiki, you,
Jonah
> and Simon, were the last devs that touched this thing.  Unfortunately
I
> could not find the patch representing your state of that work.  I
could
> only capture the development history up to Gianni Ciolli & Gabriele
> Bartolini from the old pgsql-patches archives.  Other people involved
> were Jie Zhang, Gavin Sherry, Heikki Linnakangas, and Leonardo F.  Are
> you and the others still interested in getting this into PG?  A rebase
> of the most current bitmap index implementation onto master HEAD will
be
> the first 'byproduct' that I am going to deliver back to you.
>
> 1. Is anyone working on this currently?
> 2. Who has got the most current source code?
> 3. Is there a git of that or will I need to reconstruct the history
from
> the patches I collected?

It seems like you did not get any answers from any of the
people you mentioned ...

The latest version of the patch I found is
http://archives.postgresql.org/pgsql-patches/2006-12/msg00015.php
So that's probably the best you can get.

I want to encourage you to work on this.

You'd have to come up with a sound concept and discuss it on this
list, and it would be helpful to have some draft patch for
git master that can be used as a basis for discussion.

Expect to meet some resistance.  Nobody will want the extra
code and complexity unless you can show suffitient benefits.

One concern that came up in previous discussions is that
bitmap indexes are only useful for columns with low cardinality,
and in that case the result will likely be a significant portion
of the table anyway and a sequential scan would be faster.
I think that this is less true if you have more conditions,
and this is supposedly the case where encoded bitmap indexes
work better anyway.

Another criticism I can imagine is that PostgreSQL already
supports a bitmap index scan of b-tree indexes, so you would
have to show that on-disk bitmap indexes outperform that
in realistic scenarios.  This has probably become more
difficult with the recently introduced index-only scan
for b-tree indexes, which is particularly helpful in
data warehouse scenarios.

So you'd have to run some performance tests against a draft
implementation to get people convinced that it is worth the
effort.  Supporting index-only scans Would probably give
you an edge.

Yours,
Laurenz Albe



pgsql-hackers by date:

Previous
From: Dimitri Fontaine
Date:
Subject: Re: Cascading replication and recovery_target_timeline='latest'
Next
From: Thom Brown
Date:
Subject: Re: 9.2rc1 produces incorrect results