Re: State of the on-disk bitmap index - Mailing list pgsql-hackers

From Daniel Bausch
Subject Re: State of the on-disk bitmap index
Date
Msg-id 50473997.2050706@dvs.tu-darmstadt.de
In response to Re: State of the on-disk bitmap index  (Gianni Ciolli <gianni.ciolli@2ndquadrant.it>)
Responses Re: State of the on-disk bitmap index  (Gianni Ciolli <gianni.ciolli@devise.it>)
List pgsql-hackers
Hi Gianni!

Thank you for your attention and response!

>> I used the (more recent) patches posted by Gianni Ciolli in 2008 and
>> am currently porting them to master HEAD, as I promised.
>
> Back in 2008 the PostgreSQL project wasn't using git, and I wasn't
> either; hence that patch is the best starting point I can find.

OK, fine.  However, while I cannot find the mail at the moment, I think
someone said they had fixed VACUUM.  Additionally, the Wiki lists Simon
and Jonah as the last authors, suggesting they prepared a patch for 8.5.

>> IIRC, it was already shown that bitmap indexes can speed up TPC-H
>> queries.  I will compare B+-tree, bitmap, and encoded bitmap indexes.
>
> I think what Albe meant (also what we attempted back then, if memory
> serves me, but without reaching completion) is a set of tests which
> show a significant benefit of your implementation over the existing
> index type implementations in PostgreSQL, to justify the increased
> complexity of the source code.
>
> The kind of test I have in mind is: a big table T with a
> low-cardinality column C, such that a btree index on C is
> significantly larger than the corresponding bitmap index on the same
> column.
>
> Create the bitmap index, and run a query like
>
>   SELECT ... FROM T WHERE C = ...
>
> more than once; then you should notice that subsequent scans are much
> faster than the first run, because the index is small enough to fit in
> shared memory and will not need to be reloaded from disk at every
> scan.
>
> Then drop the bitmap index, and create a btree index on the same
> column; this time the index will be too large and subsequent scans
> will be slow, because the index blocks must be reloaded from disk at
> every scan.
>
> Hope that helps;

Is that what your bmi-perf-test.tar.gz from 2008 does?  I have not
looked into it yet.  I will at least do something like you just
described, plus some TPC-H tests.  As the encoding helps with the
cardinality problem, I will also draw comparisons across different
cardinalities.
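For reference, roughly what I have in mind for that first test (table,
column, and index names are made up here, and I am assuming the patched
access method is exposed as "bitmap"):

    -- Hypothetical table: many rows, low-cardinality column "status".
    CREATE TABLE t AS
        SELECT g AS id,
               g % 5 AS status   -- 5 distinct values -> low cardinality
        FROM generate_series(1, 100000000) AS g;

    -- Bitmap index, assuming the patch names its access method "bitmap".
    CREATE INDEX t_status_bitmap ON t USING bitmap (status);

    -- Run the query twice; if the index fits in shared buffers, the
    -- second run should be much faster than the first.
    EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM t WHERE status = 3;
    EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM t WHERE status = 3;

    -- Repeat with a btree index on the same column for comparison.
    DROP INDEX t_status_bitmap;
    CREATE INDEX t_status_btree ON t USING btree (status);
    EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM t WHERE status = 3;
    EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM t WHERE status = 3;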

Yours sincerely,
Daniel

--
Daniel Bausch
Wissenschaftlicher Mitarbeiter
Technische Universität Darmstadt
Fachbereich Informatik
Fachgebiet Datenbanken und Verteilte Systeme

Hochschulstraße 10
64289 Darmstadt
Germany

Tel.: +49 6151 16 6706
Fax:  +49 6151 16 6229


