Bitmap Indexes: request for feedback - Mailing list pgsql-hackers

From Gianni Ciolli
Subject Bitmap Indexes: request for feedback
Date
Msg-id 20081021145759.GA3349@fune
Whole thread Raw
Responses Re: Bitmap Indexes: request for feedback
Bitmap Indexes patch (was Re: Bitmap Indexes: request for feedback)
List pgsql-hackers
Hi everybody,

me and Gabriele Bartolini have been working on Bitmap Indexes (BMI) in
the last weeks, with advice and guidance from Simon Riggs. We feel
that we are about to approach the point where it is appropriate to ask
for feedback from this list.

Thank you,
Dr. Gianni Ciolli - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gianni.ciolli@2ndquadrant.it | www.2ndquadrant.it

---8<------8<------8<------8<------8<------8<------8<------8<------8<---

First of all, let us describe what we have done so far, what we found
and how we think to proceed now.

== 1. Bringing the patch up to date ==

We started from Gavin Sherry's patch dating May 2007
 http://archives.postgresql.org/pgsql-patches/2007-05/msg00013.php

As far as we could see, there has been no further activity on this
subject.

First, we brought the patch up to date. The latest version of the
patch was anterior to the new Index Access Method Interface
 http://developer.postgresql.org/pgdocs/postgres/indexam.html

so we adapted the patch to that interface.

Then we added a few BMI page inspection functions to the pageinspect
contrib module, and we used them to examine the code. In addition to
finding and fixing a minor bug, we diagnosed an effect of HOT tuples
on the BMI patch, described below in greater detail. This also helped
us to produce extended descriptive documentation of how these indexes
work, and suggested us how to produce some more tests to verify that
(a newer version of) the BMI patch works; we are going to add some
regression tests especially targeted to HOT tuples.

After message 
 http://archives.postgresql.org/pgsql-hackers/2008-10/msg00855.php

maybe it is appropriate to mention that backwards scan would not be
supported at all by BMI indexes.

== 2. The effect of HOT tuples on BMI creation ==

The latest BMI patch mentioned above was also prior to the
introduction of HOT tuples.

Some parts of that patch rely on the assumption that
IndexBuildHeapScan scans tuples in increasing TID order. It is easy to
verify that this property is no longer valid after the introduction of
HOT tuples; however, a similar but weaker property still holds (the
scan is always done in non-decreasing block order).

This breaks some low-level bitmap vector build routines, which have to
be rewritten from scratch because they expect TIDs to came in
increasing order; but it does not harm block-level locking used in
that patch.

== 3. What we would do after  ==

We understand that BMI development was suspended because of lack of
time from the last developer, during the improvement of the VACUUM
phase. The main obstacle was that the physical size of a compressed
bitmap vector can either grow or shrink, possibly creating new BMV
pages, which can mean bad performance.

The current VACUUM algorithm is unfinished; we are going to examine
it, looking for some improvements, and to measure the current status
with some ad-hoc benchmarks.

== 4. Timeline ==

Up to now, we spent many days to isolate, describe and partially fix
the incompatibilies described above; now we feel that points 1. and
2. can be cleared in a couple of days, bringing the patch up to date
with current HEAD.

As for the remaining part, we expect to finish the patch before the
deadline for the latest CommitFest.

We will re-post the patch as soon as the HOT tuples will be working;
then we will post a new version the patch when also VACUUM will be
done.

Does anybody have any comments and/or additional requests?


pgsql-hackers by date:

Previous
From: Hannu Krosing
Date:
Subject: Re: Withdraw PL/Proxy from commitfest
Next
From: Martijn van Oosterhout
Date:
Subject: Re: SSL cleanups/hostname verification