On-disk bitmap index implementation - Mailing list pgsql-patches

From Gavin Sherry
Subject On-disk bitmap index implementation
Date
Msg-id Pine.LNX.4.58.0612042347490.18986@linuxworld.com.au
Responses Re: On-disk bitmap index implementation  ("Simon Riggs" <simon@2ndquadrant.com>)
Re: On-disk bitmap index implementation  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
Re: On-disk bitmap index implementation  (Heikki Linnakangas <heikki@enterprisedb.com>)
List pgsql-patches
Hi all,

Attached is a patch implementing bitmap indexes. It includes major
enhancements over the patch submitted during feature freeze for 8.2 [1].

In particular: much better integration with the existing bitmap scan code,
with the internals of the bitmap streaming pushed down into the AM and
hidden from the executor code; a completely new index creation algorithm
which reduces creation time by 20-75% depending on the data; modifications
to the encoding mechanism to suit the integration with bitmap index scans;
work on memory management; lots of code rewriting; range query support.
The code is also much cleaner now.

There are still some things Jie and I have not gotten to yet:

o Improving VACUUM support -- currently, VACUUM FULL means REINDEX for
  bitmaps. Heikki Linnakangas offered to work on this. Heikki, are you
  still interested?

o Determine if we need to provide anything for rm_startup, rm_cleanup,
  rm_safe_restartpoint RmgrData function pointers.

o Test WAL replay more thoroughly.

o I pulled a nice optimisation out of the bitmap scan OR case where a
  higher level plan could push down a bitmap and have all the child scans
  just OR their data into that inside the AM. I need to get that back in.

o I need to look at tidying up the bitmap stream memory usage inside the
  executor. We leak memory from ExecBitmapIndexReScan(), for example.

o Really should add some more detailed docs about why bitmap indexes are
  cool.

o Look into adding an AM option such that the user can determine word size
  at index creation time. For higher-cardinality data (above 1000 distinct
  values), 16 bit word sizes can really help with performance. The code
  does not hard-code a particular word size, but it does use macros
  extensively to interact with the word size, so making it different for
  each index might be a little messy.
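
To illustrate the last item, here is a minimal sketch of what a
macro-driven word size can look like. It is not code from the patch: the
names BM_WORD_BITS, bm_word, BM_WORD_NO, BM_BIT_MASK, BM_NWORDS and the
bm_* functions are all hypothetical, and it works on plain uncompressed
bitmaps rather than the patch's encoding. It only shows how a compile-time
word size ends up baked into every helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compile-time word size; none of these names are from the patch. */
#ifndef BM_WORD_BITS
#define BM_WORD_BITS 16
#endif

#if BM_WORD_BITS == 16
typedef uint16_t bm_word;
#elif BM_WORD_BITS == 32
typedef uint32_t bm_word;
#else
#error "unsupported BM_WORD_BITS"
#endif

/* Every helper below depends on BM_WORD_BITS through these macros. */
#define BM_WORD_NO(bit)   ((bit) / BM_WORD_BITS)
#define BM_BIT_MASK(bit)  ((bm_word) ((bm_word) 1 << ((bit) % BM_WORD_BITS)))
#define BM_NWORDS(nbits)  (((nbits) + BM_WORD_BITS - 1) / BM_WORD_BITS)

/* Set one bit in an uncompressed bitmap of nbits bits. */
static void
bm_set_bit(bm_word *words, size_t nbits, size_t bit)
{
    assert(bit < nbits);
    words[BM_WORD_NO(bit)] |= BM_BIT_MASK(bit);
}

/* Return nonzero if the given bit is set. */
static int
bm_test_bit(const bm_word *words, size_t nbits, size_t bit)
{
    assert(bit < nbits);
    return (words[BM_WORD_NO(bit)] & BM_BIT_MASK(bit)) != 0;
}

/* OR src into dst, one word at a time; both hold nbits bits. */
static void
bm_or_into(bm_word *dst, const bm_word *src, size_t nbits)
{
    size_t      i;
    size_t      nwords = BM_NWORDS(nbits);

    for (i = 0; i < nwords; i++)
        dst[i] |= src[i];
}
```

Building the same file with -DBM_WORD_BITS=32 changes the type and every
macro at once, which is convenient for a single global setting. A per-index
word size would instead have to be carried around at run time (for example
in some per-index metadata, again an assumption) and passed to each helper,
which is where the messiness would come from.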

Comments please!

Gavin



[1] http://archives.postgresql.org/pgsql-patches/2006-09/msg00216.php
