On Fri, Nov 07, 2008 at 03:56:25PM +0300, Teodor Sigaev wrote:
> After looking at additional heap and b-tree index placed in 
> pg_bitmapindex namespace...
>
> Additional heap contains unique values and page's number with offset 
> number in bitmap index, b-tree index contains tuples with the same values 
> and ItemPointer to heap's row. So, heap is an unnecessary step - b-tree 
> index should store ItemPointer to the bitmap index directly.
Hi Teodor,
B-tree points to LOVItem (heap's row) because the LOVItem contains
some vector metadata, plus the last two words of the actual bitmap
vector, last_compword and last_word; they are stored there because
they will be changing in most cases.
Let me explain with an example; let BM_WORD_SIZE be 2, so that we have
16 bits per word.
Suppose that the size of vector is N, with k = N % 16 > 0, that is, up
to now max(tidnum) == N.
Then if we (non-HOT) update a row, or if we insert a row, then that
row will have tidnum = N' > N, so that the corresponding vector will
need to be "enlarged" by N'-N bits.
In the case
N' % 16 == N % 16
then we will have to update the last word, which will be locate in the
LOVItem in last_word, without scanning BMV pages.
Another case which is dealt at LOVItem level is when
N' % 16 > N % 16
and 
last_compword is compressed
and
last_word can merge with last_compword in a single word
(e.g. if last_compword is the compressed word representing M 1's, and
if last_word is a word of 1's, then they will merge in the single
compressed word representing M+1 1's).
I agree that this is more complicated than the abstract model of
bitmap indexes; we kept this design, which came from the author of the
previous version of the patch, because it seems likely to be an useful
optimization.
Thank you for your comments (also for the useful remarks on the
catalog);
best regards,
Dr. Gianni Ciolli - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gianni.ciolli@2ndquadrant.it | www.2ndquadrant.it