Thread: Bitmap index - first look

Bitmap index - first look

From
Teodor Sigaev
Date:
http://archives.postgresql.org/message-id/20081101000154.GO27872@fune

1) Sometimes index doesn't find all matching rows:
postgres=# SELECT * FROM qq WHERE t ='asd'; i |  t
---+----- 2 | asd 1 | asd 2 | asd
(3 rows)
postgres=# SET enable_seqscan=off;
SET
postgres=# SELECT * FROM qq WHERE t ='asd'; i |  t
---+----- 2 | asd
(1 row)

How to reproduce:
DROP TABLE IF EXISTS qq;
CREATE TABLE qq ( i int, t text );
INSERT INTO qq VALUES (1, 'qwe');
INSERT INTO qq VALUES (2, 'asd');
CREATE INDEX qqidx ON qq USING bitmap (i,t);
INSERT INTO qq VALUES (1, 'asd');
INSERT INTO qq VALUES (2, 'asd');
SELECT * FROM qq;
SELECT * FROM qq WHERE t ='asd';
SET enable_seqscan=off;
SELECT * FROM qq WHERE t ='asd';

2) Why is pg_am.amstrategies set to 5 while index supports only equal operation?

3) Typo in bmbulkdelete:    /* allocate stats if first time through, else re-use existing struct */    if (result ==
NULL)         result = (IndexBulkDeleteResult *)                  palloc0(sizeof(IndexBulkDeleteResult));
 
    result = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
    'result' is allocated twice.

4) Bitmap index is marked with pg_am.amcanorder = 'f', so you don't need to 
support ammarkpos/amrestrpos - see
http://archives.postgresql.org/pgsql-hackers/2008-10/msg00862.php




-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: Bitmap index - first look

From
"Vladimir Sitnikov"
Date:
One more point on pg_am:  amsearchnull is equal to "f"  however the index stores and could find nulls perfectly.<br
/><br/>Regards,<br />Vladimir Sitnikov<br /> 

Re: Bitmap index - first look

From
Teodor Sigaev
Date:
After looking at additional heap and b-tree index placed in pg_bitmapindex 
namespace...

Additional heap contains unique values and page's number with offset number in 
bitmap index, b-tree index contains tuples with the same values and ItemPointer 
to heap's row. So, heap is an unnecessary step - b-tree index should store 
ItemPointer to the bitmap index directly.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: Bitmap index - first look

From
Gianni Ciolli
Date:
On Fri, Nov 07, 2008 at 03:56:25PM +0300, Teodor Sigaev wrote:
> After looking at additional heap and b-tree index placed in 
> pg_bitmapindex namespace...
>
> Additional heap contains unique values and page's number with offset 
> number in bitmap index, b-tree index contains tuples with the same values 
> and ItemPointer to heap's row. So, heap is an unnecessary step - b-tree 
> index should store ItemPointer to the bitmap index directly.

Hi Teodor,

B-tree points to LOVItem (heap's row) because the LOVItem contains
some vector metadata, plus the last two words of the actual bitmap
vector, last_compword and last_word; they are stored there because
they will be changing in most cases.

Let me explain with an example; let BM_WORD_SIZE be 2, so that we have
16 bits per word.

Suppose that the size of vector is N, with k = N % 16 > 0, that is, up
to now max(tidnum) == N.

Then if we (non-HOT) update a row, or if we insert a row, then that
row will have tidnum = N' > N, so that the corresponding vector will
need to be "enlarged" by N'-N bits.

In the case

N' % 16 == N % 16

then we will have to update the last word, which will be locate in the
LOVItem in last_word, without scanning BMV pages.

Another case which is dealt at LOVItem level is when

N' % 16 > N % 16
and 
last_compword is compressed
and
last_word can merge with last_compword in a single word

(e.g. if last_compword is the compressed word representing M 1's, and
if last_word is a word of 1's, then they will merge in the single
compressed word representing M+1 1's).

I agree that this is more complicated than the abstract model of
bitmap indexes; we kept this design, which came from the author of the
previous version of the patch, because it seems likely to be an useful
optimization.

Thank you for your comments (also for the useful remarks on the
catalog);

best regards,
Dr. Gianni Ciolli - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gianni.ciolli@2ndquadrant.it | www.2ndquadrant.it



Re: Bitmap index - first look

From
"Vladimir Sitnikov"
Date:
Could you please put more comments on the index build procedure?<br /><br />I guess it strongly relies on the fact the
blocknumber increases as tuples are returned from "heap scan", doesn't it? However, thanks to synchronized scans the
orderof tuples could be a little bit different.<br /><br />I found no evidence of  "build index" being able to add tid
#10 after it has created the bitmap for tids 1000...2000. As far as I understand it never tries to update words that
werewritten during index creation.<br /><br />That particular  "if (_blockno != buf->hot_buffer_block) {"  in
buf_add_tid_with_filllooks to be a wrong thing for me -- I believe it is a way to often (it will try to sync the buffer
aftereach 300/8=~40 bytes since there are no more than 300 tuples on a single page)<br /> I would not flush the bitmap
everydata block, but I would keep "hot buffer" as a temporary view (uncompressed) of the bitmap being build. So as
tuplescome, we either set the bit directry in the "hot buffer" (if it covers the relevant tid range) or flush that view
tothe bitmap (merging, and splitting the bitmap where it is required) and repoint the window so it starts with block of
tidthat triggered flushing. Does that make sense?<br /><br /><br />Regards,<br />Vladimir Sitnikov<br /> 

Re: Bitmap index - first look

From
Teodor Sigaev
Date:
>> Additional heap contains unique values and page's number with offset 
>> number in bitmap index, b-tree index contains tuples with the same values 
>> and ItemPointer to heap's row. So, heap is an unnecessary step - b-tree 
>> index should store ItemPointer to the bitmap index directly.
> B-tree points to LOVItem (heap's row) because the LOVItem contains
> some vector metadata, plus the last two words of the actual bitmap
> vector, last_compword and last_word; they are stored there because
> they will be changing in most cases.


I'm talking about pg_bitmapindex.pg_bm_OID tables - they contain distinct values 
from indexed table and pair of (blockNumber, offsetNumber), which is exactly 
ItemPointerData. I suppose, that pair points somewhere in bitmap index. In other 
hand, tuple of index 'pg_bitmapindex.pg_bm_OID_idx' contains the same distinct 
values and ItemPointer pointed to row in corresponding pg_bitmapindex.pg_bm_OID 
table.

Why doesn't ItemPointer in pg_bitmapindex.pg_bm_OID_idx point into bitmap index 
file?

Actually, you don't need a pg_bitmapindex.pg_bm_OID tables at all - the same 
data could be stored in pg_bitmapindex.pg_bm_OID_idx itself. But this is not a 
strong objection, just an a optimization.

BTW, see comments near index_getnext(): * Note: caller must check scan->xs_recheck, and perform rechecking of the *
scankeys if required.  We do not do that here because we don't have * enough information to do it efficiently in the
generalcase.
 

Although it's not true for b-tree now, but it may change.



-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/