Re: bitmap AM design - Mailing list pgsql-hackers

From Victor Y. Yegorov
Subject Re: bitmap AM design
Date
Msg-id 20050304230304.GA6276@mits.lv
Whole thread Raw
In response to bitmap AM design  ("Victor Y. Yegorov" <viy@mits.lv>)
List pgsql-hackers
Some more thoughts and questions.

The main idea above bitmaps is narrowing some data sets' possible values to
"yes" and "no" (i.e. 1 and 0) and then organize the data in the series of
bits, where bit's position determines values to consider. In the cases, where
several indexed attributes are used in WHERE clause, it's possible to do
logical AND/OR on bitmaps before returning any results to the caller. For
large tables with high number of low-cardinality attributes using bitmaps can
result in certain speed-up.

For on-disk bitmaps I'm working on, each value of each indexed attribute has
it's own bitmap (i.e. series of bits, with bits set to 1 for rows with
corresponding fields having value of that bitmap). Scanning the bitmap, we end
up with an array of "1-bits' positions" and need to convert those positions to
CTIDs, as executor is expecting. So, index should also keep a CTID table, that
corresponds to the bitmap's data. This CTID table will be the same for all
bitmap indexes, created for one table, thus having 2 bitmap indexes will mean
you're wasting some amount of disk space, storing absolutely identical data.
So, to save space, we have 2 possibilities:
1) create a new relkind for the CTID table (maybe used not only for on-disk  bitmaps);
2) make all "create index ... using bitmap" statements actually create/extend  existing bitmap index. This also
implies,that planner/executor should  try using multicolumn bitmap index when at least one indexed field is  present in
theWHERE clause.
 

I'm working on the 2nd case, because 1st one requires more work not only in
the access method + executor + planner area. It is also possible to keep
things as is and make a note in the documentation, that it is better to have 1
multicolumn bitmap index, then several single column ones, and that planner
will still use multicolumn index even if not all columns are involved.

Any comments/ideas here?


After implementing bitmap index access method, it'll be necessary to teach
planner and executor to use multicolumn bitmaps for any number of
scan-attributes. Also, I cannot say in what circumstances planner should
prefer bitmap scan to seqscan; I thought of cases, when it estimates return
set being about 60% of the relation. What community has to say here?


Also, as Tom is planning to work on in-memory bitmaps (maybe something is
done already, don't know), I thought that it can be possible to cooperate.
As I have no clue at the moment how in-memory bitmaps are going to work, is it
possible to hear from you some draft of the forthcoming implementation?
And what prerequisites would be required to join both bitmaps somehow?

Waiting for your thoughts/comments.


-- 

Victor Y. Yegorov


pgsql-hackers by date:

Previous
From: Christopher Browne
Date:
Subject: Re: hi all
Next
From: Greg Stark
Date:
Subject: Re: 8.0.X and the ARC patent