Re: bitmap AM design - Mailing list pgsql-hackers
From | Victor Y. Yegorov |
---|---|
Subject | Re: bitmap AM design |
Date | |
Msg-id | 20050304230304.GA6276@mits.lv Whole thread Raw |
In response to | bitmap AM design ("Victor Y. Yegorov" <viy@mits.lv>) |
List | pgsql-hackers |
Some more thoughts and questions. The main idea above bitmaps is narrowing some data sets' possible values to "yes" and "no" (i.e. 1 and 0) and then organize the data in the series of bits, where bit's position determines values to consider. In the cases, where several indexed attributes are used in WHERE clause, it's possible to do logical AND/OR on bitmaps before returning any results to the caller. For large tables with high number of low-cardinality attributes using bitmaps can result in certain speed-up. For on-disk bitmaps I'm working on, each value of each indexed attribute has it's own bitmap (i.e. series of bits, with bits set to 1 for rows with corresponding fields having value of that bitmap). Scanning the bitmap, we end up with an array of "1-bits' positions" and need to convert those positions to CTIDs, as executor is expecting. So, index should also keep a CTID table, that corresponds to the bitmap's data. This CTID table will be the same for all bitmap indexes, created for one table, thus having 2 bitmap indexes will mean you're wasting some amount of disk space, storing absolutely identical data. So, to save space, we have 2 possibilities: 1) create a new relkind for the CTID table (maybe used not only for on-disk bitmaps); 2) make all "create index ... using bitmap" statements actually create/extend existing bitmap index. This also implies,that planner/executor should try using multicolumn bitmap index when at least one indexed field is present in theWHERE clause. I'm working on the 2nd case, because 1st one requires more work not only in the access method + executor + planner area. It is also possible to keep things as is and make a note in the documentation, that it is better to have 1 multicolumn bitmap index, then several single column ones, and that planner will still use multicolumn index even if not all columns are involved. Any comments/ideas here? After implementing bitmap index access method, it'll be necessary to teach planner and executor to use multicolumn bitmaps for any number of scan-attributes. Also, I cannot say in what circumstances planner should prefer bitmap scan to seqscan; I thought of cases, when it estimates return set being about 60% of the relation. What community has to say here? Also, as Tom is planning to work on in-memory bitmaps (maybe something is done already, don't know), I thought that it can be possible to cooperate. As I have no clue at the moment how in-memory bitmaps are going to work, is it possible to hear from you some draft of the forthcoming implementation? And what prerequisites would be required to join both bitmaps somehow? Waiting for your thoughts/comments. -- Victor Y. Yegorov
pgsql-hackers by date: