Thread: Bitmap indexes
I was wondering about index types. Oracle has an index type called a 'bitmap' index. They describe this as an index for low cardinality fields, where only the cardinal values are indexed in a b-tree, and then it uses a bitmap below that to describe rows. They say that this type of index is very fast when combined with queries that used the indexed row in 'AND' clauses in a sql statement as the index can 'mask' the results very fast. I have not been able to benchmark the actual effectiveness of this kind of index, but I was wondering if anyone has had experience with this an believes it might be a useful feature for postgres? Yes I have a vested interest in this because alot of my searches are masked against low cardinality fields 'Y' or 'N' type things where this could potentialy benefit me... Alex Turner NetEconomist
contrib/intarray has an index type which could be what you need. > I was wondering about index types. Oracle has an index type called a > 'bitmap' index. They describe this as an index for low cardinality > fields, where only the cardinal values are indexed in a b-tree, and > then it uses a bitmap below that to describe rows. They say that this > type of index is very fast when combined with queries that used the > indexed row in 'AND' clauses in a sql statement as the index can > 'mask' the results very fast. I have not been able to benchmark the > actual effectiveness of this kind of index, but I was wondering if > anyone has had experience with this an believes it might be a useful > feature for postgres? > > Yes I have a vested interest in this because alot of my searches are > masked against low cardinality fields 'Y' or 'N' type things where > this could potentialy benefit me... > > Alex Turner > NetEconomist > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >
Alex Turner <armtuk@gmail.com> writes: > I was wondering about index types. Oracle has an index type called a > 'bitmap' index. There's a great deal about this in the list archives (probably more in pgsql-hackers than in -performance). Most of the current interest has to do with building in-memory bitmaps on the fly, as a way of decoupling index and heap scan processing. Which is not quite what you're talking about but should be pretty effective for low-cardinality cases. In particular it'd allow AND and OR combination of multiple indexes, which we do poorly or not at all at the moment. regards, tom lane
> There's a great deal about this in the list archives (probably more in > pgsql-hackers than in -performance). Most of the current interest has > to do with building in-memory bitmaps on the fly, as a way of decoupling > index and heap scan processing. Which is not quite what you're talking > about but should be pretty effective for low-cardinality cases. In > particular it'd allow AND and OR combination of multiple indexes, which > we do poorly or not at all at the moment. Is this called a star join ? It would also allow to access the data pages in a more sequential order if the rows are not required to be retrieved in index order, which would potentially be a large speedup for index scans concerning more than the usual very small percentage of rows in a table : if several rows to be retrieved are on the same page, it would visit this page only once.
armtuk@gmail.com (Alex Turner) writes: > I was wondering about index types. Oracle has an index type called a > 'bitmap' index. They describe this as an index for low cardinality > fields, where only the cardinal values are indexed in a b-tree, and > then it uses a bitmap below that to describe rows. They say that this > type of index is very fast when combined with queries that used the > indexed row in 'AND' clauses in a sql statement as the index can > 'mask' the results very fast. I have not been able to benchmark the > actual effectiveness of this kind of index, but I was wondering if > anyone has had experience with this an believes it might be a useful > feature for postgres? > > Yes I have a vested interest in this because alot of my searches are > masked against low cardinality fields 'Y' or 'N' type things where > this could potentialy benefit me... There are some ideas on this; nothing likely to be implemented in the very short term. If you do a lot of queries on this sort of basis, there's something in PostgreSQL known as a "partial index" that could be used to improve some queries. What you might do is something like: create index partial_y_for_field_a on some_table (id_column) where field_a = 'Y'; create index partial_n_for_field_a on some_table (id_column) where field_a = 'N'; That could provide speedup for queries that might do joins on id_column where your query has the qualifiers "where field_a = 'Y'" or "where field_a = 'N'". That's not going to provide a generalized answer to "star queries," but it is an immediate answer for some cases. -- "cbbrowne","@","ca.afilias.info" <http://dev6.int.libertyrms.com/> Christopher Browne (416) 673-4124 (land)
PFC wrote: > > There's a great deal about this in the list archives (probably more in > > pgsql-hackers than in -performance). Most of the current interest has > > to do with building in-memory bitmaps on the fly, as a way of decoupling > > index and heap scan processing. Which is not quite what you're talking > > about but should be pretty effective for low-cardinality cases. In > > particular it'd allow AND and OR combination of multiple indexes, which > > we do poorly or not at all at the moment. > > Is this called a star join ? > > It would also allow to access the data pages in a more sequential order > if the rows are not required to be retrieved in index order, which would > potentially be a large speedup for index scans concerning more than the > usual very small percentage of rows in a table : if several rows to be > retrieved are on the same page, it would visit this page only once. Please see the TODO list for a summary of previous discussions and directions. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
PFC wrote: > > contrib/intarray has an index type which could be what you need. > I've used intarray for a site that requires that I match multiple low cardinality attributes with multiple search criteria. Here's an (abridged) example: The table: \d person_attributes Table "dm.person_attributes" Column | Type | Modifiers ----------------+--------------------------+-------------------- attributes | integer[] | not null personid | integer | not null Indexes: "person_attributes_pk" PRIMARY KEY, btree (personid) "person_attributes_gist_attributes_index" gist (attributes) This table has about 1.1 million rows. The index: create index person_attributes_gist_attributes_index on person_attributes using gist ((attributes) gist__int_ops); The query: select personid from person_attributes where attributes @@ '(1|3)&(900)&(902)&(1002)&(9002)&(11003)&(12002|12003)&(13003|13004|13005|13006|13007|13008|13009|13010)'::query_int The explain analyze: Index Scan using person_attributes_gist_search_index on person_attributes pa (cost=0.00..1221.26 rows=602 width=4) (actual time=0.725..628.994 rows=1659 loops=1) Index Cond: (search @@ '( 1 | 3 ) & 900 & 902 & 1002 & 9002 & 11003 & ( 12002 | 12003 ) & ( ( ( ( ( ( ( 13003 | 13004 ) | 13005 ) | 13006 ) | 13007 ) | 13008 ) | 13009 ) | 13010 )'::query_int) Total runtime: 431.843 ms The query_int and what each number means: 1|3 means, only gather the people in site id 1 or 3. 900 is an arbitrary flag that means they are searchable. 902 is another arbitrary flag that means they have photos. 1002 is the flag for "don't drink". 9002 is the flag for "don't smoke". 11003 is the flag for "female". 12002|12003 are the flags for straight|bisexual. 13003 through 13010 represent the age range 18 through 25. In plain English: select all females who are straight or bisexual, between the ages of 18 and 25 inclusive, that don't drink, that don't smoke, who are searchable, who have photos, and belong to sites 1 or 3. As you can see by the explain, this query is relatively fast, given the number of criteria and data that has to be searched. This site's predecessor used oracle, and I used bitmap indexes for performing these searches in oracle. This intarray method is the closest I've come to being able to reproduce the same functionality at the required speed in postgres. The only problems I've run into with this method are: the non-concurrent nature of gist indexes, which makes doing any sort of bulk DML on them extremely time consuming (I usually have to drop the index, perform the bulk DML, then re-create the index), dealing with intarray methods to select particular attributes so I can then order by them, and dealing with intarray methods for updating the attributes column. All of these methods are detailed in the intarray README. I'm happy with the performance in production so far. I've yet to see any gist concurrency issues affect performance with normal rates of DML. Daniel -- Daniel Ceregatti - Programmer Omnis Network, LLC Real Programmers don't eat quiche. They eat Twinkies and Szechwan food.