Re: Bitmap indexes - Mailing list pgsql-performance
From | Daniel Ceregatti |
---|---|
Subject | Re: Bitmap indexes |
Date | |
Msg-id | 42015FCD.9090200@omnis.com Whole thread Raw |
In response to | Re: Bitmap indexes (PFC <lists@boutiquenumerique.com>) |
List | pgsql-performance |
PFC wrote: > > contrib/intarray has an index type which could be what you need. > I've used intarray for a site that requires that I match multiple low cardinality attributes with multiple search criteria. Here's an (abridged) example: The table: \d person_attributes Table "dm.person_attributes" Column | Type | Modifiers ----------------+--------------------------+-------------------- attributes | integer[] | not null personid | integer | not null Indexes: "person_attributes_pk" PRIMARY KEY, btree (personid) "person_attributes_gist_attributes_index" gist (attributes) This table has about 1.1 million rows. The index: create index person_attributes_gist_attributes_index on person_attributes using gist ((attributes) gist__int_ops); The query: select personid from person_attributes where attributes @@ '(1|3)&(900)&(902)&(1002)&(9002)&(11003)&(12002|12003)&(13003|13004|13005|13006|13007|13008|13009|13010)'::query_int The explain analyze: Index Scan using person_attributes_gist_search_index on person_attributes pa (cost=0.00..1221.26 rows=602 width=4) (actual time=0.725..628.994 rows=1659 loops=1) Index Cond: (search @@ '( 1 | 3 ) & 900 & 902 & 1002 & 9002 & 11003 & ( 12002 | 12003 ) & ( ( ( ( ( ( ( 13003 | 13004 ) | 13005 ) | 13006 ) | 13007 ) | 13008 ) | 13009 ) | 13010 )'::query_int) Total runtime: 431.843 ms The query_int and what each number means: 1|3 means, only gather the people in site id 1 or 3. 900 is an arbitrary flag that means they are searchable. 902 is another arbitrary flag that means they have photos. 1002 is the flag for "don't drink". 9002 is the flag for "don't smoke". 11003 is the flag for "female". 12002|12003 are the flags for straight|bisexual. 13003 through 13010 represent the age range 18 through 25. In plain English: select all females who are straight or bisexual, between the ages of 18 and 25 inclusive, that don't drink, that don't smoke, who are searchable, who have photos, and belong to sites 1 or 3. As you can see by the explain, this query is relatively fast, given the number of criteria and data that has to be searched. This site's predecessor used oracle, and I used bitmap indexes for performing these searches in oracle. This intarray method is the closest I've come to being able to reproduce the same functionality at the required speed in postgres. The only problems I've run into with this method are: the non-concurrent nature of gist indexes, which makes doing any sort of bulk DML on them extremely time consuming (I usually have to drop the index, perform the bulk DML, then re-create the index), dealing with intarray methods to select particular attributes so I can then order by them, and dealing with intarray methods for updating the attributes column. All of these methods are detailed in the intarray README. I'm happy with the performance in production so far. I've yet to see any gist concurrency issues affect performance with normal rates of DML. Daniel -- Daniel Ceregatti - Programmer Omnis Network, LLC Real Programmers don't eat quiche. They eat Twinkies and Szechwan food.
pgsql-performance by date: