Re: Bitmap indexes - Mailing list pgsql-performance

From Daniel Ceregatti
Subject Re: Bitmap indexes
Date
Msg-id 42015FCD.9090200@omnis.com
Whole thread Raw
In response to Re: Bitmap indexes  (PFC <lists@boutiquenumerique.com>)
List pgsql-performance
PFC wrote:

>
>     contrib/intarray has an index type which could be what you need.
>

I've used intarray for a site that requires that I match multiple low
cardinality attributes with multiple search criteria. Here's an
(abridged) example:

The table:

\d person_attributes
                  Table "dm.person_attributes"
     Column     |           Type           |     Modifiers
----------------+--------------------------+--------------------
 attributes     | integer[]                | not null
 personid       | integer                  | not null
Indexes:
    "person_attributes_pk" PRIMARY KEY, btree (personid)
    "person_attributes_gist_attributes_index" gist (attributes)

This table has about 1.1 million rows.

The index:

create index person_attributes_gist_attributes_index on
person_attributes using gist ((attributes) gist__int_ops);

The query:

select personid
from person_attributes
where attributes @@
'(1|3)&(900)&(902)&(1002)&(9002)&(11003)&(12002|12003)&(13003|13004|13005|13006|13007|13008|13009|13010)'::query_int

The explain analyze:

Index Scan using person_attributes_gist_search_index on
person_attributes pa  (cost=0.00..1221.26 rows=602 width=4) (actual
time=0.725..628.994 rows=1659 loops=1)
  Index Cond: (search @@ '( 1 | 3 ) & 900 & 902 & 1002 & 9002 & 11003 &
( 12002 | 12003 ) & ( ( ( ( ( ( ( 13003 | 13004 ) | 13005 ) | 13006 ) |
13007 ) | 13008 ) | 13009 ) | 13010 )'::query_int)
Total runtime: 431.843 ms

The query_int and what each number means:

1|3 means, only gather the people in site id 1 or 3.
900 is an arbitrary flag that means they are searchable.
902 is another arbitrary flag that means they have photos.
1002 is the flag for "don't drink".
9002 is the flag for "don't smoke".
11003 is the flag for "female".
12002|12003 are the flags for straight|bisexual.
13003 through 13010 represent the age range 18 through 25.

In plain English: select all females who are straight or bisexual,
between the ages of 18 and 25 inclusive, that don't drink, that don't
smoke, who are searchable, who have photos, and belong to sites 1 or 3.

As you can see by the explain, this query is relatively fast, given the
number of criteria and data that has to be searched.

This site's predecessor used oracle, and I used bitmap indexes for
performing these searches in oracle. This intarray method is the closest
I've come to being able to reproduce the same functionality at the
required speed in postgres.

The only problems I've run into with this method are: the non-concurrent
nature of gist indexes, which makes doing any sort of bulk DML on them
extremely time consuming (I usually have to drop the index, perform the
bulk DML, then re-create the index), dealing with intarray methods to
select particular attributes so I can then order by them, and dealing
with intarray methods for updating the attributes column. All of these
methods are detailed in the intarray README.

I'm happy with the performance in production so far. I've yet to see any
gist concurrency issues affect performance with normal rates of DML.

Daniel

--

Daniel Ceregatti - Programmer
Omnis Network, LLC

Real Programmers don't eat quiche.  They eat Twinkies and Szechwan food.



pgsql-performance by date:

Previous
From: Cosimo Streppone
Date:
Subject: Re: High end server and storage for a PostgreSQL OLTP system
Next
From: Tom Lane
Date:
Subject: Re: Bad query optimizer misestimation because of TOAST