Thread: Bitmap indexes

Bitmap indexes

From
Alex Turner
Date:
I was wondering about index types.  Oracle has an index type called a
'bitmap' index.  They describe this as an index for low cardinality
fields, where only the cardinal values are indexed in a b-tree, and
then it uses a bitmap below that to describe rows.  They say that this
type of index is very fast when combined with queries that used the
indexed row in 'AND' clauses in a sql statement as the index can
'mask' the results very fast.  I have not been able to benchmark the
actual effectiveness of this kind of index, but I was wondering if
anyone has had experience with this an believes it might be a useful
feature for postgres?

Yes I have a vested interest in this because alot of my searches are
masked against low cardinality fields 'Y' or 'N' type things where
this could potentialy benefit me...

Alex Turner
NetEconomist

Re: Bitmap indexes

From
PFC
Date:
    contrib/intarray has an index type which could be what you need.


> I was wondering about index types.  Oracle has an index type called a
> 'bitmap' index.  They describe this as an index for low cardinality
> fields, where only the cardinal values are indexed in a b-tree, and
> then it uses a bitmap below that to describe rows.  They say that this
> type of index is very fast when combined with queries that used the
> indexed row in 'AND' clauses in a sql statement as the index can
> 'mask' the results very fast.  I have not been able to benchmark the
> actual effectiveness of this kind of index, but I was wondering if
> anyone has had experience with this an believes it might be a useful
> feature for postgres?
>
> Yes I have a vested interest in this because alot of my searches are
> masked against low cardinality fields 'Y' or 'N' type things where
> this could potentialy benefit me...
>
> Alex Turner
> NetEconomist
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org
>



Re: Bitmap indexes

From
Tom Lane
Date:
Alex Turner <armtuk@gmail.com> writes:
> I was wondering about index types.  Oracle has an index type called a
> 'bitmap' index.

There's a great deal about this in the list archives (probably more in
pgsql-hackers than in -performance).  Most of the current interest has
to do with building in-memory bitmaps on the fly, as a way of decoupling
index and heap scan processing.  Which is not quite what you're talking
about but should be pretty effective for low-cardinality cases.  In
particular it'd allow AND and OR combination of multiple indexes, which
we do poorly or not at all at the moment.

            regards, tom lane

Re: Bitmap indexes

From
PFC
Date:
> There's a great deal about this in the list archives (probably more in
> pgsql-hackers than in -performance).  Most of the current interest has
> to do with building in-memory bitmaps on the fly, as a way of decoupling
> index and heap scan processing.  Which is not quite what you're talking
> about but should be pretty effective for low-cardinality cases.  In
> particular it'd allow AND and OR combination of multiple indexes, which
> we do poorly or not at all at the moment.

    Is this called a star join ?

    It would also allow to access the data pages in a more sequential order
if the rows are not required to be retrieved in index order, which would
potentially be a large speedup for index scans concerning more than the
usual very small percentage of rows in a table : if several rows to be
retrieved are on the same page, it would visit this page only once.

Re: Bitmap indexes

From
Christopher Browne
Date:
armtuk@gmail.com (Alex Turner) writes:

> I was wondering about index types.  Oracle has an index type called a
> 'bitmap' index.  They describe this as an index for low cardinality
> fields, where only the cardinal values are indexed in a b-tree, and
> then it uses a bitmap below that to describe rows.  They say that this
> type of index is very fast when combined with queries that used the
> indexed row in 'AND' clauses in a sql statement as the index can
> 'mask' the results very fast.  I have not been able to benchmark the
> actual effectiveness of this kind of index, but I was wondering if
> anyone has had experience with this an believes it might be a useful
> feature for postgres?
>
> Yes I have a vested interest in this because alot of my searches are
> masked against low cardinality fields 'Y' or 'N' type things where
> this could potentialy benefit me...

There are some ideas on this; nothing likely to be implemented in the
very short term.

If you do a lot of queries on this sort of basis, there's something in
PostgreSQL known as a "partial index" that could be used to improve
some queries.

What you might do is something like:

 create index partial_y_for_field_a on some_table (id_column)
   where field_a = 'Y';
 create index partial_n_for_field_a on some_table (id_column)
   where field_a = 'N';

That could provide speedup for queries that might do joins on
id_column where your query has the qualifiers "where field_a = 'Y'" or
"where field_a = 'N'".

That's not going to provide a generalized answer to "star queries,"
but it is an immediate answer for some cases.
--
"cbbrowne","@","ca.afilias.info"
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 673-4124 (land)

Re: Bitmap indexes

From
Bruce Momjian
Date:
PFC wrote:
> > There's a great deal about this in the list archives (probably more in
> > pgsql-hackers than in -performance).  Most of the current interest has
> > to do with building in-memory bitmaps on the fly, as a way of decoupling
> > index and heap scan processing.  Which is not quite what you're talking
> > about but should be pretty effective for low-cardinality cases.  In
> > particular it'd allow AND and OR combination of multiple indexes, which
> > we do poorly or not at all at the moment.
>
>     Is this called a star join ?
>
>     It would also allow to access the data pages in a more sequential order
> if the rows are not required to be retrieved in index order, which would
> potentially be a large speedup for index scans concerning more than the
> usual very small percentage of rows in a table : if several rows to be
> retrieved are on the same page, it would visit this page only once.

Please see the TODO list for a summary of previous discussions and
directions.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Bitmap indexes

From
Daniel Ceregatti
Date:
PFC wrote:

>
>     contrib/intarray has an index type which could be what you need.
>

I've used intarray for a site that requires that I match multiple low
cardinality attributes with multiple search criteria. Here's an
(abridged) example:

The table:

\d person_attributes
                  Table "dm.person_attributes"
     Column     |           Type           |     Modifiers
----------------+--------------------------+--------------------
 attributes     | integer[]                | not null
 personid       | integer                  | not null
Indexes:
    "person_attributes_pk" PRIMARY KEY, btree (personid)
    "person_attributes_gist_attributes_index" gist (attributes)

This table has about 1.1 million rows.

The index:

create index person_attributes_gist_attributes_index on
person_attributes using gist ((attributes) gist__int_ops);

The query:

select personid
from person_attributes
where attributes @@
'(1|3)&(900)&(902)&(1002)&(9002)&(11003)&(12002|12003)&(13003|13004|13005|13006|13007|13008|13009|13010)'::query_int

The explain analyze:

Index Scan using person_attributes_gist_search_index on
person_attributes pa  (cost=0.00..1221.26 rows=602 width=4) (actual
time=0.725..628.994 rows=1659 loops=1)
  Index Cond: (search @@ '( 1 | 3 ) & 900 & 902 & 1002 & 9002 & 11003 &
( 12002 | 12003 ) & ( ( ( ( ( ( ( 13003 | 13004 ) | 13005 ) | 13006 ) |
13007 ) | 13008 ) | 13009 ) | 13010 )'::query_int)
Total runtime: 431.843 ms

The query_int and what each number means:

1|3 means, only gather the people in site id 1 or 3.
900 is an arbitrary flag that means they are searchable.
902 is another arbitrary flag that means they have photos.
1002 is the flag for "don't drink".
9002 is the flag for "don't smoke".
11003 is the flag for "female".
12002|12003 are the flags for straight|bisexual.
13003 through 13010 represent the age range 18 through 25.

In plain English: select all females who are straight or bisexual,
between the ages of 18 and 25 inclusive, that don't drink, that don't
smoke, who are searchable, who have photos, and belong to sites 1 or 3.

As you can see by the explain, this query is relatively fast, given the
number of criteria and data that has to be searched.

This site's predecessor used oracle, and I used bitmap indexes for
performing these searches in oracle. This intarray method is the closest
I've come to being able to reproduce the same functionality at the
required speed in postgres.

The only problems I've run into with this method are: the non-concurrent
nature of gist indexes, which makes doing any sort of bulk DML on them
extremely time consuming (I usually have to drop the index, perform the
bulk DML, then re-create the index), dealing with intarray methods to
select particular attributes so I can then order by them, and dealing
with intarray methods for updating the attributes column. All of these
methods are detailed in the intarray README.

I'm happy with the performance in production so far. I've yet to see any
gist concurrency issues affect performance with normal rates of DML.

Daniel

--

Daniel Ceregatti - Programmer
Omnis Network, LLC

Real Programmers don't eat quiche.  They eat Twinkies and Szechwan food.