Thread: Using index for bitwise operations?

Using index for bitwise operations?

From
Shaul Dar
Date:
Hi,

I have at column that is a bit array of 16, each bit specifying if a certain property, out of 16, is present or not. Our typical query select 300 "random" rows (could be located in different blocks) from the table based on another column+index, and then filters them down to ~50 based on this the bit field. Currently we have 16 separate indexes built on each bit, and on our 25M rows table each index takes about 880MB for a total of 14GB! I would have liked to change this into a single short integer value with a single index, but I don't know if there is a way to search if specific bits are set, using a single index?  W/o an index this might be overly expensive, even as a filter (on selected 300 rows).

(I also saw the thread http://archives.postgresql.org/pgsql-performance/2007-09/msg00283.php. As I said we are currently using the same multiple index "solution" described in http://archives.postgresql.org/pgsql-performance/2007-09/msg00283.php). Any suggestions?

Thanks!

-- Shaul (Email: info@shauldar.com)

Re: Using index for bitwise operations?

From
Richard Huxton
Date:
Shaul Dar wrote:
> Hi,
>
> I have at column that is a bit array of 16, each bit specifying if a certain
> property, out of 16, is present or not. Our typical query select 300
> "random" rows (could be located in different blocks) from the table based on
> another column+index, and then filters them down to ~50 based on this the
> bit field.
[snip]
 > W/o an index this might be overly expensive,
 > even as a filter (on selected 300 rows).

Have you _tried_ just not having an index at all? Since you are only
accessing a relatively small number of rows to start with, even an
infinitely efficient index isn't going to make that much difference.
Combine that with the fact that you're going to have the indexes
competing with the table for cache space and I'd see how much difference
it makes just not having it.

Failing that, perhaps have an index on a single bit if there is one you
always/mostly check against.

The relational way to do this would be one or more property tables
joined to your main table. If the majority of your properties are not
set then this could be faster too.

--
   Richard Huxton
   Archonet Ltd

Re: Using index for bitwise operations?

From
Tom Lane
Date:
Shaul Dar <shauldar@gmail.com> writes:
> I have at column that is a bit array of 16, each bit specifying if a certain
> property, out of 16, is present or not. Our typical query select 300
> "random" rows (could be located in different blocks) from the table based on
> another column+index, and then filters them down to ~50 based on this the
> bit field. Currently we have 16 separate indexes built on each bit, and on
> our 25M rows table each index takes about 880MB for a total of 14GB!

Ouch.  One possibility is to replace the bitarray with an integer array
(k is in the int[] array iff bit k was set in the bitarray) and then use
the GIST or GIN indexing capabilities of contrib/intarray.  I also seem
to recall having seen a module that provides GIST indexing for bitops
on plain integers --- have you looked on pgfoundry?

This isn't necessarily better than what you're doing, as btree indexes
are a lot better optimized than GIST/GIN.  But it would be worth looking
into.

            regards, tom lane

Re: Using index for bitwise operations?

From
Matthew Wakeling
Date:
On Mon, 1 Jun 2009, Shaul Dar wrote:
> Our typical query select 300 "random" rows (could be located in different blocks) from the table based on another
> column+index, and then filters them down to ~50 based on this the bit field.

So it seems that you're already using an index to fetch 300 rows from a
big table, and then filtering that down to ~50 based on the über-complex
stuff.

That's the right way to do it. There isn't really an appropriate place to
add another index into this query plan. Filtering 300 rows is peanuts for
Postgres.

You quite probably won't get any benefit from having a bitwise index,
unless you can make a multi-column index with the existing index stuff
first and then the bitwise stuff as a second column. However, that sounds
like more effort than benefit.

If I have my analysis wrong, perhaps you could post your EXPLAIN ANALYSE
results so we can see what you mean.

Matthew

--
 What goes up must come down. Ask any system administrator.