Thread: Forcing more agressive index scans for BITMAP AND

Forcing more agressive index scans for BITMAP AND

From
Ow Mun Heng
Date:
just wondering if there's a special tweak i can do to force more usage
of indexes to do BITMAP ands?

I have a table like

A int
B int
C int
D int
E int
F int
g int

where A/B/C/D/E are indexes

There's ~20millions rows in the table.

Query are something like this.

select * from table
where A=X
and B = Y
and C = Z
and D = AA
and E = BB

the query plan will only pick 2 indexes to do the bitmap.
I'm not sure how to tweak the config for it to use more indexes.

Box is a celeron 1.7 w/ 768MB ram with shared buffers at 250MB and
effective cache size 350MB



Re: Forcing more agressive index scans for BITMAP AND

From
Matthew
Date:
On Fri, 4 Apr 2008, Ow Mun Heng wrote:
> select * from table
> where A=X
> and B = Y
> and C = Z
> and D = AA
> and E = BB

This may not be the answer you're looking for, but if you create a
multi-coloumn index, it should be able to make your query run fast:

CREATE INDEX foo ON table (A, B, C, D, E);

It'll certainly be faster than building a bitmap for the contents of five
separate indexes.

Matthew

--
-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-.
||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||
|/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/
'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'

Re: Forcing more agressive index scans for BITMAP AND

From
PFC
Date:
> On Fri, 4 Apr 2008, Ow Mun Heng wrote:
>> select * from table
>> where A=X
>> and B = Y
>> and C = Z
>> and D = AA
>> and E = BB

    With that kind of WHERE condition, Postgres will use a Bitmap Index Scan
to combine your indices. If, however, postgres notices while looking at
the statistics gathered during ANALYZE, that for one of your columns, you
request a value that happens in a large percentage of the rows (like 20%),
and this value has a rather random distribution, Postgres will not bother
scanning the index, because it is very likely that all the pages would
contain a row satisfying your condition anyway, so the time taken to scan
this huge index and mark the bitmap would be lost because it would not
allow a better selectivity, since all the pages would get selected for
scan anyway.
    I would guess that Postgres uses Bitmap Index Scan only on your columns
that have good selectivity (ie. lots of different values).

    So :

    If you use conditions on (a,b) or (a,b,c) or (a,b,c,d) etc, you will
benefit GREATLY from a multicolumn index on (a,b,c,d...).
    However, even if postgres can use some clever tricks, a multicolumn index
on (a,b,c,d) will not be optimal for a condition on (b,c,d) for instance.

    So, if you mostly use conditions on a left-anchored subset of
(a,b,c,d,e), the multicolumn index will be a great tool.
    A multicolumn index on (a,b,c,d,e) is always slightly slower than an
index on (a) if you only use a condition on (a), but it is immensely
faster when you use a multicolumn condition.

    Can you tell us more about what those columns mean and what you store in
them, how many distinct values, etc ?