Thread: performance index scan vs bitmap-seq scan.

performance index scan vs bitmap-seq scan.

From

"S Golly"

Date:

20 December 2007, 21:07:07

The server is running 8.2.5 FreeBSD 6.1 with 3 GB of RAM.
I have a table with over 100M rows. I have a unique index (primary key) on column name called aid.
The select count(aid) .. does a Bitmap heap scan when the right side condition is above 100,000,000 (if i take one zero off it does a pure index scan).
My question : why is the optimizer choosing an Bitmap Heap Scan when count can be done with index.

When i set the bitmap scan to off, it selects an seq scan, after which i turn the seq scan off , then it does index scan only but it takes 8 minutes longer than doing a index scan+bitmap scan.

Any insight is appreciated.
thank you !
g

explain select count(aid) from topcat.aid where aid >= 10000000 and aid <= 100000000;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
Aggregate (cost=2786143.42..2786143.43 rows=1 width=4)
   -> Bitmap Heap Scan on aid (cost=142949.13..2742507.95 rows=17454188 width=4)
         Recheck Cond: ((aid >= 10000000) AND (aid <= 100000000))
         -> Bitmap Index Scan on idx_aid_aid (cost=0.00..142949.13 rows=17454188 width=0)
               Index Cond: ((aid >= 10000000) AND (aid <= 100000000))
(5 rows)

explain select count(aid) from topcat.aid where aid >= 100000 and aid <= 100000000;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
Aggregate (cost=2786143.42..2786143.43 rows=1 width=4)
   -> Bitmap Heap Scan on aid (cost=142949.13..2742507.95 rows=17454188 width=4)
         Recheck Cond: ((aid >= 100000) AND (aid <= 100000000))
         -> Bitmap Index Scan on idx_aid_aid (cost=0.00..142949.13 rows=17454188 width=0)
               Index Cond: ((aid >= 100000) AND (aid <= 100000000))
(5 rows)

explain select count(aid) from topcat.aid where aid >= 1000000 and aid <= 100000000;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
Aggregate (cost= 2786143.42..2786143.43 rows=1 width=4)
   -> Bitmap Heap Scan on aid (cost=142949.13..2742507.95 rows=17454188 width=4)
         Recheck Cond: ((aid >= 1000000) AND (aid <= 100000000))
         -> Bitmap Index Scan on idx_aid_aid (cost= 0.00..142949.13 rows=17454188 width=0)
               Index Cond: ((aid >= 1000000) AND (aid <= 100000000))
(5 rows)

explain select count(aid) from topcat.aid where aid >= 1000000 and aid <= 10000000;
                                 QUERY PLAN
-----------------------------------------------------------------------------
Aggregate (cost=5.58..5.59 rows=1 width=4)
   -> Index Scan using idx_aid_aid on aid (cost= 0.00..5.58 rows=1 width=4)
         Index Cond: ((aid >= 1000000) AND (aid <= 10000000))

Re: performance index scan vs bitmap-seq scan.

From

"Scott Marlowe"

Date:

20 December 2007, 21:16:18

On Dec 20, 2007 4:06 PM, S Golly <gollykai@gmail.com> wrote:
> The server is running 8.2.5 FreeBSD 6.1 with 3 GB of RAM.
> I have a table with over 100M rows. I have a unique index (primary key) on
> column name called aid.
> The select count(aid) .. does a Bitmap heap scan when the right side
> condition is above 100,000,000 (if i take one zero off it does a pure index
> scan).
> My question : why is the optimizer choosing an Bitmap Heap Scan when count
> can be done with index.

Because count can't be done with the index alone.

In pgsql the visibility info is in the table itself, so even if it can
hit the index, it has to go back and hit the table to be sure if the
tuple is visible.