Bitmap scan cost model (was Re: bitmap scans, btree scans, and tid order) - Mailing list pgsql-hackers

From Jeffrey W. Baker
Subject Bitmap scan cost model (was Re: bitmap scans, btree scans, and tid order)
Date
Msg-id 1116393137.17217.24.camel@noodles
In response to Re: bitmap scans, btree scans, and tid order  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Bitmap scan cost model (was Re: bitmap scans, btree scans, and tid order)
Re: Bitmap scan cost model (was Re: bitmap scans, btree scans, and tid order)
List pgsql-hackers
On Mon, 2005-05-16 at 14:35 -0400, Tom Lane wrote:
> "Jeffrey W. Baker" <jwbaker@acm.org> writes:
> > On Mon, 2005-05-16 at 09:53 -0400, Tom Lane wrote:
> >> This is a fallacy, and I think your concern is largely mistaken.  Have
> >> you experimented with the cases you are worried about?
> 
> > Perhaps I have not stated the problem clearly.  Believe me, I have
> > experimented extensively with this problem.
> 
> Sorry, perhaps I wasn't clear: have you experimented *with CVS tip*?
> The current code is certainly capable of choosing either seqscan,
> bitmap scan, or traditional index scan depending on the given query.
> For example,

[...]

> I think the work that's most needed at the moment is to test the
> bitmap-scan cost model to see if it has much to do with reality...

The cost model seems to work pretty darn well, as a matter of fact.  

This table is in-order by id, in-order by date, and randomly ordered by
"random".

scratch=# \d test
        Table "public.test"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
 date   | date    |
 random | integer |
Indexes:
    "test_pk" UNIQUE, btree (id)
    "test_date_idx" btree (date)
    "test_rand_idx" btree (random)

scratch=# select count(1) from test;
  count
----------
 10000000

The table plus its indexes total about 1 GB, roughly double physical
memory.  I tested by selecting on the random column.  For 100 random
values, I first selected the exactly matching row, then ranges of width
1000, 10000, 100000, and 1000000, which touch roughly 5, 50, 500, and
5000 tuples, respectively.  The test was repeated for index scan, bitmap
scan, and sequential scan.
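(As a sanity check on those tuple counts, here's the arithmetic — my own
back-of-the-envelope, assuming "random" is uniform over the positive
32-bit integer range [0, 2**31):)

```python
# Expected matching rows for a range predicate of a given width,
# assuming "random" is uniformly distributed over [0, 2**31).
ROWS = 10_000_000
DOMAIN = 2**31

def expected_matches(width):
    return ROWS * width / DOMAIN

for width in (1000, 10_000, 100_000, 1_000_000):
    print(width, round(expected_matches(width), 1))
# 1000 -> ~4.7, 10000 -> ~46.6, 100000 -> ~465.7, 1000000 -> ~4656.6
```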

Example query:
select count(1)
  from test
 where random >= 1429076987
   and random < 1429076987 + 10000;

                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=171.31..171.31 rows=1 width=0) (actual time=0.996..1.000 rows=1 loops=1)
   ->  Bitmap Heap Scan on test  (cost=2.26..171.20 rows=43 width=0) (actual time=0.145..0.829 rows=42 loops=1)
         Recheck Cond: ((random >= 1429076987) AND (random < 1429086987))
         ->  Bitmap Index Scan on test_rand_idx  (cost=0.00..2.26 rows=43 width=0) (actual time=0.063..0.063 rows=42 loops=1)
               Index Cond: ((random >= 1429076987) AND (random < 1429086987))
 Total runtime: 1.114 ms

100 queries returning |  1  |  5  |  50  |  500  |  5000  |
----------------------+-----+-----+------+-------+--------+
bitmap scan           |  1s |  2s |  13s | 1m41s | 11m16s |
index scan            |  1s |  1s |  12s | 2m6s  | 24m19s |
----------------------+-----+-----+------+-------+--------+
sequential scan       |             28m20s                | 

The planner uses an index scan for the two smallest cases, and chooses a
bitmap scan for the rest.  Given the results, that seems to be a
reasonable decision.  The only real problem with the cost model in
further testing is that it doesn't switch to a sequential scan soon
enough.  If bitmap scan is disabled, the planner will pick an index scan
even in cases where a sequential scan is 10x faster:

scratch=# set enable_bitmapscan to off;
SET
scratch=# explain analyze select count(1) from test where random >= 1429076987  and  random < 1429076987 + 10000000;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=170142.03..170142.03 rows=1 width=0) (actual time=177419.182..177419.185 rows=1 loops=1)
   ->  Index Scan using test_rand_idx on test  (cost=0.00..170034.11 rows=43167 width=0) (actual time=0.035..177255.696 rows=46764 loops=1)
         Index Cond: ((random >= 1429076987) AND (random < 1439076987))
 Total runtime: 177419.302 ms
(4 rows)

scratch=# set enable_indexscan to off;
SET
scratch=# explain analyze select count(1) from test where random >= 1429076987  and  random < 1429076987 + 10000000;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=204165.55..204165.55 rows=1 width=0) (actual time=12334.042..12334.045 rows=1 loops=1)
   ->  Seq Scan on test  (cost=0.00..204057.62 rows=43167 width=0) (actual time=17.436..12174.150 rows=46764 loops=1)
         Filter: ((random >= 1429076987) AND (random < 1439076987))
 Total runtime: 12334.156 ms
(4 rows)

Obviously in this case a sequential scan would have been a huge win.
Raising random_page_cost from 4 (the default) to 5 causes the planner to
make the better decision.
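(A rough way to see why that small bump flips the decision — my own
back-of-the-envelope, not planner internals: if the index-scan cost
above is dominated by random page fetches, it scales roughly linearly
with random_page_cost, while the seq-scan cost does not.)

```python
# Estimated costs from the EXPLAIN output above, at random_page_cost = 4.
index_cost = 170034.11   # index scan (the planner's pick with bitmap scan off)
seq_cost = 204057.62     # sequential scan

# Assume the index-scan cost scales with random_page_cost; at 5 it becomes:
scaled_index_cost = index_cost * 5 / 4
print(scaled_index_cost)  # ~212542.6 -- now above the seq-scan cost
```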

-jwb



