Bitmap scan cost model (was Re: bitmap scans, btree scans, and tid order) - Mailing list pgsql-hackers
From | Jeffrey W. Baker |
---|---|
Subject | Bitmap scan cost model (was Re: bitmap scans, btree scans, and tid order) |
Date | |
Msg-id | 1116393137.17217.24.camel@noodles Whole thread Raw |
In response to | Re: bitmap scans, btree scans, and tid order (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Bitmap scan cost model (was Re: bitmap scans, btree scans, and tid order)
Re: Bitmap scan cost model (was Re: bitmap scans, btree scans, and tid order) |
List | pgsql-hackers |
On Mon, 2005-05-16 at 14:35 -0400, Tom Lane wrote: > "Jeffrey W. Baker" <jwbaker@acm.org> writes: > > On Mon, 2005-05-16 at 09:53 -0400, Tom Lane wrote: > >> This is a fallacy, and I think your concern is largely mistaken. Have > >> you experimented with the cases you are worried about? > > > Perhaps I have not stated the problem clearly. Believe me, I have > > experimented extensively with this problem. > > Sorry, perhaps I wasn't clear: have you experimented *with CVS tip*? > The current code is certainly capable of choosing either seqscan, > bitmap scan, or traditional index scan depending on the given query. > For example, [...] > I think the work that's most needed at the moment is to test the > bitmap-scan cost model to see if it has much to do with reality... The cost model seems to work pretty darn well, as a matter of fact. This table is in-order by id, in-order by date, and randomly ordered by "random". scratch=# \d test Table "public.test"Column | Type | Modifiers --------+---------+-----------id | integer | date | date | random | integer | Indexes: "test_pk" UNIQUE, btree (id) "test_date_idx" btree (date) "test_rand_idx" btree (random) scratch=# select count(1) from test; count ----------10000000 The size of the tables and indexes is about 1G, or double physical memory. I tested by selecting on the random column. For 100 random values, I selected the row that matches exactly, then in ranges of 1000, 10000, 100000, and 1000000. These touch roughly 5, 50, 500, and 5000 tuples, respectively. The test is repeated for index scan, bitmap scan, and sequential scan. Example query: select count(1) from test where random >= 1429076987 and random < 1429076987 + 10000; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------Aggregate (cost=171.31..171.31 rows=1 width=0) (actual time=0.996..1.000 rows=1 loops=1) -> Bitmap Heap Scan on test (cost=2.26..171.20rows=43 width=0) (actual time=0.145..0.829 rows=42 loops=1) Recheck Cond: ((random >= 1429076987)AND (random < 1429086987)) -> Bitmap Index Scan on test_rand_idx (cost=0.00..2.26 rows=43 width=0) (actualtime=0.063..0.063 rows=42 loops=1) Index Cond: ((random >= 1429076987) AND (random < 1429086987))Totalruntime: 1.114 ms 100 queries returning | 1 | 5 | 50 | 500 | 5000 | ----------------------+-----+-----+------+-------+--------+ bitmap scan | 1s | 2s | 13s | 1m41s | 11m16s | index scan | 1s | 1s | 12s | 2m6s | 24m19s | ----------------------+-----+-----+------+-------+--------+ sequential scan | 28m20s | The planner uses index scan for the first two columns, and chooses bitmap scan for the rest. This would seem to be a reasonable decision, given the results. The only real problem with the cost model in further testing is that it doesn't switch to seq scan quickly enough. If bitmap scan is disabled, the planner will pick index scan even in cases when sequential scan is 10x faster: scratch=# set enable_bitmapscan to off; SET scratch=# explain analyze select count(1) from test where random >= 1429076987 and random < 1429076987 + 10000000; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------Aggregate (cost=170142.03..170142.03 rows=1 width=0) (actual time=177419.182..177419.185 rows=1 loops=1) -> Index Scan using test_rand_idxon test (cost=0.00..170034.11 rows=43167 width=0) (actual time=0.035..177255.696 rows=46764 loops=1) Index Cond: ((random >= 1429076987) AND (random < 1439076987))Total runtime: 177419.302 ms (4 rows) scratch=# set enable_indexscan to off; SET scratch=# explain analyze select count(1) from test where random >= 1429076987 and random < 1429076987 + 10000000; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------Aggregate (cost=204165.55..204165.55 rows=1 width=0) (actual time=12334.042..12334.045 rows=1 loops=1) -> Seq Scan on test (cost=0.00..204057.62rows=43167 width=0) (actual time=17.436..12174.150 rows=46764 loops=1) Filter: ((random >= 1429076987)AND (random < 1439076987))Total runtime: 12334.156 ms (4 rows) Obviously in this case sequential scan was (would have been) a huge win. Incrementing random_page_cost from 4 (the default) to 5 causes the planner to make a better decision. -jwb
pgsql-hackers by date: