Re: index / sequential scan problem - Mailing list pgsql-performance
From | Fabian Kreitner |
---|---|
Subject | Re: index / sequential scan problem |
Date | |
Msg-id | 5.1.0.14.0.20030718063650.0397b238@195.145.148.245 Whole thread Raw |
In response to | Re: index / sequential scan problem (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: index / sequential scan problem
|
List | pgsql-performance |
At 20:12 17.07.2003, Tom Lane wrote: >Fabian Kreitner <fabian.kreitner@ainea-ag.de> writes: > > That is what I read too and is why Im confused that the index is indeed > > executing faster. Can this be a problem with the hardware and/or postgress > > installation? > >I think the actual issue here is that you are executing the EXISTS >subplan over and over, once for each outer row. The planner's cost >estimate for EXISTS is based on the assumption that you do it once >... in which scenario the seqscan very possibly is cheaper. However, >when you do the EXISTS subplan over and over for many outer rows, you >get a savings from the fact that the index and table pages soon get >cached in memory. The seqscan plan gets a savings too, since the table >is small enough to fit in memory, but once everything is in memory the >indexscan plan is faster. > >There's been some discussion on pghackers about how to teach the planner >to account for repeated executions of subplans, but we have not come up >with a good solution yet. > >For the moment, what people tend to do if they know their database is >small enough to mostly stay in memory is to reduce random_page_cost to >make the planner favor indexscans. If you know the database is entirely >cached then the theoretically correct value of random_page_cost is 1.0 >(since fetching any page will cost the same, if it's all in RAM). I'd >recommend against adopting that as a default, but a lot of people find >that setting it to 2.0 or so seems to model their situation better than >the out-of-the-box 4.0. Thanks for the explanation :) However .... :( perg_1097=# vacuum analyze; VACUUM perg_1097=# set random_page_cost to 1.0; SET VARIABLE perg_1097=# EXPLAIN ANALYZE select notiz_id, obj_id, obj_typ perg_1097-# from notiz_objekt a perg_1097-# where not exists perg_1097-# ( perg_1097(# select 1 perg_1097(# from notiz_gelesen b perg_1097(# where ma_id = 2001 perg_1097(# and ma_pid = 1097 perg_1097(# and a.notiz_id = b.notiz_id perg_1097(# ) perg_1097-# ; NOTICE: QUERY PLAN: Seq Scan on notiz_objekt a (cost=0.00..56125.80 rows=15561 width=12) (actual time=0.27..2299.09 rows=31122 loops=1) SubPlan -> Seq Scan on notiz_gelesen b (cost=0.00..1.79 rows=1 width=0) (actual time=0.07..0.07 rows=0 loops=31122) Total runtime: 2328.05 msec EXPLAIN perg_1097=# ... perg_1097=# set enable_seqscan to false; SET VARIABLE perg_1097=# set random_page_cost to 1.0; SET VARIABLE perg_1097=# EXPLAIN ANALYZE select notiz_id, obj_id, obj_typ perg_1097-# from notiz_objekt a perg_1097-# where not exists perg_1097-# ( perg_1097(# select 1 perg_1097(# from notiz_gelesen b perg_1097(# where ma_id = 2001 perg_1097(# and ma_pid = 1097 perg_1097(# and a.notiz_id = b.notiz_id perg_1097(# ) perg_1097-# ; NOTICE: QUERY PLAN: Seq Scan on notiz_objekt a (cost=100000000.00..100093380.36 rows=15561 width=12) (actual time=0.07..550.07 rows=31122 loops=1) SubPlan -> Index Scan using idx_notiz_gelesen_2 on notiz_gelesen b (cost=0.00..2.98 rows=1 width=0) (actual time=0.01..0.01 rows=0 loops=31122) Total runtime: 582.90 msec EXPLAIN perg_1097=# Even with a random page cost of 1 it thinks using the index should/could take significantly longer which it doesnt for some reason :-/
pgsql-performance by date: