Re: 7.0.x not using indices - Mailing list pgsql-bugs
From | Marcin Wolinski
Subject | Re: 7.0.x not using indices
Date |
Msg-id | 87hf0x7tie.fsf@Wincenty.nowhere.edu.pl
In response to | 7.0.x not using indices (Marcin Wolinski <wolinski@mimuw.edu.pl>)
List | pgsql-bugs
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Marcin Wolinski <wolinski@mimuw.edu.pl> writes:
> > After this the 'w' class contains 116170 rows.  For each value of wfid
> > there are no more than 7584 different values of wnr (with median of 1
> > value per wfid and only in about 1000 cases 10 values or more).
>
> The problem here is the huge differential between the typical and
> maximum frequency of wfid values.  Can you do something to get rid of
> the outlier with 7584 values?

I've made some deletions in my test tables.  Now w contains 51560
records, no more than 10 records per wfid (median 1).  Table i contains
33317 records, no more than 10 records per ilid (median 1).

In these circumstances PostgreSQL 7.0.3 on i686-pc-linux-gnu, compiled
by gcc 2.95.2, decided to use an index for TEST1.  However, for TEST2,
which is

    select wnr, wfid from w, i
      where wfid=ifid and ilid=99050
      order by wfid, wnr;

it selects precisely the same query plan as before (although the
estimated cost has dropped significantly):

Sort  (cost=19886.17..19886.17 rows=171782 width=12)
  ->  Merge Join  (cost=214.83..3302.03 rows=171782 width=12)
        ->  Index Scan using w_wfid on w  (cost=0.00..2438.53 rows=51560 width=8)
        ->  Sort  (cost=214.83..214.83 rows=333 width=4)
              ->  Index Scan using i_ilid on i  (cost=0.00..200.87 rows=333 width=4)

When run with

    time psql test <test2.sql

it shows

    real    0m2.960s

(I know this is terribly rough.)

I've tried to make Postgres use the same plan that 6.5.3 had selected.
First I used

    set enable_mergejoin to off;

and PGSQL responded with

Sort  (cost=62183.01..62183.01 rows=171782 width=12)
  ->  Hash Join  (cost=201.70..45598.87 rows=171782 width=12)
        ->  Seq Scan on w  (cost=0.00..793.60 rows=51560 width=8)
        ->  Hash  (cost=200.87..200.87 rows=333 width=4)
              ->  Index Scan using i_ilid on i  (cost=0.00..200.87 rows=333 width=4)

and the time was

    real    0m0.893s

Then I added

    set enable_hashjoin to off;

which produces 6.5.3's plan:

Sort  (cost=127071.79..127071.79 rows=171782 width=12)
  ->  Nested Loop  (cost=0.00..110487.65 rows=171782 width=12)
        ->  Index Scan using i_ilid on i  (cost=0.00..200.87 rows=333 width=4)
        ->  Index Scan using w_wfid on w  (cost=0.00..324.58 rows=516 width=8)

and

    real    0m0.326s

To summarize: when forced to use the same plan that 6.5.3 had selected,
7.0 runs about ten times faster than with the plan it thinks is best,
yet it estimates that plan's cost to be about ten times higher than the
``best'' one's...

I'm still not convinced that there isn't some silly bug in there.  Why
were 6.5.3's estimates so much better?  It didn't have any more
information to base its predictions on.  And with my original data, why
doesn't 7.0 decide to use an index to retrieve no more than 21000 rows
out of 100000?  Is it a matter of tweaking the COST_STH variables?

Regards
Marcin

----------------------------------------------------------------------
Marcin Woli\'nski                       mailto:wolinski@mimuw.edu.pl
                                  http://www.mimuw.edu.pl/~wolinski
----------------------------------------------------------------------
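For anyone who wants to reproduce the comparison above, here is a
minimal sketch of the psql session, assuming the same test tables as in
the report; the distribution check and the planner cost variables named
at the end (random_page_cost, cpu_index_tuple_cost) are assumptions
added for illustration, with made-up values, and are not taken from the
original post:

    -- Illustrative check of the wfid distribution in w after the
    -- deletions (most frequent wfid values and their row counts).
    select wfid, count(*) as cnt
      from w
     group by wfid
     order by cnt desc
     limit 5;

    -- Ask the planner for its default choice for TEST2
    -- (the merge-join plan shown above).
    explain
      select wnr, wfid from w, i
        where wfid=ifid and ilid=99050
        order by wfid, wnr;

    -- Disable join methods one at a time to force the 6.5.3-style plan.
    set enable_mergejoin to off;   -- planner switches to the hash join
    set enable_hashjoin to off;    -- ...and then to the nested loop over indexes

    explain
      select wnr, wfid from w, i
        where wfid=ifid and ilid=99050
        order by wfid, wnr;

    -- The "COST_STH" knobs are the planner cost settings; the names and
    -- values below are illustrative assumptions, not recommendations.
    set random_page_cost = 2;
    set cpu_index_tuple_cost = 0.0005;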