7.0.x not using indices - Mailing list pgsql-bugs
From | Marcin Wolinski |
---|---|
Subject | 7.0.x not using indices |
Date | |
Msg-id | 87ofvhikem.fsf@Wincenty.nowhere.edu.pl Whole thread Raw |
Responses |
Re: 7.0.x not using indices
|
List | pgsql-bugs |
Dear Sirs, [This is in a way a reiteration of Marcin Zukowski's report of 19 Jan 2001 http://pgsql.service.net.pl/mhonarc/pgsql-bugs/2001-01/msg00110.html which was ignored because of a somewhat unfortunate test case.] I have created a new database and then have run this: ---------------------------------------------------------------------- create table w (wnr int4, wfid int4); create table i (ifid int4, ilid int4); \copy w from w.dump \copy i from i.dump create index w_wfid on w(wfid); create index i_ilid on i(ilid); vacuum verbose; ---------------------------------------------------------------------- [Please excuse column names. These are slices of tables from a database containing morphologically tagged corpus of Polish language. 'w' are wordforms and 'i' are their interpretations.] After this the 'w' class contains 116170 rows. For each value of wfid there are no more than 7584 different values of wnr (with median of 1 value per wfid and only in about 1000 cases 10 values or more). Class 'i' contains 47016 rows. For each ilid there are no more than 481 different ifids (median 2, 800 cases with 10 or more). Now consider the following two queries: TEST1: ---------------------------------------------------------------------- explain select wnr, wfid from w where wfid in (57868, 99050, 121041) order by wfid, wnr; ---------------------------------------------------------------------- On PostgreSQL 6.5.3 on Linux 2.2.17 this shows: ---------------------------------------------------------------------- Sort (cost=6.15 rows=5 width=8) -> Index Scan using w_wfid, w_wfid, w_wfid on w (cost=6.15 rows=5 width=8) ---------------------------------------------------------------------- while on PostgreSQL 7.0.3 on Linux and 7.0.0 on M$ Windows: ---------------------------------------------------------------------- Sort (cost=2860.73..2860.73 rows=3450 width=8) -> Seq Scan on w (cost=0.00..2657.98 rows=3450 width=8) ---------------------------------------------------------------------- We can estimate the number of rows to be at most 3% of the table content (and usually much much less, this particular query returns 5 rows). So the idea of using an index here doesn't seem very strange, yet 7.0.x does not do it. TEST2: ---------------------------------------------------------------------- explain select wnr, wfid from w, i where wfid=ifid and ilid=99050 order by wfid, wnr; ---------------------------------------------------------------------- On PostgreSQL 6.5.3: ---------------------------------------------------------------------- Sort (cost=6.15 rows=5 width=12) -> Nested Loop (cost=6.15 rows=5 width=12) -> Index Scan using i_ilid on i (cost=2.05 rows=2 width=4) -> Index Scan using w_wfid on w (cost=2.05 rows=116170 width=8) ---------------------------------------------------------------------- On PostgreSQL 7.0.3 on Linux and 7.0.0 on M$ Windows: ---------------------------------------------------------------------- Sort (cost=77288.43..77288.43 rows=546185 width=12) -> Merge Join (cost=311.44..7899.66 rows=546185 width=12) -> Index Scan using w_wfid on w (cost=0.00..6130.21 rows=116170 width=8) -> Sort (cost=311.44..311.44 rows=470 width=4) -> Index Scan using i_ilid on i (cost=0.00..290.57 rows=470 width=4) ---------------------------------------------------------------------- In typical case this query returns less than 1% of the rows from the join. So why not to use the index? The 'cost' values in 7.0.x are really pleasing to the eye. And it in fact computes as it says it would. TEST2 on Postgres 6.5 runs in less than a second, on 7.0 on the same machine it takes 25 seconds. This causes PostgreSQL 7.0 to be completely useless to me, since my real query joining 5 tables runs for hours instead to complete in seconds. Is it something I do wrong or is it a bug? Is there a way to circumvent this? The queries above have a common feature with Marcin's example: they require to lookup more than one value in an index. Is this a problem for 7.0? If you want to play with my data I can send it all to you (~2MB) or can try to prepare a smaller test set. BTW. How can I learn to read these query plans? Regards (and thanks for an otherwise splendid product) Marcin ---------------------------------------------------------------------- Marcin Woli\'nski mailto:wolinski@mimuw.edu.pl http://www.mimuw.edu.pl/~wolinski ----------------------------------------------------------------------
pgsql-bugs by date: