Re: Query planner is using wrong index. - Mailing list pgsql-performance
From | Brian Herlihy |
---|---|
Subject | Re: Query planner is using wrong index. |
Date | |
Msg-id | 20060406140151.83127.qmail@web52310.mail.yahoo.com Whole thread Raw |
In response to | Re: Query planner is using wrong index. (Ragnar <gnari@hive.is>) |
Responses |
Re: Query planner is using wrong index.
|
List | pgsql-performance |
--- Ragnar <gnari@hive.is> wrote: > On fim, 2006-04-06 at 19:27 +1000, Brian Herlihy wrote: > > > Yes, the primary key is far better. I gave it the ultimate test - I > dropped > > the (p2, p3) index. It's blindingly fast when using the PK, > > I have problems understanding exactly how an index on > (p1,p2,p3) can be faster than and index on (p2,p3) for > a query not involving p1. > can you demonstrate this with actual EXPLAIN ANALYZES ? > something like: > EXPLAIN ANALYZE select * from t where p2 = ? and p3 = ?; > BEGIN; > DROP INDEX p2p3; > EXPLAIN ANALYZE select * from t where p2 = ? and p3 = ?; > ROLLBACK; > > maybe your p2p3 index needs REINDEX ? > Here's the output. The timings after caching are repeatable (varying only by 10% or so). Query before caching: db# explain analyze select * from t WHERE p1 = 'a' and p2 = 'uk.altavista.com' AND p3 = 'web/results?itag=&q=&kgs=&kls='; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------- Index Scan using p2_p3_idx on t (cost=0.00..6.02 rows=1 width=102) (actual time=2793.247..2793.247 rows=0 loops=1) Index Cond: (((p2)::text = 'uk.altavista.com'::text) AND ((p3)::text = 'web/results?itag=&q=&kgs=&kls='::text)) Filter: ((p1)::text = 'a'::text) Total runtime: 2793.303 ms (4 rows) Query after caching: db# explain analyze select * from t WHERE p1 = 'a' and p2 = 'uk.altavista.com' AND p3 = 'web/results?itag=&q=&kgs=&kls='; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------- Index Scan using p2_p3_idx on t (cost=0.00..6.02 rows=1 width=102) (actual time=0.617..0.617 rows=0 loops=1) Index Cond: (((p2)::text = 'uk.altavista.com'::text) AND ((p3)::text = 'web/results?itag=&q=&kgs=&kls='::text)) Filter: ((p1)::text = 'a'::text) Total runtime: 0.665 ms (4 rows) === At this point I did "DROP INDEX p2_p3_idx" Query after dropping index: db# explain analyze select * from t WHERE p1 = 'a' and p2 = 'uk.altavista.com' AND p3 = 'web/results?itag=&q=&kgs=&kls='; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using t_pkey on t (cost=0.00..6.02 rows=1 width=102) (actual time=95.188..95.188 rows=0 loops=1) Index Cond: (((p1)::text = 'a'::text) AND ((p2)::text = 'uk.altavista.com'::text) AND ((p3)::text = 'web/results?itag=&q=&kgs=&kls='::text)) Total runtime: 95.239 ms (3 rows) Query after dropping index, fully cached: db# explain analyze select * from t WHERE p1 = 'a' and p2 = 'uk.altavista.com' AND p3 = 'web/results?itag=&q=&kgs=&kls='; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using t_pkey on t (cost=0.00..6.02 rows=1 width=102) (actual time=0.030..0.030 rows=0 loops=1) Index Cond: (((p1)::text = 'a'::text) AND ((p2)::text = 'uk.altavista.com'::text) AND ((p3)::text = 'web/results?itag=&q=&kgs=&kls='::text)) Total runtime: 0.077 ms (3 rows) And one where the query planner chooses the primary key instead. Both p2 and p3 are present as Most Common Values in pg_statistics: Query before fully cached: db# explain analyze SELECT * FROM t WHERE p1 = 'b' AND p2 = 'www.google.com' AND p3 = 'search?hl=&lr=&q='; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------ Index Scan using t_pkey on t (cost=0.00..6.02 rows=1 width=102) (actual time=212.092..212.100 rows=1 loops=1) Index Cond: (((p1)::text = 'b'::text) AND ((p2)::text = 'www.google.com'::text) AND ((p3)::text = 'search?hl=&lr=&q='::text)) Total runtime: 212.159 ms (3 rows) Query after fully cached: db# explain analyze SELECT * FROM t WHERE p1 = 'b' AND p2 = 'www.google.com' AND p3 = 'search?hl=&lr=&q='; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------ Index Scan using t_pkey on t (cost=0.00..6.02 rows=1 width=102) (actual time=0.034..0.039 rows=1 loops=1) Index Cond: (((p1)::text = 'b'::text) AND ((p2)::text = 'www.google.com'::text) AND ((p3)::text = 'search?hl=&lr=&q='::text)) Total runtime: 0.094 ms (3 rows) I have set statistics to 1000 on all of p1, p2 and p3. The table was recently vacuumed and analyzed, and the index was recreated (after being dropped) before these tests were run. The tests are 100% reproducible, both in postgresql 7.4.7 and 8.1.3. The indexes are: t_pkey (p1, p2, p3) -- UNIQUE, PRIMARY KEY p2_p3_idx (p2, p3) -- NOT UNIQUE The problem is that a lookup which specifies p2 and p3 can return as many as 500 rows. The optimizer assumes that such a lookup will return 1 row, and so it chooses a bad plan. That sums it up. What I need is a way to make it choose the primary key. Thanks in advance, Brian
pgsql-performance by date: