Thread: indexing with lower(...) -> queries are not optimised very well - Please Help
indexing with lower(...) -> queries are not optimised very well - Please Help
From: Martin Hampl
Hi,

I am using PostgreSQL 7.4, but I did have the same problem with the last
version.

I indexed the column word (defined as varchar(64)) using lower(word). If I
use the following query, everything is fine: the index is used and the
query is executed very quickly:

    select * from token where lower(word) = 'saxophone';

However, with EXPLAIN you get the following:

                                      QUERY PLAN
------------------------------------------------------------------------------------------
 Index Scan using word_lower_idx on token  (cost=0.00..98814.08 rows=25382 width=16)
   Index Cond: (lower((word)::text) = 'saxophone'::text)

I also indexed the same column without the use of lower(...). Now

    explain select * from token where word = 'saxophone';

results in:

                                  QUERY PLAN
-------------------------------------------------------------------------------
 Index Scan using word_idx on token  (cost=0.00..6579.99 rows=1676 width=16)
   Index Cond: ((word)::text = 'saxophone'::text)

Please note the difference in the estimated cost! Why is there such a huge
difference? Both queries need almost exactly the same time to execute (all
instances of 'saxophone' in the table happen to be lower-case).

The problem is that if I use this query as part of a more complicated
query, the optimiser chooses a *very* bad query plan.

What am I doing wrong? I would appreciate any help on this very much.

Regards,
Martin.
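For reference, the two indexes involved were presumably created along these
lines; the exact definitions are an assumption reconstructed from the index
and table names in the EXPLAIN output above, since the post does not show
them:

    -- assumed definitions, inferred from the names word_lower_idx and word_idx
    create index word_lower_idx on token (lower(word));
    create index word_idx on token (word);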
hi,

Martin Hampl wrote, On 11/18/2003 7:24 PM:
> Hi,
>
> I am using PostgreSQL 7.4, but I did have the same problem with the last
> version.
>
> I indexed the column word (defined as varchar(64)) using lower(word). If I
> use the following query, everything is fine: the index is used and the
> query is executed very quickly:
>
>     select * from token where lower(word) = 'saxophone';
>
> However, with EXPLAIN you get the following:
>
>                                       QUERY PLAN
> ------------------------------------------------------------------------------------------
>  Index Scan using word_lower_idx on token  (cost=0.00..98814.08 rows=25382 width=16)
>    Index Cond: (lower((word)::text) = 'saxophone'::text)
>
> I also indexed the same column without the use of lower(...). Now
>
>     explain select * from token where word = 'saxophone';
>
> results in:
>
>                                   QUERY PLAN
> -------------------------------------------------------------------------------
>  Index Scan using word_idx on token  (cost=0.00..6579.99 rows=1676 width=16)
>    Index Cond: ((word)::text = 'saxophone'::text)
>
> Please note the difference in the estimated cost! Why is there such a huge
> difference? Both queries need almost exactly the same time to execute (all
> instances of 'saxophone' in the table happen to be lower-case).

And after analyze token; ?

C.
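The suggestion above is to refresh the planner's statistics and then look
at the plan again; a minimal sketch of what that amounts to (table name
taken from the thread):

    -- gather fresh per-column statistics, then re-check the estimate
    analyze token;
    explain select * from token where lower(word) = 'saxophone';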
Hi,

> hi,
>
> Martin Hampl wrote, On 11/18/2003 7:24 PM:
>> Hi,
>> I am using PostgreSQL 7.4, but I did have the same problem with the last
>> version.
>> I indexed the column word (defined as varchar(64)) using lower(word). If I
>> use the following query, everything is fine: the index is used and the
>> query is executed very quickly:
>>     select * from token where lower(word) = 'saxophone';
>> However, with EXPLAIN you get the following:
>>                                       QUERY PLAN
>> ------------------------------------------------------------------------------------------
>>  Index Scan using word_lower_idx on token  (cost=0.00..98814.08 rows=25382 width=16)
>>    Index Cond: (lower((word)::text) = 'saxophone'::text)
>> I also indexed the same column without the use of lower(...). Now
>>     explain select * from token where word = 'saxophone';
>> results in:
>>                                   QUERY PLAN
>> -------------------------------------------------------------------------------
>>  Index Scan using word_idx on token  (cost=0.00..6579.99 rows=1676 width=16)
>>    Index Cond: ((word)::text = 'saxophone'::text)
>> Please note the difference in the estimated cost! Why is there such a huge
>> difference? Both queries need almost exactly the same time to execute (all
>> instances of 'saxophone' in the table happen to be lower-case).
>
> And after analyze token; ?

No, that doesn't work (I tried it, of course). But this might be the
problem: how do you analyse properly for the use of an index with
lower(...)?

Thanks for the answer,
Martin.

> C.
Re: indexing with lower(...) -> queries are not optimised very well - Please Help
From: Tom Lane
Martin Hampl <Martin.Hampl@gmx.de> writes:
>  Index Scan using word_lower_idx on token  (cost=0.00..98814.08 rows=25382 width=16)
>    Index Cond: (lower((word)::text) = 'saxophone'::text)

The rows estimate (and therefore also the cost estimate) is a complete
guess in this situation, because the system keeps no statistics about the
values of lower(word). Improving this situation is on the TODO list.

			regards, tom lane
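This is visible in the statistics catalog: the pg_stats view has an entry
for the plain column word but none for the expression lower(word), so the
planner has nothing on which to base a selectivity estimate. A quick way to
see this (table and column names taken from the thread):

    -- only the raw column appears here; there is no row for lower(word)
    select attname, n_distinct, null_frac
    from pg_stats
    where tablename = 'token' and attname = 'word';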
Re: indexing with lower(...) -> queries are not optimised very well - Please Help
From: Martin Hampl
On 21.11.2003 at 06:54, Tom Lane wrote:
> Martin Hampl <Martin.Hampl@gmx.de> writes:
>>  Index Scan using word_lower_idx on token  (cost=0.00..98814.08 rows=25382 width=16)
>>    Index Cond: (lower((word)::text) = 'saxophone'::text)
>
> The rows estimate (and therefore also the cost estimate) is a complete
> guess in this situation, because the system keeps no statistics about the
> values of lower(word). Improving this situation is on the TODO list.

Thanks a lot for your answer. Any idea when this situation will be
improved? Until then I have to find a workaround; any suggestions?

Regards,
Martin.
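No workaround is given later in the thread, but a common one at the time
was to maintain a separate, pre-lowercased column and index that instead:
ANALYZE then collects real statistics for it, and the planner can estimate
the predicate properly. A sketch under that assumption (the column,
function, trigger, and index names below are hypothetical, not from the
thread):

    -- hypothetical shadow-column workaround
    alter table token add column word_lower varchar(64);
    update token set word_lower = lower(word);

    -- keep the shadow column in sync on insert/update
    create or replace function token_set_word_lower() returns trigger as '
    begin
        new.word_lower := lower(new.word);
        return new;
    end;
    ' language plpgsql;

    create trigger token_word_lower_trg
        before insert or update on token
        for each row execute procedure token_set_word_lower();

    -- index the plain column and gather statistics for it
    create index word_lower_col_idx on token (word_lower);
    analyze token;

    -- queries then filter on the column rather than the expression
    select * from token where word_lower = 'saxophone';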
Re: indexing with lower(...) -> queries are not optimised very well - Please Help
From: Martin Hampl
Hi,

On 21.11.2003 at 06:54, Tom Lane wrote:
> Martin Hampl <Martin.Hampl@gmx.de> writes:
>>  Index Scan using word_lower_idx on token  (cost=0.00..98814.08 rows=25382 width=16)
>>    Index Cond: (lower((word)::text) = 'saxophone'::text)
>
> The rows estimate (and therefore also the cost estimate) is a complete
> guess in this situation, because the system keeps no statistics about the
> values of lower(word). Improving this situation is on the TODO list.

Any idea when this will work? Is it difficult to implement?

(For those who don't recall the context: I asked about indexing the
lower-cased values of a varchar column ("create index xy_idx on
table(lower(column));") and how the query planner uses this index.)

Regards,
Martin.
Re: indexing with lower(...) -> queries are not optimised very well - Please Help
From: Tom Lane
Martin Hampl <Martin.Hampl@gmx.de> writes:
> On 21.11.2003 at 06:54, Tom Lane wrote:
>> [ bad plan for use of a functional index ]
>>
>> The rows estimate (and therefore also the cost estimate) is a complete
>> guess in this situation, because the system keeps no statistics about the
>> values of lower(word). Improving this situation is on the TODO list.

> Any idea when this will work? Is it difficult to implement?

It strikes me as a small-but-not-trivial project. Possibly someone will get
it done for 7.5. You can find some discussion in the pghackers archives,
IIRC (look for threads about keeping statistics on functional indexes).

This brings up a thought for Mark Cave-Ayland's project of breaking out the
datatype dependencies in ANALYZE: it would be wise to ensure that the API
for examine_attribute doesn't depend too much on the assumption that the
value(s) being analyzed are part of the relation proper. They might be
coming from a functional index, or even more likely be computed on the fly
based on the definition of a functional index. Not sure what we'd want to
change exactly, but it's something to think about before the API gets set
in stone.

			regards, tom lane