Hi all,
There was a bug(??) report about the LIKE optimization in
7.0 beta3 from Akira Imagawa in Japan.
It may be difficult to solve.
Let t_hoge be a table like

    t_hoge { hoge_cd int4 primary key, shimeinn text, tel text, ... }

with

    index hoge_ix2 on t_hoge (shimeinn)
    index hoge_ix3 on t_hoge (tel)
There are 348236 rows in t_hoge.
For the query
    select hoge_cd, shimeinn, tel
      from t_hoge
     where shimeinn like 'imag%' and tel like '012%'
     order by hoge_cd
     limit 100;
64 rows returned immediately.
And for the query
    select hoge_cd, shimeinn, tel
      from t_hoge
     where shimeinn like 'imag%' and tel like '012-3%'
     order by hoge_cd
     limit 100;
24 rows returned after waiting 8 minutes.
I got the following output from him.
    explain select * from t_hoge where tel like '012%';
    Index Scan using t_hoge_ix3 on t_hoge
      (cost=0.00..0.23 rows=1981 width=676)

    explain select * from t_hoge where tel like '012-3%';
    Index Scan using t_hoge_ix3 on t_hoge
      (cost=0.00..0.00 rows=1981 width=676)
In fact, count(*) is 342323 and 114741 respectively.
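
For reference, those counts presumably come from the obvious queries
(assuming the table and column names given above):

    select count(*) from t_hoge where tel like '012%';    -- 342323
    select count(*) from t_hoge where tel like '012-3%';  -- 114741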
The first problem is that the estimated cost is too low.
It seems that the index selectivity of '012-3%' is taken to be
the index selectivity of '012%' divided by 256*256, right?
If so, does it really give a more practical estimate than before?
It doesn't correspond to the rows estimate either.
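
To make the gap concrete, here is the arithmetic under my guess
above (the 256-per-extra-character division is only my assumption
about what the planner does; the starting selectivity is the true
selectivity of '012%'):

    -- guessed estimate for '012-3%':
    --   selectivity('012%') / (256*256) applied to 348236 rows
    select 348236 * (342323.0 / 348236) / (256 * 256);   -- about 5 rows
    -- the real count for '012-3%' is 114741

So under that rule the planner would expect a handful of rows where
more than a hundred thousand actually match.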
In reality, *shimeinn like 'imag%'* is much more restrictive
than *tel like '012-3%'*, but I could not think of a way to
foresee which one is more restrictive. I now doubt whether we
have enough information to estimate LIKE selectivity correctly.
That is the second problem.
Comments ?
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp