RE: 7.0 like selectivity - Mailing list pgsql-hackers
From | Hiroshi Inoue |
---|---|
Subject | RE: 7.0 like selectivity |
Date | |
Msg-id | 000501bfa021$3d611240$2801007e@tpf.co.jp Whole thread Raw |
In response to | Re: 7.0 like selectivity (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
RE: 7.0 like selectivity
|
List | pgsql-hackers |
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > > For the query > > select hoge_cd,shimeinn,tel > > from t_hoge > > where shimeinn like 'imag%' > > and tel like '012%' > > order by hoge_cd > > limit 100; > > > 64 rows returned immediately. > > > And for the query > > select hoge_cd,shimeinn,tel > > from t_hoge > > where shimeinn like 'imag%' > > and tel like '012-3%' > > order by hoge_cd > > limit 100; > > > 24 rows returned after waiting 8 minutes. > > So what were the plans for these two queries? OK,I would ask him to send them. > Also, has this table been > "vacuum analyzed"? > Yes,his another problem was solved by "vacuum analyze". > > I got the following output from him. > > explain select * from t_hoge where tel like '012%'; > > Index Scan using t_hoge_ix3 on t_hoge (cost=0.00..0.23 rows=1981 > > width=676) > > > explain select * from t_hoge where tel like '012-3%'; > > Index Scan using t_hoge_ix3 on t_hoge (cost=0.00..0.00 rows=1981 > > width=676) > > > In fact,count(*) is 342323 and 114741 respectively. > > > The first problem is that estimated cost is too low. > > It seems that the index selectivity of '012-3%' = the index > > selectivity of '012%' / (256*256),right ? > > If so,does it give more practical estimation than before ? > > It doesn't correspond to rows information either. > > The rows number is fairly bogus (because it's coming from application of > eqsel, which is not the right thing; perhaps someday LIKE should have > its very own selectivity estimation function). But the cost estimate > is driven by the estimated selectivity of > tel >= '012-3' AND tel < '012-4' > and it would be nice to think that we have some handle on that. > Shouldn't rows number and cost estimate correspond in this case ? For example,the following query would return same row numbers.select * from t_hoge where tel = '012'; And the cost estimate is probably > 1000. Is it good that the cost estimate for "tel like '012%'" is much smaller than " tel = '012' " ? PostgreSQL's selectivity doesn't mean a pure probabilty. For example,for int4 type the pure '=' probabity is pow(2,-32). Is current cost estimate for " tel>=val1 and tel <val2'" is effective for narrow range of (val1,val2) ? The range ('012-3','012-4') is veeeery narrow in the vast char(5) space. Regards. Hiroshi Inoue Inoue@tpf.co.jp
pgsql-hackers by date: