"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> For the query
> select hoge_cd,shimeinn,tel
> from t_hoge
> where shimeinn like 'imag%'
> and tel like '012%'
> order by hoge_cd
> limit 100;
> 64 rows returned immediately.
> And for the query
> select hoge_cd,shimeinn,tel
> from t_hoge
> where shimeinn like 'imag%'
> and tel like '012-3%'
> order by hoge_cd
> limit 100;
> 24 rows returned after waiting 8 minutes.
So what were the plans for these two queries? Also, has this table been
"vacuum analyzed"?
> I got the following output from him.
> explain select * from t_hoge where tel like '012%';
> Index Scan using t_hoge_ix3 on t_hoge (cost=0.00..0.23 rows=1981 width=676)
> explain select * from t_hoge where tel like '012-3%';
> Index Scan using t_hoge_ix3 on t_hoge (cost=0.00..0.00 rows=1981 width=676)
> In fact, count(*) is 342323 and 114741 respectively.
> The first problem is that the estimated cost is too low.
> It seems that the index selectivity of '012-3%' = the index
> selectivity of '012%' / (256*256), right?
> If so, does it give a more practical estimate than before?
> It doesn't correspond to the rows estimate either.
The rows number is fairly bogus (because it's coming from application of
eqsel, which is not the right thing; perhaps someday LIKE should have
its very own selectivity estimation function). But the cost estimate
is driven by the estimated selectivity of
    tel >= '012-3' AND tel < '012-4'
and it would be nice to think that we have some handle on that.
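(For anyone following along: that range condition is just the fixed prefix of
the pattern used as a lower bound, with its last byte incremented for the
upper bound. A toy sketch of the idea, not the backend's actual code, which
also has to worry about byte overflow and locale ordering:)

```python
def prefix_to_range(prefix):
    """Convert a fixed LIKE prefix into an equivalent index range.

    'abc%' matches exactly the strings s with 'abc' <= s < 'abd',
    so the upper bound is formed by incrementing the last byte of
    the prefix (the 0xFF-overflow case is ignored for brevity).
    """
    lower = prefix
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return lower, upper

print(prefix_to_range("012-3"))  # -> ('012-3', '012-4')
print(prefix_to_range("imag"))   # -> ('imag', 'imah')
```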
It could be that the thing is getting fooled by a very non-uniform
distribution of telephone numbers. You indicate that most of the
numbers in the DB begin with '012', but if there are a small number
that begin with digits as high as 9, the selectivity estimates would
be pretty bad.
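(To illustrate the failure mode: a linear-interpolation estimate of the sort
the planner makes, here sketched by treating the leading bytes of a string as
base-256 digits and assuming the data is uniform between min and max --- my
own toy model, not the real scalarltsel code:)

```python
def string_to_frac(s, lo, hi, nbytes=5):
    """Map string s to its fractional position in [lo, hi], treating
    the first nbytes characters as base-256 digits."""
    def val(x):
        x = (x + "\0" * nbytes)[:nbytes]
        return sum(ord(c) * 256 ** (nbytes - 1 - i) for i, c in enumerate(x))
    denom = val(hi) - val(lo)
    return (val(s) - val(lo)) / denom if denom else 0.0

# Suppose the column min/max are '000-0000' and '999-9999' because of a
# few outlier rows, even though most numbers really start with '012'.
lo, hi = "000-0000", "999-9999"
sel = (string_to_frac("012-4", lo, hi)
       - string_to_frac("012-3", lo, hi))
print(sel)  # a vanishingly small fraction, although the true fraction is large
```

Under the uniformity assumption the estimated selectivity of the range is
essentially zero, which is exactly the kind of bogus low cost seen above.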
> In reality, * shimeinn like 'imag%' * is much more restrictive
> than * tel like '012-3%' *. However I couldn't think of a
> way to foresee which is more restrictive. Now I doubt whether
> we have enough information to estimate LIKE selectivity
> correctly.
No, we don't, because we only keep track of the min and max values
in each column and assume that the data is uniformly distributed
between those limits. Perhaps someday we could keep a histogram
instead --- but VACUUM ANALYZE would get a lot slower and more
complicated ...
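(For what it's worth, a histogram-based estimate might look like this toy
equi-depth sketch --- function names and all details are hypothetical, not
anything the backend does today: keep boundary values from the sorted column
and count how many bins a range covers, each bin holding ~1/nbins of the rows:)

```python
import bisect

def make_histogram(values, nbins=10):
    """Equi-depth histogram: nbins+1 boundary values from the sorted data."""
    s = sorted(values)
    return [s[i * (len(s) - 1) // nbins] for i in range(nbins + 1)]

def range_selectivity(hist, lo, hi):
    """Estimate the fraction of rows with lo <= v < hi by counting
    how many histogram boundaries fall inside the range."""
    nbins = len(hist) - 1
    a = bisect.bisect_left(hist, lo)
    b = bisect.bisect_left(hist, hi)
    return (b - a) / nbins

# Skewed data: 90% of values start with '012', a few outliers up to '999'.
data = (["012-%04d" % (i * 10) for i in range(900)]
        + ["9%02d-0000" % i for i in range(100)])
hist = make_histogram(data, nbins=10)
print(range_selectivity(hist, "012-3", "012-4"))  # close to the true 0.1
```

Unlike the min/max interpolation, the histogram tracks where the data
actually sits, so the skew toward '012' doesn't wreck the estimate.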
regards, tom lane