Re: Planner estimates cost of 'like' a lot lower than '='?? - Mailing list pgsql-general

From Tom Lane
Subject Re: Planner estimates cost of 'like' a lot lower than '='??
Date
Msg-id 8764.995840089@sss.pgh.pa.us
Whole thread Raw
In response to Planner estimates cost of 'like' a lot lower than '='??  (Mats Lofkvist <mal@algonet.se>)
Responses Re: Planner estimates cost of 'like' a lot lower than '='??  (Mats Lofkvist <mal@algonet.se>)
List pgsql-general
Mats Lofkvist <mal@algonet.se> writes:
> EXPLAIN
> testdb=> select count(*) from data where value >= 'test' and value < 'tesu';
>  count
> -------
>  10000
> (1 row)

> testdb=> explain select count(*) from data where value >= 'test' and value < 'tesu';
> NOTICE:  QUERY PLAN:

> Aggregate  (cost=4.46..4.46 rows=1 width=0)
>   ->  Index Scan using datavalueindex on data  (cost=0.00..4.45 rows=1 width=0)

Sure enough, this is the source of the speed problem: when using the
index on "value", each iteration of the find1 scan will be indexscanning
10000 tuples to find the single one that passes the LIKE and other
qualifications.  But the planner mistakenly thinks that the indexscan
will find only one tuple, and so it has no reason to prefer the other
index over this one.

(Unfortunately, the planner is too stupid to realize that the other
index *guarantees* to return no more than one tuple for this query,
and hence should be preferred over a mere statistical estimate of one
selected tuple.  Not sure how we could incorporate such a consideration
into what's fundamentally a cost-estimate-driven process.)

I think that current sources will probably do a lot better on the range
estimation problem.  I'll be interested to see what you get from these
same tests when you have the data loaded into current...

            regards, tom lane

pgsql-general by date:

Previous
From: newsreader@mediaone.net
Date:
Subject: psql on red hat 7.1
Next
From: teg@redhat.com (Trond Eivind Glomsrød)
Date:
Subject: Re: psql on red hat 7.1