On Tue, 2006-01-17 at 13:53 +0100, Magnus Hagander wrote:
> On this table, I do a query like:
> SELECT * FROM path WHERE path LIKE 'f:/userdirs/s/super_73/%'
>
> The estimate for this query is comlpetely off, which I beleive is the
> cause for a very bad selection of a query plan when it's used in a big
> join (creating nestloops that ends up taking 15+ minutes to complete..).
>
>
> Explain analyze gives:
> QUERY PLAN
> ------------------------------------------------------------------------
> -----------------------------------------------
> Index Scan using path_name_idx on path (cost=0.00..3.24 rows=1
> width=74) (actual time=0.035..0.442 rows=214 loops=1)
> Index Cond: ((path >= 'f:/userdirs/s/super'::text) AND (path <
> 'f:/userdirs/s/supes'::text))
> Filter: (path ~~ 'f:/userdirs/s/super_73%'::text)
>
>
> No matter what I search on (when it's very selective), the estimate is
> always 1 row, whereas the actual value is at least a couple of hundred.
> If I try with say "f:/us", the difference is 377,759 estimated vs
> 562,459 returned, which is percentage-wise a lot less, but...
>
> I have tried upping the statistics target up to 1000, with no changes.
> Any way to teach the planner about this?
In a recent thread on -perform, I opined that this case could best be
solved by using dynamic random block sampling at plan time followed by a
direct evaluation of the LIKE against the sample. This would yield a
more precise selectivity and lead to the better plan. So it can be
improved for the next release.
Best Regards, Simon Riggs