Minimum selectivity estimate for LIKE 'prefix%' - Mailing list pgsql-patches

From Tom Lane
Subject Minimum selectivity estimate for LIKE 'prefix%'
Date
Msg-id 23548.1204838214@sss.pgh.pa.us
Whole thread Raw
Responses Re: Minimum selectivity estimate for LIKE 'prefix%'  (Peter Eisentraut <peter_e@gmx.net>)
Re: Minimum selectivity estimate for LIKE 'prefix%'  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-patches
Peter complained awhile back about unrealistically small selectivity
estimates for LIKE with a fixed-prefix pattern:
http://archives.postgresql.org/pgsql-hackers/2007-12/msg00939.php
I had suspected that this was related to the locale-specific problems
we fixed a few months ago, but having finally had a chance to look
at the data off-list, there doesn't seem to be any such component
to the problem.  What it boils down to is that when we generate a
range constraint based on the prefix, such as
    col >= 'prefix' AND col < 'prefiy'
if the prefix is more than a few characters long then the two endpoint
values are indistinguishable as far as comparison to the histogram
is concerned, and so we come out with a selectivity estimate that
is zero to within roundoff error.  This is unreasonably optimistic
and can lead to bad plan choices.

What I propose doing about this is a small variant on Peter's original
suggestion: compute the estimated selectivity for
    col = 'prefix'
and clamp the result of prefix_selectivity to be at least that.
This is plausible on intuitive grounds since the range constraint
must surely include at least these values.  Furthermore, it eliminates
what had been an entirely ad-hoc choice of a lower bound (the code
was clamping to at least 1e-10, which is surely unreasonably
optimistic).

The end result of this, for the case Peter is interested in where there
are no especially common values, is that the minimum selectivity
estimate for LIKE 'prefix%' will be essentially 1/ndistinct where
ndistinct is the estimated number of distinct values in the column,
because that's what eqsel() does in such cases.

I attach a proposed patch against HEAD for this.  It's a bit long
but most of the bulk is refactoring eqsel() to make it easy to use from
prefix_selectivity().  The intellectual content is just in the last
diff hunk.

To help out Peter's client, who's running 8.1.x, we'd have to backpatch
at least that far.  We backpatched the last round of LIKE selectivity
fixes to 8.1, so I don't have too much hesitation about doing the same
here.

Comments?

            regards, tom lane


Attachment

pgsql-patches by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: CopyReadLineText optimization
Next
From: Euler Taveira de Oliveira
Date:
Subject: Re: [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit