Home > mailing lists

Re: NOT LIKE much faster than LIKE? - Mailing list pgsql-performance

From	Andrea Arcangeli
Subject	Re: NOT LIKE much faster than LIKE?
Date	January 9, 2006 22:23:07
Msg-id	20060110022303.GA20168@opteron.random Whole thread
In response to	Re: NOT LIKE much faster than LIKE? (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: NOT LIKE much faster than LIKE? Re: NOT LIKE much faster than LIKE? Re: NOT LIKE much faster than LIKE?
List	pgsql-performance

Tree view

On Mon, Jan 09, 2006 at 09:04:48PM -0500, Tom Lane wrote:
> Andrea Arcangeli <andrea@cpushare.com> writes:
> > It just makes no sense to me that the planner takes a difference
> > decision based on a "not".
>
> Why in the world would you think that?  In general a NOT will change the
> selectivity of the WHERE condition tremendously.  If the planner weren't
> sensitive to that, *that* would be a bug.  The only case where it's
> irrelevant is if the selectivity of the base condition is exactly 50%,
> which is not a very reasonable default guess for LIKE.

How do you know that "LIKE" will have a selectivity above 50% in the
first place? I think 50% should be the default unless the selectively is
measured at runtime against the data being queried.

If you don't know the data, I think it's a bug that LIKE is assumed to
have a selectivity above 50%. You can't know that, only the author of
the code can know that and that's why I talked about hints. It'd be fine
to give hints like:

    UNLIKELY string LIKE '%% PREEMPT %%'

or:

    LIKELY string NOT LIKE '%% PREEMPT %%'

Then you could assume that very little data will be returned or a lot of
data will be returned.

If you don't get hints NOT LIKE or LIKE should be assumed to have the
same selectivity.

> It sounds to me that the problem is misestimation of the selectivity
> of the LIKE pattern --- the planner is going to think that
> LIKE '%% PREEMPT %%' is fairly selective because of the rather long
> match text, when in reality it's probably not so selective on your
> data.  But we don't keep any statistics that would allow the actual

True, there's a lot of data that matches %% PREEMPT %% (even if less
than the NOT case).

> number of matching rows to be estimated well.  You might want to think
> about changing your data representation so that the pattern-match can be
> replaced by a boolean column, or some such, so that the existing
> statistics code can make a more reasonable estimate how many rows are
> selected.

I see. I can certainly fix it by stopping using LIKE. But IMHO this
remains a bug, since until the statistics about the numberof matching
rows isn't estimated well, you should not make assumptions on LIKE/NOT
LIKE. I think you can change the code in a way that I won't have to
touch anything, and this will lead to fewer surprises in the future IMHO.

Thanks!

pgsql-performance by date:

From: Tom Lane
Date: 09 January 2006, 22:04:49
Subject: Re: NOT LIKE much faster than LIKE?

From: Christopher Kings-Lynne
Date: 09 January 2006, 22:27:50
Subject: Re: NOT LIKE much faster than LIKE?

Re: NOT LIKE much faster than LIKE? - Mailing list pgsql-performance

Previous

Next