Thread: Indexing of LIKE queries is broken in current sources

Indexing of LIKE queries is broken in current sources

From
Tom Lane
Date:
I noticed today that current sources are not able to use an index
for a query likeselect * from foo where bar like 'xyz%';

It seems that the parser now emits some kind of function call for LIKE
expressions, whereas the optimizer's code to use indexes for LIKE is
looking for an operator.

I have more pressing things to do than try to teach the optimizer about
looking for function calls as well as operators, so I recommend
reverting the parser's output to what it was.
        regards, tom lane


Re: Indexing of LIKE queries is broken in current sources

From
Thomas Lockhart
Date:
> It seems that the parser now emits some kind of function call for LIKE
> expressions, whereas the optimizer's code to use indexes for LIKE is
> looking for an operator.
> I have more pressing things to do than try to teach the optimizer about
> looking for function calls as well as operators, so I recommend
> reverting the parser's output to what it was.

Oh, that's bad. I changed it to the function call to allow
three-parameter LIKE clauses, which is a perfectly reasonable thing to
do imho.

This LIKE shortcircuit stuff is a hack anyway, but where should I look
in the optimizer?
                      - Thomas


Re: Indexing of LIKE queries is broken in current sources

From
Tom Lane
Date:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> It seems that the parser now emits some kind of function call for LIKE
>> expressions, whereas the optimizer's code to use indexes for LIKE is
>> looking for an operator.

> Oh, that's bad. I changed it to the function call to allow
> three-parameter LIKE clauses, which is a perfectly reasonable thing to
> do imho.

Well, it's a clean solution in and of itself, but it fails to take into
account what the rest of the system is able or not able to do.

> This LIKE shortcircuit stuff is a hack anyway, but where should I look
> in the optimizer?

The tip of this iceberg is the "special index operator" hacks in
backend/optimizer/path/indxpath.c.  However, the true dimensions of
the problem don't become apparent until you realize that *none* of
the optimizer does anything much with function calls as opposed to
operators.  To get back up to the level of functionality we had for
LIKE before, you'd also need to devise a way of doing selectivity
estimation for function calls --- ie, define an API for function
estimators, add appropriate column(s) to pg_proc, then start actually
writing some code.  This would be a good thing to do someday, but
I think we have considerably more pressing problems to deal with
for 7.1.

BTW, none of the code that actually bashes LIKE/regexp patterns in
indxpath.c/selfuncs.c knows anything about nonstandard ESCAPE characters
for patterns.  It was this (and the code's lack of knowledge about
ILIKE) that I'd originally had as my to-do item --- it wasn't till I
realized you'd switched from operators to functions that I saw this was
more than a trivial problem to fix.


After a little bit of thought I think I see a way out that avoids
opening the function-selectivity can of worms.  Let's translate LIKE
to an operator same as we always did (and add an operator for ILIKE).
The forms with an ESCAPE clause can be translated to the same operators
but with a righthand argument that is a call of a new function    escape_for_like(pattern, escape)
escape_for_like would interpret the pattern with the given escape
character and translate it into a pattern that uses the standard escape
character, ie backslash.  After constant folding, this looks just like
the old style of LIKE call and none of the optimizer code has to change
at all, except to add a case for ILIKE which will be a trivial addition.
LIKE itself gets simpler too, since the pattern matcher needn't cope
with nonstandard escape characters.  Seems like a good subdivision of
labor within LIKE.

Thoughts?  I'm willing to take responsibility for making this happen
if you agree it's a good solution.
        regards, tom lane


Re: Indexing of LIKE queries is broken in current sources

From
Thomas Lockhart
Date:
> After a little bit of thought I think I see a way out that avoids
> opening the function-selectivity can of worms.  Let's translate LIKE
> to an operator same as we always did (and add an operator for ILIKE).
> The forms with an ESCAPE clause can be translated to the same operators
> but with a righthand argument that is a call of a new function
>                 escape_for_like(pattern, escape)
> escape_for_like would interpret the pattern with the given escape
> character and translate it into a pattern that uses the standard escape
> character, ie backslash.  After constant folding, this looks just like
> the old style of LIKE call and none of the optimizer code has to change
> at all, except to add a case for ILIKE which will be a trivial addition.
> LIKE itself gets simpler too, since the pattern matcher needn't cope
> with nonstandard escape characters.  Seems like a good subdivision of
> labor within LIKE.
> Thoughts?  I'm willing to take responsibility for making this happen
> if you agree it's a good solution.

Hmm. It's a great solution, though I'm disappointed that the seemingly
more direct function-call strategy turned out to be a dead end (at least
for now).

It seems that "~~*" should be the operator for ILIKE.
                   - Thomas

btw, I notice that psql has trouble with \dd when you try to show any
operator which contains an asterisk.

\dd *

results in an error.

\dd ~*

shows every function, type, and operator.

I've tried surrounding it with single- and double-quotes, but that
didn't seem to work around it.