Thread: Indexing of LIKE queries is broken in current sources
I noticed today that current sources are not able to use an index for a query likeselect * from foo where bar like 'xyz%'; It seems that the parser now emits some kind of function call for LIKE expressions, whereas the optimizer's code to use indexes for LIKE is looking for an operator. I have more pressing things to do than try to teach the optimizer about looking for function calls as well as operators, so I recommend reverting the parser's output to what it was. regards, tom lane
> It seems that the parser now emits some kind of function call for LIKE > expressions, whereas the optimizer's code to use indexes for LIKE is > looking for an operator. > I have more pressing things to do than try to teach the optimizer about > looking for function calls as well as operators, so I recommend > reverting the parser's output to what it was. Oh, that's bad. I changed it to the function call to allow three-parameter LIKE clauses, which is a perfectly reasonable thing to do imho. This LIKE shortcircuit stuff is a hack anyway, but where should I look in the optimizer? - Thomas
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: >> It seems that the parser now emits some kind of function call for LIKE >> expressions, whereas the optimizer's code to use indexes for LIKE is >> looking for an operator. > Oh, that's bad. I changed it to the function call to allow > three-parameter LIKE clauses, which is a perfectly reasonable thing to > do imho. Well, it's a clean solution in and of itself, but it fails to take into account what the rest of the system is able or not able to do. > This LIKE shortcircuit stuff is a hack anyway, but where should I look > in the optimizer? The tip of this iceberg is the "special index operator" hacks in backend/optimizer/path/indxpath.c. However, the true dimensions of the problem don't become apparent until you realize that *none* of the optimizer does anything much with function calls as opposed to operators. To get back up to the level of functionality we had for LIKE before, you'd also need to devise a way of doing selectivity estimation for function calls --- ie, define an API for function estimators, add appropriate column(s) to pg_proc, then start actually writing some code. This would be a good thing to do someday, but I think we have considerably more pressing problems to deal with for 7.1. BTW, none of the code that actually bashes LIKE/regexp patterns in indxpath.c/selfuncs.c knows anything about nonstandard ESCAPE characters for patterns. It was this (and the code's lack of knowledge about ILIKE) that I'd originally had as my to-do item --- it wasn't till I realized you'd switched from operators to functions that I saw this was more than a trivial problem to fix. After a little bit of thought I think I see a way out that avoids opening the function-selectivity can of worms. Let's translate LIKE to an operator same as we always did (and add an operator for ILIKE). The forms with an ESCAPE clause can be translated to the same operators but with a righthand argument that is a call of a new function escape_for_like(pattern, escape) escape_for_like would interpret the pattern with the given escape character and translate it into a pattern that uses the standard escape character, ie backslash. After constant folding, this looks just like the old style of LIKE call and none of the optimizer code has to change at all, except to add a case for ILIKE which will be a trivial addition. LIKE itself gets simpler too, since the pattern matcher needn't cope with nonstandard escape characters. Seems like a good subdivision of labor within LIKE. Thoughts? I'm willing to take responsibility for making this happen if you agree it's a good solution. regards, tom lane
> After a little bit of thought I think I see a way out that avoids > opening the function-selectivity can of worms. Let's translate LIKE > to an operator same as we always did (and add an operator for ILIKE). > The forms with an ESCAPE clause can be translated to the same operators > but with a righthand argument that is a call of a new function > escape_for_like(pattern, escape) > escape_for_like would interpret the pattern with the given escape > character and translate it into a pattern that uses the standard escape > character, ie backslash. After constant folding, this looks just like > the old style of LIKE call and none of the optimizer code has to change > at all, except to add a case for ILIKE which will be a trivial addition. > LIKE itself gets simpler too, since the pattern matcher needn't cope > with nonstandard escape characters. Seems like a good subdivision of > labor within LIKE. > Thoughts? I'm willing to take responsibility for making this happen > if you agree it's a good solution. Hmm. It's a great solution, though I'm disappointed that the seemingly more direct function-call strategy turned out to be a dead end (at least for now). It seems that "~~*" should be the operator for ILIKE. - Thomas btw, I notice that psql has trouble with \dd when you try to show any operator which contains an asterisk. \dd * results in an error. \dd ~* shows every function, type, and operator. I've tried surrounding it with single- and double-quotes, but that didn't seem to work around it.