Thread: ILIKE vs indices
While tuning an application, I found the posts from 2003 recomending the use of LOWER() and LIKE in place of ILIKE to take advantage of indices. For this app, given the limitations of the upper-layer protocol it must support, that change replaced about 30 minutes of repeated seq scans with about 1 minute of repeated index scans! On a query-set often repeated several times per day. (Probably more times per day now.) Is there any contraindication to recasting: foo ILIKE 'bar' into: LOWER(foo) LIKE LOWER('bar') and documenting that an index has to be on LOWER(column) to benefit ILIKE? Perhaps the parser could read the former as the latter? -JimC -- James Cloos <cloos@jhcloos.com> OpenPGP: 1024D/ED7DAEA6
James Cloos <cloos@jhcloos.com> writes: > Is there any contraindication to recasting: > foo ILIKE 'bar' > into: > LOWER(foo) LIKE LOWER('bar') In some locales those are not equivalent, I believe, or at least shouldn't be. (What the current code actually does is a separate question.) > Perhaps the parser could read the former as the latter? Not unless the equivalence can be shown to be exact, which I doubt. In any case it's not obvious why LOWER rather than UPPER. regards, tom lane
>>>>> "TL" == Tom Lane <tgl@sss.pgh.pa.us> writes: JC>> Is there any contraindication to recasting: JC>> foo ILIKE 'bar' JC>> into: JC>> LOWER(foo) LIKE LOWER('bar') TL> In some locales those are not equivalent, I believe, or at least TL> shouldn't be. (What the current code actually does is a separate TL> question.) I see. After determining indexing based on th existance of an initial fixed string, exluding anything matching isalpha(), it uses tolower(3) and friends to do the actual match. So my proposal wouldn't change what matches, but might make fixing any bugs in what *should* match more difficult? TL> In any case it's not obvious why LOWER rather than UPPER. I suggested lower() because that matched all of the suggestions I found. And as it turns out matches the current behaviour, too. The footnote about adding explicit mention to the docs was expressly because it is not otherwise obvious whether indices should use lower() or upper(). I'll ask on one of the unicode lists whether there are any locales where a case-insensitive match should be different than a case-preserving match of tolower() vs tolower(). -JimC -- James Cloos <cloos@jhcloos.com> OpenPGP: 1024D/ED7DAEA6
On Fri, Dec 28, 2012 at 11:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > James Cloos <cloos@jhcloos.com> writes: >> Is there any contraindication to recasting: >> foo ILIKE 'bar' >> into: >> LOWER(foo) LIKE LOWER('bar') > > In some locales those are not equivalent, I believe, or at least > shouldn't be. (What the current code actually does is a separate > question.) What it actually does is actually *precisely* the above. I can't quite wrap my head around the idea of "LIKE" and collations having any meaningful interaction anyways. I certainly can't come up with anything better than "lower() like lower()" (or "upper() like upper()"). It would be nice to document what ILIKE actually means. Right now it's kind of mysterious. And if we can't come up with anything better than "lower() like lower()" then why not go ahead and document it and take advantage of it. -- greg
On Sat, Dec 29, 2012 at 2:05 PM, Greg Stark <stark@mit.edu> wrote: > I can't quite wrap my head around the idea of "LIKE" and collations > having any meaningful interaction anyways. I certainly can't come up > with anything better than "lower() like lower()" (or "upper() like > upper()"). Hm. Maybe I spoke too fast. Perhaps we should just call strcasecmp() character by character, or even call strcasecmp() on any substring of the pattern that doesn't contain _ or % ? The latter would be pretty hopeless to ever use a btree index though. -- greg