Re: 7.3 no longer using indexes for LIKE queries - Mailing list pgsql-general
From | Matthew Gabeler-Lee |
---|---|
Subject | Re: 7.3 no longer using indexes for LIKE queries |
Date | |
Msg-id | ABABFB80F35AD311848B0090279918EF010B9B67@ZYCOSNT2.hq.zycos.com Whole thread Raw |
In response to | 7.3 no longer using indexes for LIKE queries (Matthew Gabeler-Lee <mgabelerlee@zycos.com>) |
Responses |
Re: 7.3 no longer using indexes for LIKE queries
Re: 7.3 no longer using indexes for LIKE queries |
List | pgsql-general |
Can anyone explain the rationale behind that sort order? I'm guessing it has something to do with getting sequences of words sorted 'right', but I fail to see how that is right. I'd think when sorting it would make sense to have all sequences of words that start with the same word(s) together. $ echo -e 'bobbill\nbobrob\nbob bill\nbob robber' | LC_ALL=en_US sort bobbill bob bill -\ bobrob |-- Seems these ought to be adjacent? bob robber -/ Perhaps I'm just old fashioned. In any case, I don't define locale collation orders, so *shrug*. The pseudocode I gave for finding insane ones could also be extended to find information for the modified LIKE optimization you suggest. Find chars (like space) that are ignored in the collation order, as it does, and report them. When picking the lower bound, drop all such chars from the beginning of the expression, then find the upper bound from the thus shortened lower bound. This would work for stuff that sorted like en_US does with spaces, but something that sorted 'a b' *before* 'ab' gets into the realms of the impossible to deal with (there was a message in one of the lists talking about a locale that apparently suffered from this problem, it at least sorted 'abc\NNN' before 'abc' for some value of NNN which I forget). In locales whose second-pass sort only moves things forwards, the least the planner could know these 'danger' characters and only optimize if none of them are present in the query string. foo LIKE 'ab%' can be safely optimized in en_US since it only has chars that sort simply. Finding which chars sort simply should be straight forward and only require testing 2^8 or maybe 2^16 values. -Matt -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] [tgl@g3]$ echo -e "a\na a\naa\na a\nab\na b" | LC_ALL=en_US sort a aa a a a a ab a b There's no way to use an index ordered like this to look for strings beginning "a ", because the sorting of spaces depends on what comes after them. Making any real dent in the problem will probably require in-depth analysis of common locale (mis)behaviors. For example, if space sorting is the only thing that's funny about en_US, it might make sense for us to support a modified form of the LIKE optimization that doesn't consider space as a "safe" prefix character (ie, we could index only for "a" not "a ", even if the pattern is LIKE 'a %'). I have no idea exactly what sort of compromises would be effective though. Any localedef experts out there? regards, tom lane
pgsql-general by date: