Re: [v9.2] make_greater_string() does not return a string in some cases - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: [v9.2] make_greater_string() does not return a string in some cases |
Date | |
Msg-id | 21986.1319986733@sss.pgh.pa.us Whole thread Raw |
In response to | Re: [v9.2] make_greater_string() does not return a string in some cases (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [v9.2] make_greater_string() does not return a string
in some cases
(Robert Haas <robertmhaas@gmail.com>)
|
List | pgsql-hackers |
Robert Haas <robertmhaas@gmail.com> writes: > On Sat, Oct 29, 2011 at 4:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Oh! You are right, I was expecting it to try multiple characters at the >> same position before truncating the string. This change seems to have >> lobotomized things rather thoroughly. What is the rationale for that? >> As an example, when dealing with a single-character string, it will fail >> altogether if the next code value sorts out-of-order, so this seems to >> me to be a rather large step backwards. > On this point I believe you are still confused. The old code tried > one character per position, and the new code tries one character per > position. Nothing has been lobotomized in any way. No, on this point you are flat out wrong. Try something like select ... where f1 like 'p%'; in tt_RU locale, wherein 'q' sorts between 'k' and 'l'. The old code correctly found that 'r' works as a string greater than 'p'. The new code fails to find a greater string, because it only tries 'q' and then gives up. This results in a selectivity estimate much poorer than necessary. Since the stated purpose of this patch is to fix some corner cases where the code fails to find a greater string, I fail to see why it's acceptable to introduce some other corner cases that weren't there before. > The difference is > that the old code used a "guess and check" approach to generate the > character, so there was an inner loop that was trying to generate a > character (possibly generating various garbage strings that did not > represent a character along the way) and then, upon success, checked > the sort order of that single string before truncating and retrying. You are misreading the old code. The inner loop increments the last byte, checks for success, and if it hasn't produced a greater string then it loops around to increment again. > The fact that we haven't gotten any complaints before suggests that this > actually works decently well as it stands. Well, that's true of the old algorithm ;-) I had likewise thought of the idea of trying some fixed number of character values at each position, but it's unclear to me why that's better than allowing an encoding-specific behavior. I don't believe that we could get away with trying less than a few dozen values, though. For example, in a situation where case sensitivity is relevant, you might need to increment past all the upper-case letters to get to a suitable lower-case letter. I also think that it's probably useful to try incrementing higher-order bytes of a multibyte character before giving up --- we just can't afford to do an exhaustive search. Thus my proposal to let the low-order bytes max out but not cycle. regards, tom lane
pgsql-hackers by date: