Re: [v9.2] make_greater_string() does not return a string in some cases - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: [v9.2] make_greater_string() does not return a string in some cases
Date	October 30, 2011 14:59:04
Msg-id	21986.1319986733@sss.pgh.pa.us
In response to	Re: [v9.2] make_greater_string() does not return a string in some cases (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: [v9.2] make_greater_string() does not return a string in some cases (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers

Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Oct 29, 2011 at 4:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Oh!  You are right, I was expecting it to try multiple characters at the
>> same position before truncating the string.  This change seems to have
>> lobotomized things rather thoroughly.  What is the rationale for that?
>> As an example, when dealing with a single-character string, it will fail
>> altogether if the next code value sorts out-of-order, so this seems to
>> me to be a rather large step backwards.

> On this point I believe you are still confused.  The old code tried
> one character per position, and the new code tries one character per
> position.  Nothing has been lobotomized in any way.

No, on this point you are flat out wrong.  Try something like
select ... where f1 like 'p%';

in tt_RU locale, wherein 'q' sorts between 'k' and 'l'.  The old code
correctly found that 'r' works as a string greater than 'p'.  The new
code fails to find a greater string, because it only tries 'q' and then
gives up.  This results in a selectivity estimate much poorer than
necessary.
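
To make the difference concrete, here is a toy C sketch. It is not PostgreSQL's code. It assumes a hypothetical single-byte collation (toy_order, toy_rank) in which 'q' sorts between 'k' and 'l', and it compares a search that keeps trying successive code values at one position (greater_char_multi) with one that tries only the next code value (greater_char_single). All of these names are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy collation (illustration only): lowercase ASCII letters
 * in alphabetical order, except that 'q' sorts between 'k' and 'l'. */
static const char toy_order[] = "abcdefghijkqlmnoprstuvwxyz";

/* Position of a lowercase letter in the toy sort order. */
static int toy_rank(char c)
{
    assert(c >= 'a' && c <= 'z');
    return (int) (strchr(toy_order, c) - toy_order);
}

/* Keeps trying successive code values at this position until one sorts
 * after c; returns '\0' if none does. */
static char greater_char_multi(char c)
{
    for (char next = (char) (c + 1); next <= 'z'; next++)
        if (toy_rank(next) > toy_rank(c))
            return next;
    return '\0';
}

/* Tries only the next code value; returns '\0' if it does not sort after c. */
static char greater_char_single(char c)
{
    char next = (char) (c + 1);

    if (next <= 'z' && toy_rank(next) > toy_rank(c))
        return next;
    return '\0';
}
```

Under these assumptions, greater_char_multi('p') skips 'q' (which sorts before 'p') and returns 'r'. greater_char_single('p') returns '\0', meaning no greater single-character string is found.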

Since the stated purpose of this patch is to fix some corner cases
where the code fails to find a greater string, I fail to see why it's
acceptable to introduce some other corner cases that weren't there
before.

> The difference is
> that the old code used a "guess and check" approach to generate the
> character, so there was an inner loop that was trying to generate a
> character (possibly generating various garbage strings that did not
> represent a character along the way) and then, upon success, checked
> the sort order of that single string before truncating and retrying.

You are misreading the old code.  The inner loop increments the last
byte, checks for success, and if it hasn't produced a greater string
then it loops around to increment again.
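
A minimal sketch of that loop structure follows. It assumes a single-byte encoding and uses a hypothetical strcmp-style comparator type (collate_fn) in place of the locale-aware comparison. find_greater is an illustrative name, not the real function. A multibyte-aware version would also have to skip byte sequences that are not valid characters, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical comparator: <0, 0, >0 like strcmp, standing in for a
 * locale-aware comparison. */
typedef int (*collate_fn)(const char *a, const char *b);

/* buf starts as a copy of orig (orig must not alias buf). On success,
 * returns 1 and leaves a string that sorts after orig in buf. Returns 0
 * once every position has been truncated away. */
static int find_greater(char *buf, const char *orig, collate_fn cmp)
{
    size_t len;

    assert(buf != NULL && orig != NULL && cmp != NULL);
    len = strlen(buf);
    while (len > 0)
    {
        unsigned char *last = (unsigned char *) &buf[len - 1];

        /* Inner loop: bump the last byte, check, and bump again if the
         * result still does not sort after the original. */
        while (*last < 255)
        {
            (*last)++;
            if (cmp(buf, orig) > 0)
                return 1;
        }
        /* No byte value worked here: truncate and retry one position back. */
        buf[--len] = '\0';
    }
    return 0;
}
```

With plain strcmp as the comparator, starting from "ab" yields "ac". Starting from "a\xff", the last byte cannot be incremented, so the sketch truncates and returns "b".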

> The fact that we haven't gotten any complaints before suggests that this
> actually works decently well as it stands.

Well, that's true of the old algorithm ;-)

I had likewise thought of the idea of trying some fixed number of
character values at each position, but it's unclear to me why that's
better than allowing an encoding-specific behavior.  I don't believe
that we could get away with trying fewer than a few dozen values, though.
For example, in a situation where case sensitivity is relevant, you
might need to increment past all the upper-case letters to get to a
suitable lower-case letter.  I also think that it's probably useful to
try incrementing higher-order bytes of a multibyte character before
giving up --- we just can't afford to do an exhaustive search.
Thus my proposal to let the low-order bytes max out but not cycle.
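
The following C sketch shows one way to read that proposal. The byte layout, the per-byte maximum table (maxbyte), the acceptance callback type (accept_fn) and both function names are hypothetical. The callback stands in for "forms a valid character that sorts after the original". Starting from the lowest-order byte, each byte is incremented up to its maximum. A byte that maxes out is left there rather than wrapped, and the search moves on to the next higher-order byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: nonzero if the candidate bytes form an acceptable
 * character (valid, and sorting after the original). */
typedef int (*accept_fn)(const unsigned char *chr, size_t nbytes, void *arg);

/* Increment bytes from lowest-order to highest-order, each only up to its
 * own maximum. A byte that reaches its maximum stays there (it does not
 * wrap back to its minimum) and the next higher-order byte is tried.
 * Returns how many candidates were tested when one is accepted, or -1 if
 * the search is exhausted. */
static int bounded_char_search(unsigned char *chr, size_t nbytes,
                               const unsigned char *maxbyte,
                               accept_fn accept, void *arg)
{
    int tries = 0;

    assert(chr != NULL && maxbyte != NULL && accept != NULL);
    for (size_t i = nbytes; i-- > 0; )
    {
        while (chr[i] < maxbyte[i])
        {
            chr[i]++;
            tries++;
            if (accept(chr, nbytes, arg))
                return tries;
        }
    }
    return -1;
}

/* Example acceptor for illustration: accepts once the highest-order byte
 * reaches the threshold passed via arg. */
static int high_byte_reaches(const unsigned char *chr, size_t nbytes, void *arg)
{
    (void) nbytes;
    return chr[0] >= *(const unsigned char *) arg;
}
```

Because no byte ever cycles, the worst case is the sum of the per-byte ranges rather than their product. For a two-byte character starting at {0x10, 0x10} with maxima {0x20, 0x20}, at most 32 candidates are tried, whereas an exhaustive search would visit every combination of the two bytes.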
        regards, tom lane
