Re: [v9.2] make_greater_string() does not return a string in some cases - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [v9.2] make_greater_string() does not return a string in some cases
Msg-id 21348.1316706403@sss.pgh.pa.us
In response to Re: [v9.2] make_greater_string() does not return a string in some cases  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [v9.2] make_greater_string() does not return a string in some cases
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> One thing I was thinking about is that it would be useful to have some
> metric for judging how well any given algorithm that we might pick
> here actually works.

Well, the metric that we were indirectly using earlier was the
number of characters in a given locale for which the algorithm
fails to find a greater one (excluding whichever character is "last",
I guess, or you could just recognize there's always at least one).

> For example, if we were to try all possible
> three character strings in some encoding and run make_greater_string()
> on each one of them, we could then measure the failure percentage.  Or
> if that's too many cases to crank through then we could limit it some
> way -

Even in UTF8 there are only about 1.1 million possible code points
(and far fewer that are actually assigned), so for test purposes anyway
it seems feasible to crank through them all.  Also, in many cases you
could probably figure it out by analysis instead of brute-force testing
every case.
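
A minimal sketch of such a brute-force measurement, in Python rather than
in the backend's C.  Everything named here is an assumption made for
illustration: increment_last_byte() is a hypothetical stand-in, not
make_greater_string(), and "greater" means plain code point order (which
matches UTF8 byte order), not any particular locale's collation.  It tests
single characters rather than three-character strings.

```python
from typing import Iterable, Optional


def increment_last_byte(ch: str) -> Optional[str]:
    """Hypothetical incrementer (an assumption, not PostgreSQL code).

    Bumps the final byte of the character's UTF-8 encoding until the
    bytes decode as valid UTF-8 again.  Returns None when the byte
    would overflow, i.e. when no greater character was found.
    """
    raw = bytearray(ch.encode("utf-8"))
    while raw[-1] < 0xFF:
        raw[-1] += 1
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            continue  # not a valid sequence yet, keep incrementing
    return None


def count_failures(code_points: Iterable[int]) -> int:
    """Count code points for which the incrementer finds nothing greater."""
    failures = 0
    for cp in code_points:
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded as UTF-8
        if increment_last_byte(chr(cp)) is None:
            failures += 1
    return failures
```

Calling count_failures(range(0x110000)) sweeps the whole code space.  The
count includes the very last character, for which failure is unavoidable.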

A more reasonable objection might be that a whole lot of those code
points are things nobody cares about, and so we need to weight the
results somehow by the actual popularity of the character.  Not sure
how to take that into account.
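
One possible way to weight the results, sketched under a loud assumption:
that a sample of representative text is available and that character
frequency in that sample is an acceptable proxy for "popularity".  The
function name and the sample-text approach are hypothetical, not something
proposed in this thread.  The incrementer is passed in as a callable that
returns None on failure.

```python
from collections import Counter
from typing import Callable, Optional


def weighted_failure_rate(
    sample_text: str,
    incrementer: Callable[[str], Optional[str]],
) -> float:
    """Fraction of character occurrences in sample_text that fail.

    Each distinct character is tested once and weighted by how often
    it occurs in the sample (an assumed proxy for popularity).
    """
    counts = Counter(sample_text)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    failed = sum(n for ch, n in counts.items() if incrementer(ch) is None)
    return failed / total
```

With this weighting, a failure on a character that never appears in the
sample costs nothing, while a failure on a common character dominates the
score.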

Another issue here is that we need to consider not just whether we find
a greater character, but "how much greater" it is.  This would apply to
my suggestion of incrementing the top byte without considering
lower-order bytes --- we'd be skipping quite a lot of code space for
each increment, and it's conceivable that that would be quite hurtful in
some cases.  Not sure how to account for that either.  An extreme
example here is an "incrementer" that just immediately returns the last
character in the sort order for any lesser input.
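
To make "how much greater" concrete, here is a rough sketch that counts
the code points skipped by one increment.  It assumes one reading of the
top-byte idea (bump the lead byte of the UTF8 encoding and leave the
trailing bytes unchanged) and measures distance in code point order, not
in any locale's collation.  All function names are hypothetical.

```python
from typing import Callable, Optional


def increment_lead_byte(ch: str) -> Optional[str]:
    """Bump only the first UTF-8 byte; None if the result is invalid.

    An assumed interpretation of "incrementing the top byte", not
    PostgreSQL code.
    """
    raw = bytearray(ch.encode("utf-8"))
    raw[0] += 1  # a valid UTF-8 lead byte is at most 0xF4, so no overflow
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


def jump_to_last(ch: str) -> Optional[str]:
    """The extreme incrementer: always return the last code point."""
    last = chr(0x10FFFF)
    return last if ch < last else None


def skipped_code_points(
    ch: str, incrementer: Callable[[str], Optional[str]]
) -> Optional[int]:
    """Code points strictly between ch and its increment (None on failure)."""
    nxt = incrementer(ch)
    if nxt is None:
        return None
    return ord(nxt) - ord(ch) - 1
```

Under these assumptions the gap grows with encoding length: nothing is
skipped for a one-byte character, 63 code points for a two-byte one, 4095
for a three-byte one, and 262143 for a four-byte one.  jump_to_last()
almost never fails, yet it skips nearly the whole code space, which is why
the failure count alone is not a sufficient metric.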

        regards, tom lane

