Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Sep 22, 2011 at 10:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Anyway, I won't stand in the way of the patch as long as it's modified
>> to limit the number of values considered for any one character position
>> to something reasonably small.
>
> I think that limit in both the old and new code is 1, except that the
> new code does it more efficiently.
> Am I confused?

Yes, or else I am. Consider a 4-byte UTF8 character at the end of the
string. The existing code increments the last byte up to 255 (rejecting
everything past 0xBF), then gives up and truncates that character away.
So the maximum number of tries for that character position is between 0
and 127 depending on what the original character was (with at most 63 of
the incremented values getting past the verifymbstr test).
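
For illustration only, here is a rough Python model of that byte-increment loop. The function names are made up, and the validity check is simplified to "is the final byte a UTF-8 continuation byte (0x80-0xBF)", standing in for the full verifymbstr test:

```python
from typing import Tuple


def is_continuation_byte(b: int) -> bool:
    """True if b matches the UTF-8 continuation pattern 10xxxxxx (0x80-0xBF)."""
    return 0x80 <= b <= 0xBF


def last_byte_tries(last: int) -> Tuple[int, int]:
    """Simplified model (an assumption, not the actual backend code):
    bump the final byte one step at a time up to 0xFF, counting how many
    candidates are tried and how many still look like a valid final byte.
    Only the last byte is checked; the real test validates the whole string."""
    tries = 0
    valid = 0
    for b in range(last + 1, 0x100):
        tries += 1
        if is_continuation_byte(b):
            valid += 1
    return tries, valid


if __name__ == "__main__":
    for last in (0x80, 0x9F, 0xBF):
        tries, valid = last_byte_tries(last)
        print(f"final byte 0x{last:02X}: {tries} tries, {valid} pass")
```

Starting from a final byte of 0x80 this model gives 127 tries with 63 passing, matching the upper bounds above; starting from 0xBF it gives 64 tries, none of which pass.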
The proposed patch is going to iterate through all Unicode code points
up to U+7FFFFF before giving up. Since it's possible that we need to
increment something further left to succeed at all, this doesn't seem
like a good plan.
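
To put numbers on the difference, here is an equally rough sketch of a per-position search over code points. `try_increment_char`, `accept`, and `max_tries` are invented for illustration (`accept` stands in for whatever check the caller applies, `max_tries` for the kind of cap asked for upthread), and the U+7FFFFF ceiling is taken from the message as stated:

```python
from typing import Callable, Optional, Tuple


def try_increment_char(cp: int,
                       accept: Callable[[int], bool],
                       ceiling: int = 0x7FFFFF,
                       max_tries: Optional[int] = None) -> Tuple[Optional[int], int]:
    """Hypothetical per-position search: step through code points after `cp`
    until `accept` returns True, `ceiling` is passed, or `max_tries`
    candidates have been rejected.  Returns (chosen code point or None,
    number of candidates tried).  None means the caller must fall back to
    changing a character further left."""
    tries = 0
    nxt = cp + 1
    while nxt <= ceiling:
        if max_tries is not None and tries >= max_tries:
            break
        tries += 1
        if accept(nxt):
            return nxt, tries
        nxt += 1
    return None, tries


if __name__ == "__main__":
    def never(_cp: int) -> bool:
        # Stands in for a position where no replacement character works.
        return False

    # A capped search gives up quickly and lets the caller move left.
    print(try_increment_char(0x10000, never, max_tries=64))  # (None, 64)
```

When nothing at this position can succeed, the uncapped loop starting at U+10000 rejects 0x7FFFFF - 0x10000 = 8,323,071 candidates before falling back; with `max_tries=64` it gives up after 64.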
regards, tom lane