Re: [v9.2] make_greater_string() does not return a string in some cases - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: [v9.2] make_greater_string() does not return a string in some cases
Msg-id 20111021.103646.221883029.horiguchi.kyotaro@oss.ntt.co.jp
In response to Re: [v9.2] make_greater_string() does not return a string in some cases  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [v9.2] make_greater_string() does not return a string in some cases
List pgsql-hackers
Hello,

> > Robert Haas <robertmhaas@gmail.com> writes:
> >> - Why does the second byte need special handling for 0xED and 0xF4?
> >
> > http://www.faqs.org/rfcs/rfc3629.html
> >
> > See section 4 in particular.  The underlying requirement is to disallow
> > multiple representations of the same Unicode code point.
The special handling skips the UTF-8 code regions corresponding to
U+D800 - U+DFFF and U+110000 - U+13FFFF in UCS-4, that is, sequences
whose lead byte is 0xED and second byte is 0xA0 - 0xBF, or whose lead
byte is 0xF4 and second byte is 0x90 - 0xBF. The former region is
reserved for use with the UTF-16 encoding form as surrogate pairs and
does not directly represent characters, as described in section 3 of
RFC 3629. The latter lies outside the UTF-8 range as defined in the
same section.
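To make the byte-level picture concrete, here is a minimal C sketch. The helper names are hypothetical (they are not PostgreSQL functions). They encode the lead-byte-dependent bounds on the second byte from the syntax table in RFC 3629 section 4, and an unchecked decoder shows which code points the excluded second bytes would reach.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helpers, not PostgreSQL's API.  The legal range of the
 * second byte of a multibyte UTF-8 sequence depends on the lead byte
 * (RFC 3629, section 4).
 */
static unsigned char
utf8_second_byte_min(unsigned char lead)
{
    switch (lead)
    {
        case 0xE0: return 0xA0;   /* below this: overlong 3-byte forms */
        case 0xF0: return 0x90;   /* below this: overlong 4-byte forms */
        default:   return 0x80;
    }
}

static unsigned char
utf8_second_byte_max(unsigned char lead)
{
    switch (lead)
    {
        case 0xED: return 0x9F;   /* above this: U+D800..U+DFFF (surrogates) */
        case 0xF4: return 0x8F;   /* above this: U+110000 and beyond */
        default:   return 0xBF;
    }
}

/* True if 'second' is a legal second byte after lead byte 'lead'. */
static bool
utf8_second_byte_ok(unsigned char lead, unsigned char second)
{
    return second >= utf8_second_byte_min(lead) &&
           second <= utf8_second_byte_max(lead);
}

/* Decode a 3- or 4-byte sequence WITHOUT any range checks. */
static unsigned long
utf8_decode_unchecked(const unsigned char *s, size_t len)
{
    if (len == 3)
        return ((unsigned long) (s[0] & 0x0F) << 12) |
               ((unsigned long) (s[1] & 0x3F) << 6) |
               (unsigned long) (s[2] & 0x3F);
    return ((unsigned long) (s[0] & 0x07) << 18) |
           ((unsigned long) (s[1] & 0x3F) << 12) |
           ((unsigned long) (s[2] & 0x3F) << 6) |
           (unsigned long) (s[3] & 0x3F);
}
```

For example, utf8_decode_unchecked() maps ED A0 80 to 0xD800 and F4 90 80 80 to 0x110000, the first code points of the two excluded regions, while ED 9F BF and F4 8F BF BF map to 0xD7FF and 0x10FFFF, the last ones still allowed. The lower bounds for 0xE0 and 0xF0 are the "multiple representations" (overlong) case; the upper bounds for 0xED and 0xF4 are the surrogate and out-of-range cases.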

former> The definition of UTF-8 prohibits encoding character
former> numbers between U+D800 and U+DFFF, which are reserved for
former> use with the UTF-16 encoding form (as surrogate pairs)
former> and do not directly represent characters.

latter> In UTF-8, characters from the U+0000..U+10FFFF range (the
latter> UTF-16 accessible range) are encoded using sequences of 1
latter> to 4 octets.

# However, when I first wrote this exception I simply mimicked
# pg_utf8_validator()'s behavior.


This is presumably the basis of pg_utf8_verifier()'s behavior, and
pg_utf8_increment() has inherited it. It would be good to describe
the origin of this special handling in comments on these functions
to avoid this sort of confusion.
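A simplified sketch of how such an increment can respect these bounds follows. utf8_increment_char() is a hypothetical name, not pg_utf8_increment() itself. It advances one valid UTF-8 character to the next code point of the same byte length, carrying from the last byte toward the lead byte, and looks up the ceiling for the second byte from the lead byte.

```c
#include <stdbool.h>

/*
 * Hypothetical sketch, not PostgreSQL's pg_utf8_increment(): advance a
 * single valid UTF-8 character of length len (1-4) in place to the next
 * code point of the same length.  Returns false if there is none; the
 * buffer is then left partially reset.
 */
static bool
utf8_increment_char(unsigned char *c, int len)
{
    unsigned char limit;

    switch (len)
    {
        case 4:
            if (c[3] < 0xBF)
            {
                c[3]++;
                return true;
            }
            c[3] = 0x80;        /* carry into the previous byte */
            /* FALLTHROUGH */
        case 3:
            if (c[2] < 0xBF)
            {
                c[2]++;
                return true;
            }
            c[2] = 0x80;
            /* FALLTHROUGH */
        case 2:
            /* The second byte's ceiling is chosen by the LEAD byte. */
            switch (c[0])
            {
                case 0xED: limit = 0x9F; break;  /* skip U+D800..U+DFFF */
                case 0xF4: limit = 0x8F; break;  /* stop at U+10FFFF */
                default:   limit = 0xBF; break;
            }
            if (c[1] < limit)
            {
                c[1]++;
                return true;
            }
            c[1] = 0x80;
            /* FALLTHROUGH */
        case 1:
            /* Highest lead byte of each length class: nothing follows. */
            if (c[0] == 0x7F || c[0] == 0xDF || c[0] == 0xEF || c[0] == 0xF4)
                return false;
            c[0]++;
            return true;
        default:
            return false;
    }
}
```

With this sketch, ED 9F BF (U+D7FF) becomes EE 80 80 (U+E000), jumping over the surrogates, and F4 8F BF BF (U+10FFFF) cannot be incremented. Note that the switch examines the lead byte c[0] while the value being bounded is the second byte c[1].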


> I'm still confused.  The input string is already known to be valid
> UTF-8, so the second byte (if there is one) must be between 0x80 and
> 0xBF.  Therefore it will be neither 0xED nor 0xF4.

--
Kyotaro Horiguchi
NTT Open Source Software Center

