Re: Bug with UTF-8 character - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Bug with UTF-8 character
Date
Msg-id 25791.1148654039@sss.pgh.pa.us
In response to Bug with UTF-8 character  (Hans-Jürgen Schönig <postgres@cybertec.at>)
List pgsql-hackers
Hans-Jürgen Schönig <postgres@cybertec.at> writes:
> But the code does a check where the second character should not be 
> greater than 0x9F, when first character is 0xED. This is not according 
> to UTF-8 standard in RFC 3629.

Better read the RFC again: it says
   UTF8-3      = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
                 %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
                 ------------

The reason for the prohibition is explained as
   The definition of UTF-8 prohibits encoding character numbers between
   U+D800 and U+DFFF, which are reserved for use with the UTF-16
   encoding form (as surrogate pairs) and do not directly represent
   characters.

I don't know anything about "surrogate pairs", but I am not about to
decide that we know more about this than the RFC authors do.  If they
say it's invalid, it's invalid.
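
To make the rule concrete, here is a minimal Python sketch of the UTF8-3
production quoted above. It is not the backend's verifier; the function
names is_valid_utf8_3 and raw_code_point are made up for illustration.
It only shows why a second byte above 0x9F after 0xED falls in the
prohibited U+D800..U+DFFF range.

```python
def is_valid_utf8_3(seq: bytes) -> bool:
    """Check a 3-byte sequence against the UTF8-3 rule of RFC 3629.

    Hypothetical helper for illustration only.
    """
    if len(seq) != 3:
        return False
    b0, b1, b2 = seq
    if not 0x80 <= b2 <= 0xBF:        # third byte is always a UTF8-tail
        return False
    if b0 == 0xE0:
        return 0xA0 <= b1 <= 0xBF     # rules out overlong encodings
    if 0xE1 <= b0 <= 0xEC or 0xEE <= b0 <= 0xEF:
        return 0x80 <= b1 <= 0xBF     # plain UTF8-tail
    if b0 == 0xED:
        return 0x80 <= b1 <= 0x9F     # rules out U+D800..U+DFFF
    return False


def raw_code_point(seq: bytes) -> int:
    """Unpack the payload bits of a 3-byte pattern, with no validity checks."""
    b0, b1, b2 = seq
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


if __name__ == "__main__":
    for seq in (b"\xed\x9f\xbf", b"\xed\xa0\x80", b"\xed\xbf\xbf"):
        print(seq.hex(), hex(raw_code_point(seq)), is_valid_utf8_3(seq))
```

With the 0xED lead byte, second bytes 0x80-0x9F cover U+D000 through
U+D7FF, and 0xA0-0xBF would cover exactly U+D800 through U+DFFF, which
is the range the RFC excludes. Python's own strict decoder rejects
b"\xed\xa0\x80" for the same reason.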
        regards, tom lane

