Re: patch: utf8_to_unicode (trivial) - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: patch: utf8_to_unicode (trivial)
Date
Msg-id 1281719926-sup-5928@alvh.no-ip.org
In response to Re: patch: utf8_to_unicode (trivial)  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: patch: utf8_to_unicode (trivial)
List pgsql-hackers
Excerpts from Robert Haas's message of Fri Aug 13 12:50:13 -0400 2010:
> On Fri, Aug 13, 2010 at 12:11 PM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
> > src/include/port.h?
> 
> Oh, hey, look at that.  Any thought on what to do about the fact that our
> two existing copies of utf2ucs() don't match?  (one tests against 0xf8,
> whereas the other tests against 0xf0)

I'm not sure why it's masking 0xf8 instead of 0xf0.  It seems like
(c & 0xf8) == 0xf8 signals the start of a 5-byte (or longer) sequence,
which is not valid per RFC 3629, according to Wikipedia:
http://en.wikipedia.org/wiki/UTF-8#Description
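To make the difference concrete, here is a small C sketch of the two
4-byte lead-byte tests, assuming the two copies differ only in the mask
applied before comparing against 0xf0.  The function names are
hypothetical and are not the ones used in wchar.c or mbprint.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Mask 0xf0 keeps the top four bits, so every byte in 0xf0..0xff is
 * taken as the lead of a 4-byte sequence. */
static bool lead4_mask_f0(unsigned char c)
{
    return (c & 0xf0) == 0xf0;
}

/* Mask 0xf8 keeps the top five bits (11110xxx), so only 0xf0..0xf7
 * match; 0xf8..0xff fall through to whatever the caller does next. */
static bool lead4_mask_f8(unsigned char c)
{
    return (c & 0xf8) == 0xf0;
}

/* Count the bytes in 0xf0..0xff on which the two tests disagree,
 * checking along the way that the 0xf8 test never accepts a byte
 * that the 0xf0 test rejects. */
static int count_disagreements(void)
{
    int n = 0;

    for (int c = 0xf0; c <= 0xff; c++)
    {
        bool wide = lead4_mask_f0((unsigned char) c);
        bool narrow = lead4_mask_f8((unsigned char) c);

        assert(!narrow || wide);    /* narrow test is a subset of wide */
        if (wide != narrow)
            n++;
    }
    return n;
}
```

Here count_disagreements() returns 8: the two tests agree on 0xf0..0xf7
and differ exactly on 0xf8..0xff, which the 0xf0-mask version would
decode as if they started a 4-byte sequence.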

(Moreover, 0xf5 to 0xf7 signal the start of 4-byte sequences for code
points above U+10FFFF, which RFC 3629 does not allow either.)
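The point about 0xf5 to 0xf7 can be checked with a little arithmetic: a
4-byte lead byte (11110xxx) contributes its three payload bits as bits
18..20 of the code point, so the smallest value any sequence starting
with it can decode to is (lead & 0x07) << 18.  The helper names and the
UNICODE_MAX macro below are hypothetical, for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Largest code point UTF-8 may encode under RFC 3629 (hypothetical name). */
#define UNICODE_MAX 0x10FFFFu

/*
 * Smallest value a 4-byte sequence starting with `lead` can decode to:
 * the lead's three payload bits land in bits 18..20, and the three
 * continuation bytes (10xxxxxx) supply the low 18 bits, which are all
 * zero in the minimal case.
 */
static uint32_t min_codepoint_for_lead4(unsigned char lead)
{
    assert((lead & 0xf8) == 0xf0);  /* only 0xf0..0xf7 are 4-byte leads */
    return (uint32_t) (lead & 0x07) << 18;
}

/* Nonzero if every 4-byte sequence with this lead is beyond U+10FFFF. */
static int lead4_always_out_of_range(unsigned char lead)
{
    return min_codepoint_for_lead4(lead) > UNICODE_MAX;
}
```

For 0xf5 this gives 0x140000, already past U+10FFFF, while 0xf4 gives
0x100000; every lead byte from 0xf5 up is therefore out of range no
matter which continuation bytes follow it.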

So apparently it's good that the wchar.c copy returns an invalid code
for lead bytes 0xf8 and up; i.e., wchar.c is right and mbprint.c is
wrong.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

