Re: patch: utf8_to_unicode (trivial) - Mailing list pgsql-hackers

From	Alvaro Herrera
Subject	Re: patch: utf8_to_unicode (trivial)
Date	August 13, 2010 14:40:32
Msg-id	1281719926-sup-5928@alvh.no-ip.org
In response to	Re: patch: utf8_to_unicode (trivial) (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: patch: utf8_to_unicode (trivial)
List	pgsql-hackers

Excerpts from Robert Haas's message of Fri Aug 13 12:50:13 -0400 2010:
> On Fri, Aug 13, 2010 at 12:11 PM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
> > src/include/port.h?
> 
> Oh, hey, look at that.  Any thought on what to do about the fact that our
> two existing copies of utf2ucs() don't match?  (one tests against 0xf8
> where the other against 0xf0)

I'm not sure why it's masking with 0xf8 instead of 0xf0.  It seems like
(c & 0xf8) == 0xf8 signals the start of a 5-byte sequence, which is not
valid per RFC 3629, according to Wikipedia:
http://en.wikipedia.org/wiki/UTF-8#Description

(Moreover, 0xf5 to 0xf7 signal the start of a 4-byte sequence for code
points above U+10FFFF, which are not supposed to be valid either).

So apparently it's good that the code returns an invalid code in those
cases, i.e. wchar.c is right and mbprint is wrong.
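
To make the difference between the two masks concrete, here is a rough
sketch.  The function names are made up for illustration and this is not
the code from either file; it only shows which lead bytes each test
accepts, plus the tighter range that RFC 3629 actually allows.

```c
/* Hypothetical helpers for illustration only; not the PostgreSQL source. */

/* Mask 0xf8: accepts only 11110xxx, i.e. lead bytes 0xf0..0xf7. */
static int
lead4_mask_f8(unsigned char c)
{
	return (c & 0xf8) == 0xf0;
}

/*
 * Mask 0xf0: accepts any 1111xxxx, i.e. 0xf0..0xff, so bytes that would
 * start a 5-byte (or longer) sequence are also treated as 4-byte leads.
 */
static int
lead4_mask_f0(unsigned char c)
{
	return (c & 0xf0) == 0xf0;
}

/*
 * RFC 3629 caps code points at U+10FFFF, so only 0xf0..0xf4 can start a
 * valid 4-byte sequence; 0xf5..0xf7 pass the 0xf8 mask but are invalid.
 */
static int
lead4_rfc3629(unsigned char c)
{
	return c >= 0xf0 && c <= 0xf4;
}
```

With these, 0xf8 is rejected by the 0xf8 mask but accepted by the 0xf0
mask, and 0xf5 passes both masks while still being outside the RFC 3629
range.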

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
