Re: Unicode problems on IRC

From: Andrew - Supernews
Subject: Re: Unicode problems on IRC
Msg-id: slrnd5jsg1.2ilg.andrew+nonews@trinity.supernews.net
In response to: Re: Unicode problems on IRC  ("John Hansen")
List: pgsql-hackers

On 2005-04-10, "John Hansen" wrote:
> That's right, dunno how I missed that one, but looks correct to me, and
> is in line with the code in ConvertUTF.c from unicode.org, on which I
> based the patch, extended to support 6 byte utf8 characters.

Frankly, you should probably de-extend it back down to 4 bytes. That's
enough to encode the Unicode range of 0x000000 - 0x10FFFF, and enough
other stuff would break if anyone allocated a character outside that
range that I don't think it is worth worrying about. (Even the ISO
people have agreed to conform to that limitation.) Even if insanity
struck simultaneously at both standards bodies, 4 bytes is enough to
go to 0x1FFFFF so there is still substantial slack. (A number of other
specifications based on utf-8 have removed the 5 and 6 byte sequences
too, so there is substantial precedent for this.)
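
To make the arithmetic above concrete, here is a minimal sketch in Python
(not taken from the patch or from ConvertUTF.c). The names PAYLOAD_BITS,
max_code_point and utf8_length are made up for illustration. It only
computes how far each sequence length reaches under the original 1-6 byte
UTF-8 layout and does not check for surrogates or overlong forms.

```python
# Payload bits carried by an n-byte sequence in the original UTF-8 layout:
# 1 byte -> 7 bits, 2 -> 5+6, 3 -> 4+6+6, 4 -> 3+6+6+6, and so on.
PAYLOAD_BITS = {1: 7, 2: 11, 3: 16, 4: 21, 5: 26, 6: 31}

# Upper end of the Unicode code space.
UNICODE_MAX = 0x10FFFF


def max_code_point(nbytes):
    """Largest value an nbytes-long sequence can carry (hypothetical helper)."""
    return (1 << PAYLOAD_BITS[nbytes]) - 1


def utf8_length(cp):
    """Bytes needed to encode cp, limited to the 4-byte forms.

    Illustrative only: surrogates (0xD800-0xDFFF) are not rejected here.
    """
    if not 0 <= cp <= UNICODE_MAX:
        raise ValueError("code point outside 0x000000 - 0x10FFFF")
    for nbytes in (1, 2, 3, 4):
        if cp <= max_code_point(nbytes):
            return nbytes
```

With these definitions, max_code_point(4) is 0x1FFFFF, which covers
0x10FFFF with room to spare, so nothing in the Unicode range ever needs
the 5 or 6 byte forms.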

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
