
Re: [HACKERS] UNICODE characters above 0x10000 - Mailing list pgsql-patches

From	John Hansen
Subject	Re: [HACKERS] UNICODE characters above 0x10000
Date	August 7, 2004 10:11:47
Msg-id	5066E5A966339E42AA04BA10BA706AE5608A@rodrick.geeknet.com.au
List	pgsql-patches


Yes, but the specification allows for 6-byte sequences, or 32-bit
characters.
As Dennis pointed out, just because they're not used doesn't mean we
should not allow them to be stored, since there might be someone using
the high ranges for a private character set, which could very well be
included in the specification some day.
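
For reference, a minimal sketch of the lead-byte rule under discussion, assuming the original 1-to-6-byte UTF-8 layout (RFC 3629 later restricted UTF-8 to 4 bytes and U+10FFFF). The function names are illustrative only and are not PostgreSQL code.

```python
def utf8_seq_len(lead: int) -> int:
    """Length implied by a lead byte in the 1..6-byte scheme, or 0 if invalid."""
    if lead < 0x80:
        return 1          # 0xxxxxxx: plain ASCII
    if lead < 0xC0:
        return 0          # 10xxxxxx: continuation byte, cannot start a sequence
    if lead < 0xE0:
        return 2          # 110xxxxx
    if lead < 0xF0:
        return 3          # 1110xxxx
    if lead < 0xF8:
        return 4          # 11110xxx
    if lead < 0xFC:
        return 5          # 111110xx
    if lead < 0xFE:
        return 6          # 1111110x
    return 0              # 0xFE / 0xFF: seven or eight leading ones, rejected


def max_code_point(seq_len: int) -> int:
    """Largest value a sequence of the given length can carry."""
    if seq_len == 1:
        return 0x7F
    # The lead byte holds (7 - seq_len) payload bits; each continuation byte holds 6.
    payload_bits = (7 - seq_len) + 6 * (seq_len - 1)
    return (1 << payload_bits) - 1
```

Under these assumptions a 6-byte sequence carries 31 bits, and a 4-byte sequence carries 21 bits.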

Regards,

John Hansen

-----Original Message-----
From: Tatsuo Ishii [mailto:t-ishii@sra.co.jp]
Sent: Saturday, August 07, 2004 8:09 PM
To: tgl@sss.pgh.pa.us
Cc: db@zigo.dhs.org; John Hansen; pgsql-hackers@postgresql.org;
pgsql-patches@postgresql.org
Subject: Re: [PATCHES] [HACKERS] UNICODE characters above 0x10000

> Dennis Bjorklund <db@zigo.dhs.org> writes:
> > ... This also means that the start byte can never start with 7 or 8
> > ones, that is illegal and should be tested for and rejected. So the
> > longest utf-8 sequence is 6 bytes (and the longest character needs 4
> > bytes (or 31 bits)).
>
> Tatsuo would know more about this than me, but it looks from here like
> our coding was originally designed to support only 16-bit-wide
> internal characters (ie, 16-bit pg_wchar datatype width).  I believe
> that the regex library limitation here is gone, and that as far as
> that library is concerned we could assume a 32-bit internal character
> width.  The question at hand is whether we can support 32-bit
> characters or not --- and if not, what's the next bug to fix?

pg_wchar is already a 32-bit datatype.  However, I doubt there's
actually a need for 32-bit-wide character sets. Even Unicode only uses
up to 0x0010FFFF, so 24 bits should be enough...
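
A quick check of the width claim above, as a sketch using only Python built-ins: the highest Unicode code point fits comfortably within 24 bits and takes four bytes in UTF-8.

```python
HIGHEST = 0x10FFFF  # upper limit of the Unicode code space

# Number of bits needed to represent the highest code point.
bits_needed = HIGHEST.bit_length()

# Number of bytes the same code point occupies when encoded as UTF-8.
utf8_bytes = len(chr(HIGHEST).encode("utf-8"))

print(bits_needed, utf8_bytes)
```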
--
Tatsuo Ishii
