Re: UNICODE characters above 0x10000 - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: UNICODE characters above 0x10000
Date	August 7, 2004 06:51:17
Msg-id	27050.1091861346@sss.pgh.pa.us
In response to	Re: UNICODE characters above 0x10000 (Dennis Bjorklund <db@zigo.dhs.org>)
Responses	Re: UNICODE characters above 0x10000 Re: [PATCHES] UNICODE characters above 0x10000
List	pgsql-hackers

Dennis Bjorklund <db@zigo.dhs.org> writes:
> ... This also means that the start byte can never start with 7 or 8
> ones, that is illegal and should be tested for and rejected. So the
> longest utf-8 sequence is 6 bytes (and the longest character needs 4
> bytes (or 31 bits)).
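
As an illustration of the rule quoted above, here is a minimal sketch in C that classifies a UTF-8 start byte by its number of leading one bits. It assumes the original 1-to-6-byte UTF-8 scheme described in the quote; utf8_seq_len is a hypothetical name and is not taken from the PostgreSQL source.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a PostgreSQL function): return the length in
 * bytes of the UTF-8 sequence introduced by the given start byte, under
 * the original 1..6-byte scheme, or 0 if the byte cannot start a sequence.
 */
static int
utf8_seq_len(unsigned char lead)
{
    if ((lead & 0x80) == 0x00) return 1;    /* 0xxxxxxx:  7 payload bits */
    if ((lead & 0xE0) == 0xC0) return 2;    /* 110xxxxx: 11 payload bits */
    if ((lead & 0xF0) == 0xE0) return 3;    /* 1110xxxx: 16 payload bits */
    if ((lead & 0xF8) == 0xF0) return 4;    /* 11110xxx: 21 payload bits */
    if ((lead & 0xFC) == 0xF8) return 5;    /* 111110xx: 26 payload bits */
    if ((lead & 0xFE) == 0xFC) return 6;    /* 1111110x: 31 payload bits */

    /*
     * Either a continuation byte (10xxxxxx) or a byte with 7 or 8 leading
     * ones (0xFE, 0xFF); neither may start a sequence.
     */
    return 0;
}
```

A 6-byte sequence carries 1 bit in the start byte plus 6 bits in each of the 5 continuation bytes, which gives the 31 bits mentioned in the quote. A caller would treat a return value of 0 as an invalid start byte and reject the input.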

Tatsuo would know more about this than me, but it looks from here like
our coding was originally designed to support only 16-bit-wide internal
characters (i.e., a 16-bit pg_wchar datatype width).  I believe that the
regex library limitation here is gone, and that as far as that library
is concerned we could assume a 32-bit internal character width.  The
question at hand is whether we can support 32-bit characters or not,
and if not, what's the next bug to fix?
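
To show why the internal character width matters for characters above 0xFFFF, here is a minimal sketch in C. It is only an illustration and not the PostgreSQL conversion code: decode_utf8_4 and narrow_to_16 are hypothetical names, and the decoder assumes a well-formed 4-byte sequence and does no validation.

```c
#include <stdint.h>

/*
 * Hypothetical helper: decode one 4-byte UTF-8 sequence (11110xxx followed
 * by three 10xxxxxx bytes) into a 32-bit value.  Assumes well-formed input.
 */
static uint32_t
decode_utf8_4(const unsigned char *s)
{
    return ((uint32_t) (s[0] & 0x07) << 18) |
           ((uint32_t) (s[1] & 0x3F) << 12) |
           ((uint32_t) (s[2] & 0x3F) << 6) |
            (uint32_t) (s[3] & 0x3F);
}

/*
 * Hypothetical helper: store a decoded character in a 16-bit-wide type,
 * as a 16-bit internal character representation would have to.  Only the
 * low 16 bits survive.
 */
static uint16_t
narrow_to_16(uint32_t c)
{
    return (uint16_t) c;
}
```

For the bytes F0 90 90 80, decode_utf8_4 yields 0x10400, while narrow_to_16 of that value yields 0x0400, which is a different character. A 32-bit internal width keeps the value intact.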

            regards, tom lane
