Re: [PATCHES] UNICODE characters above 0x10000 - Mailing list pgsql-hackers

From John Hansen
Subject Re: [PATCHES] UNICODE characters above 0x10000
Msg-id 5066E5A966339E42AA04BA10BA706AE56173@rodrick.geeknet.com.au
List pgsql-hackers
> -----Original Message-----
> From: Dennis Bjorklund [mailto:db@zigo.dhs.org]
> Sent: Saturday, August 07, 2004 11:23 PM
> To: John Hansen
> Cc: Takehiko Abe; pgsql-hackers@postgresql.org
> Subject: RE: [PATCHES] [HACKERS] UNICODE characters above 0x10000
>
> On Sat, 7 Aug 2004, John Hansen wrote:
>
> > Now, is it really 24 bits tho?
> > Afaict, it's really 21 (0 - 10FFFF or 0 - xxx10000 11111111 11111111)
>
> Yes, up to 0x10ffff should be enough.
>
> The 24 bits are not really important; this is all about which utf-8
> strings to accept as input. The strings are stored as utf-8,
> and when they are processed inside pg it uses wchar_t, which is
> 32 bits wide (on some systems at least). By restricting the utf-8
> input to unicode we can in the future store each character in
> 3 bytes if we want.

Which brings us back to something like the attached...
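As a rough illustration of the input restriction discussed above, the C sketch below accepts a UTF-8 sequence only if it decodes to a code point no greater than 0x10FFFF. That limits sequences to at most 4 bytes, so the old 5- and 6-byte forms are rejected. It then packs the resulting 21-bit value into 3 bytes. The names utf8_decode_checked, pack3 and unpack3 are hypothetical: they are not PostgreSQL's API and not the contents of the attached patch. The overlong-form and surrogate checks follow RFC 3629 and go slightly beyond what this thread discusses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Highest Unicode code point; it fits in 21 bits (0x10FFFF >> 21 == 0). */
#define UNICODE_MAX 0x10FFFF

/*
 * Hypothetical helper: decode one UTF-8 sequence from s (len bytes available).
 * Returns the number of bytes consumed (1-4) and stores the code point in *cp,
 * or returns 0 if the sequence is truncated, malformed, overlong, a UTF-16
 * surrogate, or above 0x10FFFF.
 */
static int utf8_decode_checked(const unsigned char *s, size_t len, uint32_t *cp)
{
    uint32_t c;
    int n, i;

    if (len == 0)
        return 0;
    if (s[0] < 0x80) {                    /* 1 byte: plain ASCII */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {   /* 110xxxxx: 2-byte sequence */
        n = 2; c = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {   /* 1110xxxx: 3-byte sequence */
        n = 3; c = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {   /* 11110xxx: 4-byte sequence */
        n = 4; c = s[0] & 0x07;
    } else {
        return 0;  /* stray continuation byte or 5/6-byte lead (0xF8..0xFF) */
    }

    if (len < (size_t) n)
        return 0;
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)        /* every trailing byte must be 10xxxxxx */
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong encodings (a longer form than the value needs). */
    if ((n == 2 && c < 0x80) || (n == 3 && c < 0x800) || (n == 4 && c < 0x10000))
        return 0;
    if (c > UNICODE_MAX)                  /* the limit discussed in this thread */
        return 0;
    if (c >= 0xD800 && c <= 0xDFFF)       /* surrogates are not valid in UTF-8 */
        return 0;

    *cp = c;
    return n;
}

/* Store a validated code point in 3 bytes, most significant byte first. */
static void pack3(uint32_t cp, unsigned char out[3])
{
    assert(cp <= UNICODE_MAX);            /* only 21 significant bits */
    out[0] = (unsigned char) ((cp >> 16) & 0xFF);  /* at most 0x10 */
    out[1] = (unsigned char) ((cp >> 8) & 0xFF);
    out[2] = (unsigned char) (cp & 0xFF);
}

/* Inverse of pack3. */
static uint32_t unpack3(const unsigned char in[3])
{
    return ((uint32_t) in[0] << 16) | ((uint32_t) in[1] << 8) | (uint32_t) in[2];
}
```

For example, F4 8F BF BF (U+10FFFF) is accepted and packs to 10 FF FF. By contrast, F4 90 80 80 (0x110000) and any sequence starting with a 5-byte lead such as F8 are rejected. The packed top byte never exceeds 0x10, which matches the `xxx10000` pattern in the quoted message.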

>
> --
> /Dennis Björklund

Regards,

John Hansen

Attachment
