Re: [HACKERS] UNICODE characters above 0x10000 - Mailing list pgsql-patches

From	Oliver Elphick
Subject	Re: [HACKERS] UNICODE characters above 0x10000
Date	August 7, 2004 03:41:27
Msg-id	1091858144.13140.14.camel@linda
In response to	Re: [HACKERS] UNICODE characters above 0x10000 (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-patches

On Sat, 2004-08-07 at 06:06, Tom Lane wrote:
> Now it's entirely possible that the underlying support is a few bricks
> shy of a load --- for instance I see that pg_utf_mblen thinks there are
> no UTF8 codes longer than 3 bytes whereas your code goes to 4.  I'm not
> an expert on this stuff, so I don't know what the UTF8 spec actually
> says.  But I do think you are fixing the code at the wrong level.

UTF-8 characters can be up to 6 bytes long in the original ISO 10646
definition (RFC 3629 restricts them to at most 4 bytes):
http://www.cl.cam.ac.uk/~mgk25/unicode.html
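
To make the length question concrete, here is a minimal sketch of how the
sequence length follows from the first byte of a UTF-8 sequence. It is not
the pg_utf_mblen code; the name utf8_seq_len and the max_len parameter are
hypothetical, chosen only for illustration. Passing max_len = 6 follows the
original ISO 10646 scheme, and max_len = 4 follows the RFC 3629 limit.

```c
/*
 * Hypothetical helper, not PostgreSQL's pg_utf_mblen.
 * Returns the sequence length implied by a UTF-8 lead byte, or 0 if the
 * byte cannot start a sequence or implies more than max_len bytes.
 * Only the lead-byte bit pattern is inspected: overlong forms and
 * out-of-range code points are not rejected here.
 */
int utf8_seq_len(unsigned char lead, int max_len)
{
    int len;

    if (lead < 0x80)
        len = 1;            /* 0xxxxxxx: ASCII */
    else if (lead < 0xC0)
        len = 0;            /* 10xxxxxx: continuation byte, not a lead */
    else if (lead < 0xE0)
        len = 2;            /* 110xxxxx */
    else if (lead < 0xF0)
        len = 3;            /* 1110xxxx */
    else if (lead < 0xF8)
        len = 4;            /* 11110xxx */
    else if (lead < 0xFC)
        len = 5;            /* 111110xx: original scheme only */
    else if (lead < 0xFE)
        len = 6;            /* 1111110x: original scheme only */
    else
        len = 0;            /* 0xFE and 0xFF never appear in UTF-8 */

    return (len <= max_len) ? len : 0;
}
```

U+10000 encodes as F0 90 80 80, so utf8_seq_len(0xF0, 4) gives 4, one byte
more than the 3-byte limit mentioned in the quoted text.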

glibc provides various routines (mb...) for handling Unicode.  How many
of our supported platforms don't have these?  If there are still some
that don't, wouldn't it be better to use the standard routines where
they do exist?
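
As a rough illustration of what using the standard routines could look
like, the sketch below asks the C library for the byte length of the first
multibyte character via mbrlen(). The function name libc_mb_len and the
list of locale names are assumptions made for this example; which UTF-8
locales exist varies by platform, and the mb... routines follow whatever
LC_CTYPE is in effect rather than any encoding the caller specifies.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper for illustration only.
 * Returns the byte length of the first multibyte character in s as seen
 * by the C library, -1 if the bytes are invalid or incomplete, or -2 if
 * no UTF-8 locale from the (assumed) candidate list could be selected.
 * Side effect: changes the process-wide LC_CTYPE locale.
 */
int libc_mb_len(const char *s)
{
    /* Assumed locale names; availability is platform dependent. */
    static const char *const candidates[] = {
        "C.UTF-8", "en_US.UTF-8", "en_US.utf8"
    };
    size_t i, n;
    mbstate_t st;
    int have_locale = 0;

    for (i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        if (setlocale(LC_CTYPE, candidates[i]) != NULL) {
            have_locale = 1;
            break;
        }
    }
    if (!have_locale)
        return -2;

    memset(&st, 0, sizeof st);      /* start in the initial shift state */
    n = mbrlen(s, strlen(s), &st);
    if (n == (size_t) -1 || n == (size_t) -2)
        return -1;                  /* invalid or incomplete sequence */
    return (int) n;
}
```

The dependence on an installed locale is the portability catch: the same
call gives different answers, or fails, depending on how the host is set
up.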

--
Oliver Elphick                                          olly@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver
GPG: 1024D/A54310EA  92C8 39E7 280E 3631 3F0E  1EC0 5664 7A2F A543 10EA
                 ========================================
     "Be still before the LORD and wait patiently for him;
      do not fret when men succeed in their ways, when they
      carry out their wicked schemes."
                            Psalms 37:7