
Re: UPPER()/LOWER() and UTF-8 - Mailing list pgsql-hackers

From	Alexey Mahotkin
Subject	Re: UPPER()/LOWER() and UTF-8
Date	November 9, 2003 16:30:19
Msg-id	873cd2g8ae.fsf@dim.w-m.ru
In response to	Re: UPPER()/LOWER() and UTF-8 (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: UPPER()/LOWER() and UTF-8
List	pgsql-hackers


>>>>> "TL" == Tom Lane <tgl@sss.pgh.pa.us> writes:
   TL> upper/lower aren't going to work desirably in any multi-byte
   TL> character set encoding.

   >> Can you please point me at their implementation?  I do not
   >> understand why that's impossible.

   TL> Because they use <ctype.h>'s toupper() and tolower()
   TL> functions, which only work on single-byte characters.

Aha, that's in src/backend/utils/adt/formatting.c, right?

Yes, I see, it goes byte by byte and uses toupper().  I believe we
could look at the locale, and if it is UTF-8, then use (or copy)
e.g. g_utf8_strup/strdown, right?
    http://developer.gnome.org/doc/API/2.0/glib/glib-Unicode-Manipulation.html#g-utf8-strup

I believe that patch could be written in a matter of hours.
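
To make the difference concrete, here is a rough sketch (not the code in
formatting.c and not glib's implementation) that contrasts the
byte-by-byte approach with one that decodes UTF-8 sequences first.  The
names upper_bytewise(), upper_utf8_toy() and toy_upper() are made up for
illustration, and the case table is a toy that covers only ASCII and the
basic Cyrillic block; a real patch would need full Unicode case data.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Byte-by-byte approach: each byte goes through toupper() on its own,
 * so the bytes of a multi-byte UTF-8 sequence are never seen together. */
static void upper_bytewise(char *s)
{
    for (; *s; s++)
        *s = (char) toupper((unsigned char) *s);
}

/* Hypothetical toy case table: ASCII plus basic Cyrillic only. */
static unsigned long toy_upper(unsigned long cp)
{
    if (cp >= 'a' && cp <= 'z')
        return cp - 0x20;
    if (cp >= 0x0430 && cp <= 0x044F)   /* U+0430..U+044F -> U+0410..U+042F */
        return cp - 0x20;
    if (cp >= 0x0450 && cp <= 0x045F)   /* U+0450..U+045F -> U+0400..U+040F */
        return cp - 0x50;
    return cp;
}

/* UTF-8-aware approach: decode one code point, map it, re-encode it.
 * Only 1- and 2-byte sequences are mapped (the toy table never changes
 * the encoded length); longer or malformed sequences are left as-is. */
static void upper_utf8_toy(char *s)
{
    unsigned char *p = (unsigned char *) s;

    while (*p)
    {
        if (*p < 0x80)
        {
            *p = (unsigned char) toy_upper(*p);
            p++;
        }
        else if ((*p & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80)
        {
            unsigned long cp = ((unsigned long) (*p & 0x1F) << 6)
                             | (unsigned long) (p[1] & 0x3F);

            cp = toy_upper(cp);
            p[0] = (unsigned char) (0xC0 | (cp >> 6));
            p[1] = (unsigned char) (0x80 | (cp & 0x3F));
            p += 2;
        }
        else
        {
            /* Skip the lead byte and any continuation bytes untouched. */
            p++;
            while ((*p & 0xC0) == 0x80)
                p++;
        }
    }
}
```

In the default "C" locale, upper_bytewise() upper-cases only the ASCII
letters of a UTF-8 string and leaves the Cyrillic bytes alone, while
upper_utf8_toy() maps both.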

   TL> There has been some discussion of using <wctype.h> where
   TL> available, but this has a number of issues, notably figuring
   TL> out the correct mapping from the server string encoding (eg
   TL> UTF-8) to unpacked wide characters. At minimum we'd need to
   TL> know which charset the locale setting is expecting, and there
   TL> doesn't seem to be a portable way to find that out.

   TL> IIRC, Peter thinks we must abandon use of libc's locale
   TL> functionality altogether and write our own locale layer before
   TL> we can really have all the locale-specific functionality we
   TL> want.
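
For reference, the <wctype.h> route mentioned above might look roughly
like the sketch below.  The function name upper_via_wctype() is
hypothetical.  The sketch simply assumes that the input string is
encoded in whatever charset the current LC_CTYPE locale expects, which
is exactly the assumption that is hard to check portably, so it shows
the shape of the approach rather than a solution to that problem.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Upper-case a multibyte string by way of wide characters.
 *
 * Assumption: the caller has already set LC_CTYPE with setlocale(), and
 * s is encoded in that locale's charset.  Returns a malloc'd string that
 * the caller must free, or NULL on conversion or allocation failure.
 */
static char *upper_via_wctype(const char *s)
{
    size_t   len = strlen(s);
    size_t   n, i, outsize, m;
    wchar_t *w;
    char    *out;

    /* A string never has more wide characters than it has bytes. */
    w = malloc((len + 1) * sizeof(wchar_t));
    if (w == NULL)
        return NULL;

    n = mbstowcs(w, s, len + 1);
    if (n == (size_t) -1)           /* invalid sequence for this locale */
    {
        free(w);
        return NULL;
    }

    for (i = 0; i < n; i++)
        w[i] = (wchar_t) towupper((wint_t) w[i]);

    /* Each wide character needs at most MB_CUR_MAX bytes. */
    outsize = n * MB_CUR_MAX + 1;
    out = malloc(outsize);
    if (out != NULL)
    {
        m = wcstombs(out, w, outsize);
        if (m == (size_t) -1)
        {
            free(out);
            out = NULL;
        }
    }
    free(w);
    return out;
}
```

If the locale's charset and the server encoding disagree, mbstowcs()
either fails or silently produces the wrong wide characters, which is
the mapping problem described in the quoted text.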

I believe that native Unicode strings (together with human-language
handling) should be introduced as an (almost) separate data type (which
has nothing to do with locale), but maybe that's blue-sky thinking.

--alexm
