On Tue, Apr 13, 2004 at 12:32:17PM -0400, Tom Lane wrote:
> Holger Klawitter <lists@klawitter.de> writes:
> > In order to avoid interaction with gcc, cat and others else I've written a
> > new program, reading from a file.
>
> After setting up the test case and duplicating your problem, I realized
> I was being dense :-( ... this is a well-known issue. Need more
> caffeine before answering bug reports obviously ...
>
> The problem is that PG's upper() and lower() functions are based on
> the C library's <ctype.h> functions (toupper() and tolower()), which of
> course only work for single-byte character sets. So they cannot work on
> UTF8 data.
>
> There has been some talk of rewriting these functions to use the
> <wctype.h> API where available, but no one's actually stepped up to the
> plate and done it. IIRC the main sticking point was figuring out how to
> get from whatever character encoding the database is using into the wide
> character set representation the C library wants. There doesn't seem to
> be a portable way of discovering exactly what the wchar encoding is
> supposed to be for the current locale setting.
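
To make that dependency concrete, here is a rough, untested sketch of what a
<wctype.h> based upper() would have to do. Note that it simply trusts that the
input bytes are in whatever encoding the current LC_CTYPE implies, which is
exactly the assumption PostgreSQL cannot safely make:

    #include <stdlib.h>
    #include <wchar.h>
    #include <wctype.h>

    /*
     * Sketch of upcasing via the wide-character API: convert the
     * multibyte string to wchar_t, apply towupper() per character,
     * convert back.  Assumes the bytes match the libc locale.
     */
    static char *
    upper_via_wchar(const char *src)
    {
        size_t   nchars = mbstowcs(NULL, src, 0);   /* count wide chars */
        wchar_t *wbuf;
        char    *result;
        size_t   i;

        if (nchars == (size_t) -1)
            return NULL;                            /* invalid multibyte data */

        wbuf = malloc((nchars + 1) * sizeof(wchar_t));
        mbstowcs(wbuf, src, nchars + 1);

        for (i = 0; i < nchars; i++)
            wbuf[i] = towupper(wbuf[i]);

        result = malloc(MB_CUR_MAX * nchars + 1);
        wcstombs(result, wbuf, MB_CUR_MAX * nchars + 1);
        free(wbuf);
        return result;
    }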
There is libcharset, the "portable character set determination library",
but maintaining a library with that much OS-dependent code is probably not
simple. It is used by standard iconv.
http://www.haible.de/bruno/packages-libcharset.html
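
For what it's worth, the libcharset API itself is tiny; assuming its
<localcharset.h> header, discovering the charset of the current locale looks
roughly like this:

    #include <stdio.h>
    #include <locale.h>
    #include <localcharset.h>   /* from libcharset */

    int
    main(void)
    {
        /*
         * Take the locale from the environment, then ask libcharset
         * which character set that locale implies.
         */
        setlocale(LC_ALL, "");
        printf("current locale charset: %s\n", locale_charset());
        return 0;
    }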
But I'm not sure it resolves anything, because there is no guarantee of any
connection between the current locale setting and the encoding of a given
string. Consider, for example:

    SELECT upper( convert('foo', 'X', 'Y') );
IMHO the solution is to add to "struct varlena" a pointer to pg_encname,
which knows how to handle PostgreSQL encoding information, and so make each
PostgreSQL string independent and self-described. Or is there some reason
why this would be useless?
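
To make the idea a bit more concrete, I mean roughly something like this
(only a sketch; the names and layout are invented here, not an actual patch):

    #include <stdint.h>

    /* only a sketch; pg_encname is PostgreSQL's encoding-name structure */
    struct pg_encname;

    typedef struct encvarlena
    {
        int32_t             vl_len;     /* total length, as in struct varlena */
        struct pg_encname  *vl_enc;     /* which encoding vl_dat is in */
        char                vl_dat[1];  /* the string bytes themselves */
    } encvarlena;

Every string operation could then look at vl_enc instead of guessing from
the database-wide setting or the locale.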
Karel
--
Karel Zak <zakkr@zf.jcu.cz>
http://home.zf.jcu.cz/~zakkr/