Re: [GENERAL] trouble with to_char('L') - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [GENERAL] trouble with to_char('L')
Msg-id 201003222014.o2MKErr17486@momjian.us
In response to Re: [GENERAL] trouble with to_char('L')  (Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp>)
Responses Re: [GENERAL] trouble with to_char('L')
List pgsql-hackers
Takahiro Itagaki wrote:
> 
> Bruce Momjian <bruce@momjian.us> wrote:
> 
> > Takahiro Itagaki wrote:
> > > Since 9.0 has GetPlatformEncoding() for the purpose, we could simplify
> > > db_encoding_strdup() with the function. Like this:
> > 
> > OK, I don't have any Win32 people testing this patch so if we want this
> > fixed for 9.0 someone is going to have to test my patch to see that it
> > works.  Can you make the adjustments suggested above to my patch and
> > test it to see that it works so we can apply it for 9.0?
> 
> Here is a full patch that can be applied cleanly to HEAD.
> Can anyone test it on Windows?
> 
> I'm not sure why temporary changes of lc_ctype was required in the
> original patch. The codes are not included in my patch, but please
> notice me it is still needed.

Sorry for the delay in replying to you.

I considered your idea of using the existing Postgres encoding
conversion routines to convert the localeconv() strings, but
found two problems.

First, GetPlatformEncoding() caches its result, so it assumes that
LC_CTYPE never changes for the server, while fixing this issue actually
requires us to change LC_CTYPE.  We could avoid the caching, but that
then involves table lookups, etc., which seems overly complex:

+       /* convert the string to the database encoding */
+       pstr = (char *) pg_do_encoding_conversion(
+                                               (unsigned char *) str, strlen(str),
+                                               GetPlatformEncoding(), GetDatabaseEncoding());

Second, having our backend routines do the conversion seems wrong
because it is possible for someone to set LC_MONETARY to an encoding
that our database does not understand, e.g. UTF16, but one that WIN32
can convert to a valid encoding.

The reason we are doing all this is explained in this updated comment in
my patch:
ftp://momjian.us/pub/postgresql/mypatches/pg_locale

+    *  Ideally, monetary and numeric local symbols could be returned in
+    *  any server encoding.  Unfortunately, the WIN32 API does not allow
+    *  setlocale() to return values in a codepage/CTYPE that uses more
+    *  than two bytes per character, like UTF-8:
+    *
+    *      http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
+    *
+    *  Evidently, LC_CTYPE allows us to control the encoding used
+    *  for strings returned by localeconv().  The Open Group
+    *  standard, mentioned at the top of this C file, doesn't
+    *  explicitly state this.
+    *
+    *  Therefore, we set LC_CTYPE to match LC_NUMERIC and
+    *  LC_MONETARY, call localeconv(), and use mbstowcs() to
+    *  convert the locale-aware string, e.g. Euro symbol (which
+    *  is not in UTF-8), to the server encoding.
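
As a rough illustration of the sequence that comment describes, here is
a minimal portable sketch.  It is not the patch itself:
currency_symbol_wide() and copy_string() are hypothetical names, error
handling is simplified, and the final step from wide characters to the
server encoding is left out.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: malloc'd copy of s, or NULL. */
static char *
copy_string(const char *s)
{
    char   *r;

    if (s == NULL)
        return NULL;
    r = malloc(strlen(s) + 1);
    if (r != NULL)
        strcpy(r, s);
    return r;
}

/*
 * Hypothetical helper, not from the patch: fetch the currency symbol of
 * "monetary_locale" as a wide string.  LC_CTYPE is switched to the same
 * locale so that mbstowcs() decodes the bytes returned by localeconv()
 * with the matching codepage.  Both categories are restored afterwards.
 * Returns the number of wide characters stored, or (size_t) -1 on failure.
 */
size_t
currency_symbol_wide(const char *monetary_locale, wchar_t *out, size_t outlen)
{
    char   *save_ctype;
    char   *save_monetary;
    size_t  n = (size_t) -1;

    assert(monetary_locale != NULL && out != NULL);

    /* setlocale() may return static storage, so keep private copies */
    save_ctype = copy_string(setlocale(LC_CTYPE, NULL));
    save_monetary = copy_string(setlocale(LC_MONETARY, NULL));

    if (save_ctype != NULL && save_monetary != NULL && outlen > 0 &&
        setlocale(LC_MONETARY, monetary_locale) != NULL &&
        setlocale(LC_CTYPE, monetary_locale) != NULL)
    {
        struct lconv *lc = localeconv();

        /* leave room for the terminator, which mbstowcs() may omit */
        n = mbstowcs(out, lc->currency_symbol, outlen - 1);
        if (n != (size_t) -1)
            out[n] = L'\0';
    }

    /* put the previous settings back, whether or not we succeeded */
    if (save_monetary != NULL)
        setlocale(LC_MONETARY, save_monetary);
    if (save_ctype != NULL)
        setlocale(LC_CTYPE, save_ctype);
    free(save_monetary);
    free(save_ctype);
    return n;
}
```

The save/restore around localeconv() is the part that conflicts with a
cached platform encoding: the encoding of the returned strings depends
on whatever LC_CTYPE is at the moment of the call.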

One new idea would be to set LC_CTYPE to UTF16/widechars unconditionally
on Win32 and then just convert that always to the server encoding with
win32_wchar_to_db_encoding(), instead of using the encoding from
LC_MONETARY to set LC_CTYPE and having to do double-conversion.
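
To illustrate the single-conversion step, the sketch below turns UTF-16
code units directly into UTF-8, assuming a UTF8 server encoding.
utf16_to_utf8() is a hypothetical stand-in written for this example; it
is not the body of win32_wchar_to_db_encoding(), and on Windows
WideCharToMultiByte() with CP_UTF8 would be the more likely tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert srclen UTF-16 code units to a
 * NUL-terminated UTF-8 string in dst (capacity dstlen bytes).
 * Returns the number of bytes written, not counting the NUL, or
 * (size_t) -1 on an unpaired surrogate or insufficient space.
 */
size_t
utf16_to_utf8(const uint16_t *src, size_t srclen, char *dst, size_t dstlen)
{
    size_t  i = 0;
    size_t  o = 0;

    assert(src != NULL && dst != NULL);

    while (i < srclen)
    {
        uint32_t cp = src[i++];
        size_t   need;

        if (cp >= 0xD800 && cp <= 0xDBFF)
        {
            /* a high surrogate must be followed by a low surrogate */
            if (i >= srclen || src[i] < 0xDC00 || src[i] > 0xDFFF)
                return (size_t) -1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t) (src[i++] - 0xDC00);
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
            return (size_t) -1;     /* low surrogate with no leader */

        need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need >= dstlen)     /* always keep one byte for the NUL */
            return (size_t) -1;

        if (need == 1)
            dst[o++] = (char) cp;
        else if (need == 2)
        {
            dst[o++] = (char) (0xC0 | (cp >> 6));
            dst[o++] = (char) (0x80 | (cp & 0x3F));
        }
        else if (need == 3)
        {
            dst[o++] = (char) (0xE0 | (cp >> 12));
            dst[o++] = (char) (0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (char) (0x80 | (cp & 0x3F));
        }
        else
        {
            dst[o++] = (char) (0xF0 | (cp >> 18));
            dst[o++] = (char) (0x80 | ((cp >> 12) & 0x3F));
            dst[o++] = (char) (0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (char) (0x80 | (cp & 0x3F));
        }
    }

    if (o >= dstlen)
        return (size_t) -1;
    dst[o] = '\0';
    return o;
}
```

With this shape, the Euro sign (U+20AC) arrives as one 16-bit unit and
leaves as the three bytes E2 82 AC, with no intermediate codepage
derived from LC_MONETARY.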

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do

