Re: Enforcing database encoding and locale match - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Enforcing database encoding and locale match
Date
Msg-id 21237.1191020333@sss.pgh.pa.us
In response to Re: Enforcing database encoding and locale match  (Zdenek Kotala <Zdenek.Kotala@Sun.COM>)
Responses Re: Enforcing database encoding and locale match
List pgsql-hackers
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> On Solaris I got following problematic locales: [...]

I tried this program on Mac OS X 10.4.10 (the current release) and found
that what that OS mostly returns is just the encoding portion of the
locale name, or an empty string when the name has no encoding suffix,
for instance:

sv_SE.ISO8859-15        ... ISO8859-15 - OK
sv_SE.UTF-8             ... UTF-8      - OK
tr_TR                   ...            - NO MATCH
tr_TR.ISO8859-9         ... ISO8859-9  - OK
tr_TR.UTF-8             ... UTF-8      - OK
uk_UA                   ...            - NO MATCH
uk_UA.ISO8859-5         ... ISO8859-5  - OK
uk_UA.KOI8-U            ... KOI8-U     - NO MATCH
uk_UA.UTF-8             ... UTF-8      - OK
zh_CN                   ...            - NO MATCH
zh_CN.eucCN             ... eucCN      - OK
zh_CN.GB18030           ... GB18030    - NO MATCH
zh_CN.GB2312            ... GB2312     - OK
zh_CN.GBK               ... GBK        - NO MATCH
zh_CN.UTF-8             ... UTF-8      - OK
zh_HK                   ...            - NO MATCH
zh_HK.Big5HKSCS         ... Big5HKSCS  - NO MATCH
zh_HK.UTF-8             ... UTF-8      - OK
zh_TW                   ...            - NO MATCH
zh_TW.Big5              ... Big5       - NO MATCH
zh_TW.UTF-8             ... UTF-8      - OK
C                       ... US-ASCII   - NO MATCH
POSIX                   ... US-ASCII   - NO MATCH
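The test program itself isn't quoted in this message. A minimal sketch of a probe that produces this kind of listing, assuming it calls setlocale() and then nl_langinfo(CODESET). probe_codeset(), codeset_matches(), report() and known_codesets[] are hypothetical names, and the table is made up only to reproduce the OK / NO MATCH column above; it is not PostgreSQL's actual encoding table.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/*
 * Ask the C library which codeset it associates with a locale name.
 * Returns a malloc'd copy of nl_langinfo(CODESET), or NULL if the
 * locale is not installed.  The caller's LC_CTYPE setting is restored.
 */
static char *probe_codeset(const char *locale_name)
{
    const char *cur = setlocale(LC_CTYPE, NULL);
    char *saved = cur ? strdup(cur) : NULL;
    char *result = NULL;

    assert(locale_name != NULL);
    if (saved == NULL)
        return NULL;
    if (setlocale(LC_CTYPE, locale_name) != NULL)
        result = strdup(nl_langinfo(CODESET));  /* copy before restoring */
    setlocale(LC_CTYPE, saved);
    free(saved);
    return result;
}

/* Hypothetical codeset spellings we know how to map (illustration only). */
static const char *const known_codesets[] = {
    "UTF-8", "ISO8859-5", "ISO8859-9", "ISO8859-15", "eucCN", "GB2312", NULL
};

/* Case-insensitive lookup; an empty codeset never matches. */
static int codeset_matches(const char *sys)
{
    if (sys == NULL || sys[0] == '\0')
        return 0;
    for (int i = 0; known_codesets[i] != NULL; i++)
        if (strcasecmp(sys, known_codesets[i]) == 0)
            return 1;
    return 0;
}

/* Print one line in the same layout as the listing above. */
static void report(const char *locale_name)
{
    char *sys = probe_codeset(locale_name);

    if (sys == NULL)
    {
        printf("%-23s ... (not installed)\n", locale_name);
        return;
    }
    printf("%-23s ... %-10s - %s\n", locale_name, sys,
           codeset_matches(sys) ? "OK" : "NO MATCH");
    free(sys);
}
```

On a 10.4 box, report("tr_TR") would be expected to print the empty-codeset NO MATCH line shown above; elsewhere the output depends on which locales are installed and how the platform spells their codesets.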

They didn't *quite* hard-wire it that way, as evidenced by the C/POSIX
results, but certainly the empty-string results are entirely useless.
Perhaps we should file a bug with Apple.  However, some poking around
in /usr/share/locale indicates that there's a consistent interpretation
to be made:

g42:/usr/share/locale tgl$ ls -l ??_??/LC_CTYPE
lrwxr-xr-x   1 root  wheel    17 Apr 26  2006 af_ZA/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
-r--r--r--   1 root  wheel  3272 Mar 20  2005 am_ET/LC_CTYPE
lrwxr-xr-x   1 root  wheel    17 Apr 26  2006 be_BY/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x   1 root  wheel    17 Apr 26  2006 bg_BG/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x   1 root  wheel    17 Apr 26  2006 ca_ES/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x   1 root  wheel    17 Apr 26  2006 cs_CZ/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x   1 root  wheel    17 Apr 26  2006 da_DK/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x   1 root  wheel    17 Apr 26  2006 de_AT/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x   1 root  wheel    17 Apr 26  2006 de_CH/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x   1 root  wheel    17 Apr 26  2006 de_DE/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x   1 root  wheel    17 Apr 26  2006 el_GR/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x   1 root  wheel    17 Apr 26  2006 en_AU/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x   1 root  wheel    17 Apr 26  2006 en_CA/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
(etc etc)
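That survey of /usr/share/locale can be automated. A sketch, assuming that a symlink whose target is exactly ../UTF-8/LC_CTYPE is the signal to look for; ctype_link_is_utf8() and locale_ctype_is_utf8() are hypothetical helpers. A regular file such as am_ET/LC_CTYPE is reported as 0 by this check, so that case would need a byte-for-byte comparison against the .UTF-8 variant instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Link target seen for most entries in the listing above. */
#define UTF8_CTYPE_TARGET "../UTF-8/LC_CTYPE"

/*
 * Return 1 if ctype_path is a symlink whose target is exactly
 * ../UTF-8/LC_CTYPE, else 0.  A regular file or a missing path
 * also yields 0.
 */
static int ctype_link_is_utf8(const char *ctype_path)
{
    char target[1024];
    ssize_t len;

    assert(ctype_path != NULL);
    len = readlink(ctype_path, target, sizeof target - 1);
    if (len < 0)
        return 0;               /* missing, or not a symlink */
    target[len] = '\0';         /* readlink() does not NUL-terminate */
    return strcmp(target, UTF8_CTYPE_TARGET) == 0;
}

/* Convenience wrapper: check <locale_dir>/<locale>/LC_CTYPE. */
static int locale_ctype_is_utf8(const char *locale_dir, const char *locale)
{
    char path[1024];
    int n = snprintf(path, sizeof path, "%s/%s/LC_CTYPE", locale_dir, locale);

    if (n < 0 || (size_t) n >= sizeof path)
        return 0;               /* path would be truncated */
    return ctype_link_is_utf8(path);
}
```

Given the listing, locale_ctype_is_utf8("/usr/share/locale", "af_ZA") should return 1 on that machine, while "am_ET" should return 0.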

The only one that's not actually a symlink to the standard UTF-8 ctype
is am_ET/LC_CTYPE, which is identical to am_ET.UTF-8/LC_CTYPE.
So I think we can get away with something like

#ifdef __darwin__
	if (strlen(sys) == 0)
		// assume UTF8
#endif
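Spelled out as a self-contained helper, that fallback might look like the following. effective_codeset() and ON_DARWIN are hypothetical names; the Darwin check is passed in as a flag so the logic can be exercised on any platform, and ON_DARWIN also tests __APPLE__ (which Apple's compilers predefine) in case __darwin__ is not defined in a standalone build.

```c
#include <assert.h>
#include <string.h>

#if defined(__darwin__) || defined(__APPLE__)
#define ON_DARWIN 1
#else
#define ON_DARWIN 0
#endif

/*
 * Map the raw codeset string reported for a locale to the name we will
 * try to match.  On Darwin, bare names such as "tr_TR" or "zh_CN" report
 * an empty string, and the /usr/share/locale symlinks suggest they all
 * use the UTF-8 ctype, so an empty result is treated as UTF-8 there.
 */
static const char *effective_codeset(const char *sys, int is_darwin)
{
    assert(sys != NULL);
    if (is_darwin && strlen(sys) == 0)
        return "UTF-8";         /* assume UTF8 */
    return sys;                 /* otherwise use what the OS reported */
}
```

A caller would pass ON_DARWIN as the flag; keeping the test outside the function means other platforms can get their own hacks later without touching this one.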

I suppose we'll need a few more hacks like this as the beta-test results
begin to roll in ...
        regards, tom lane
