Re: Enforcing database encoding and locale match - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Enforcing database encoding and locale match |
Date | |
Msg-id | 21237.1191020333@sss.pgh.pa.us |
In response to | Re: Enforcing database encoding and locale match (Zdenek Kotala <Zdenek.Kotala@Sun.COM>) |
Responses | Re: Enforcing database encoding and locale match |
List | pgsql-hackers |
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> On Solaris I got following problematic locales: [...]

I tried this program on Mac OS X 10.4.10 (the current release) and found out that what that OS mostly returns is the encoding portion of the locale name, for instance

    sv_SE.ISO8859-15 ... ISO8859-15 - OK
    sv_SE.UTF-8 ... UTF-8 - OK
    tr_TR ...  - NO MATCH
    tr_TR.ISO8859-9 ... ISO8859-9 - OK
    tr_TR.UTF-8 ... UTF-8 - OK
    uk_UA ...  - NO MATCH
    uk_UA.ISO8859-5 ... ISO8859-5 - OK
    uk_UA.KOI8-U ... KOI8-U - NO MATCH
    uk_UA.UTF-8 ... UTF-8 - OK
    zh_CN ...  - NO MATCH
    zh_CN.eucCN ... eucCN - OK
    zh_CN.GB18030 ... GB18030 - NO MATCH
    zh_CN.GB2312 ... GB2312 - OK
    zh_CN.GBK ... GBK - NO MATCH
    zh_CN.UTF-8 ... UTF-8 - OK
    zh_HK ...  - NO MATCH
    zh_HK.Big5HKSCS ... Big5HKSCS - NO MATCH
    zh_HK.UTF-8 ... UTF-8 - OK
    zh_TW ...  - NO MATCH
    zh_TW.Big5 ... Big5 - NO MATCH
    zh_TW.UTF-8 ... UTF-8 - OK
    C ... US-ASCII - NO MATCH
    POSIX ... US-ASCII - NO MATCH

They didn't *quite* hard-wire it that way, as evidenced by the C/POSIX results, but certainly the empty-string results are entirely useless. Perhaps we should file a bug with Apple.
However, some poking around in /usr/share/locale indicates that there's a consistent interpretation to be made:

    g42:/usr/share/locale tgl$ ls -l ??_??/LC_CTYPE
    lrwxr-xr-x 1 root wheel   17 Apr 26 2006 af_ZA/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
    -r--r--r-- 1 root wheel 3272 Mar 20 2005 am_ET/LC_CTYPE
    lrwxr-xr-x 1 root wheel   17 Apr 26 2006 be_BY/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
    lrwxr-xr-x 1 root wheel   17 Apr 26 2006 bg_BG/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
    lrwxr-xr-x 1 root wheel   17 Apr 26 2006 ca_ES/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
    lrwxr-xr-x 1 root wheel   17 Apr 26 2006 cs_CZ/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
    lrwxr-xr-x 1 root wheel   17 Apr 26 2006 da_DK/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
    lrwxr-xr-x 1 root wheel   17 Apr 26 2006 de_AT/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
    lrwxr-xr-x 1 root wheel   17 Apr 26 2006 de_CH/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
    lrwxr-xr-x 1 root wheel   17 Apr 26 2006 de_DE/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
    lrwxr-xr-x 1 root wheel   17 Apr 26 2006 el_GR/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
    lrwxr-xr-x 1 root wheel   17 Apr 26 2006 en_AU/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
    lrwxr-xr-x 1 root wheel   17 Apr 26 2006 en_CA/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
    (etc etc)

The only one that's not actually a symlink to the standard UTF-8 ctype is am_ET/LC_CTYPE, which is identical to am_ET.UTF-8/LC_CTYPE. So I think we can get away with something like

    #ifdef __darwin__
        if (strlen(sys) == 0)
            // assume UTF8
    #endif

I suppose we'll need a few more hacks like this as the beta-test results begin to roll in ...

            regards, tom lane
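For readers who want to see the Darwin fallback from the message spelled out, here is a minimal sketch, not the code that was committed. It assumes the codeset string comes from the POSIX call nl_langinfo(CODESET). The function names effective_codeset and locale_codeset, the EMPTY_CODESET_MEANS_UTF8 macro, and the use of the compiler-defined __APPLE__ symbol for a standalone build are all illustrative assumptions; the message itself tests __darwin__.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/*
 * Assumption: a standalone build detects Mac OS X through the
 * compiler-defined __APPLE__ symbol. The message uses __darwin__.
 */
#ifdef __APPLE__
#define EMPTY_CODESET_MEANS_UTF8 1
#else
#define EMPTY_CODESET_MEANS_UTF8 0
#endif

/*
 * Hypothetical helper: return the codeset name to use for matching.
 * "sys" is the string the platform reported. When empty_means_utf8 is
 * nonzero, an empty string is taken to mean UTF-8, which is the
 * interpretation suggested by the /usr/share/locale symlinks.
 */
const char *
effective_codeset(const char *sys, int empty_means_utf8)
{
    if (sys != NULL && empty_means_utf8 && strlen(sys) == 0)
        return "UTF-8";
    return sys;
}

/*
 * Hypothetical wrapper: switch LC_CTYPE to the named locale and return
 * the adjusted codeset, or NULL if the locale is not available.
 * Note that this changes the process-wide LC_CTYPE setting.
 */
const char *
locale_codeset(const char *locale_name)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return NULL;
    return effective_codeset(nl_langinfo(CODESET),
                             EMPTY_CODESET_MEANS_UTF8);
}
```

Passing the platform check in as a parameter keeps the empty-string rule testable on any system; only locale_codeset depends on which platform the code was built for.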