Re: Encoding and i18n - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Encoding and i18n
Msg-id 27167.1191695068@sss.pgh.pa.us
In response to Re: Encoding and i18n  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-hackers
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I tried on both a UTF8 and Latin1 terminal and it works OK in all cases.

The interesting cases involve to_char's locale-specific format codes
(e.g. Dy) combined with LC_TIME settings that are deliberately
incompatible with the database encoding.  client_encoding is not relevant.

It's not really clear to me whether, on a Unix machine, there is even
supposed to be any difference between setting LC_TIME=es_ES.iso88591 and
setting it to es_ES.utf8.  Since nl_langinfo(CODESET) is supposedly
determined only by LC_CTYPE, you could argue that strftime's results
should be in that encoding regardless, and that the codeset component of
the other LC_ variables should be ignored.  Some experimentation suggests
that, at least in glibc, it doesn't work that way: strftime's output
follows the codeset of LC_TIME, and there is in fact no principled way
for you to find out what encoding strftime is giving you :-(.

$ LANG=es_ES.utf8 date
sáb oct  6 14:11:30 EDT 2007
$ LANG=es_ES.iso88591 date
s�b oct  6 14:11:42 EDT 2007
$ LANG=en_US.iso88591 LC_TIME=es_ES.utf8 date
sáb oct  6 14:12:10 EDT 2007
$ LC_CTYPE=en_US.iso88591 LC_TIME=es_ES.utf8 date
sáb oct  6 14:12:34 EDT 2007
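
Since a terminal hides what is really going on, it can help to look at the
raw bytes instead.  The following is only a rough sketch of the experiment
above, not anything from the PostgreSQL tree: date_bytes() is a made-up
helper name, and it assumes a date(1) that honors LANG/LC_* and that the
es_ES and en_US locales shown are installed (if they are not, glibc's date
typically just falls back to the C locale).  On a Saturday, "sáb" comes out
as 73 c3 a1 62 in UTF-8 and as 73 e1 62 in ISO-8859-1.

```python
import os
import subprocess


def date_bytes(locale_env, fmt="+%a"):
    """Run date(1) under the given locale variables and return its raw output bytes.

    Hypothetical helper for illustration only.
    """
    # Drop inherited LANG / LANGUAGE / LC_* so that only locale_env
    # determines the locale the child process sees.
    env = {
        k: v
        for k, v in os.environ.items()
        if k not in ("LANG", "LANGUAGE") and not k.startswith("LC_")
    }
    env.update(locale_env)
    result = subprocess.run(
        ["date", fmt], env=env, stdout=subprocess.PIPE, check=True
    )
    return result.stdout.rstrip(b"\n")


if __name__ == "__main__":
    # The same four combinations as in the shell transcript.
    for settings in (
        {"LANG": "es_ES.utf8"},
        {"LANG": "es_ES.iso88591"},
        {"LANG": "en_US.iso88591", "LC_TIME": "es_ES.utf8"},
        {"LC_CTYPE": "en_US.iso88591", "LC_TIME": "es_ES.utf8"},
    ):
        raw = date_bytes(settings)
        # Print hex so the terminal's own encoding cannot distort the result.
        print(settings, " ".join(f"{b:02x}" for b in raw))
```

Printing hex rather than text is the point here: it shows which encoding
the abbreviated day name was actually produced in, independent of how the
terminal is configured.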

Perhaps a workable fix for this would be to try to mangle the LC_ settings
we pass to setlocale() so that they all have the same codeset component
(if any).  It looks like the convention of ".foo" being a codeset name
is fairly well standardized, even if the spelling of the codeset name is
not ...
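
To make the idea concrete, here is a minimal sketch of that kind of
mangling, written in Python purely for illustration (the real thing would
be C in the backend).  The names with_codeset(), codeset_of() and
harmonize() are invented for this example, and the choice of LC_CTYPE as
the reference category is an assumption.  It relies only on the
language[_territory][.codeset][@modifier] naming convention, copies the
codeset spelling verbatim, and does not check that the resulting locale
actually exists on the machine.

```python
import re

# language[_territory][.codeset][@modifier]
_LOCALE_RE = re.compile(
    r"^(?P<base>[^.@]+)(?:\.(?P<codeset>[^@]*))?(?P<modifier>@.*)?$"
)


def codeset_of(locale_name):
    """Return the ".foo" codeset part of a locale name (without the dot), or None."""
    m = _LOCALE_RE.match(locale_name)
    return m.group("codeset") if m else None


def with_codeset(locale_name, codeset):
    """Return locale_name with its codeset component replaced (or added)."""
    # "C" and "POSIX" carry no codeset; leave them and unparsable names alone.
    if locale_name in ("", "C", "POSIX"):
        return locale_name
    m = _LOCALE_RE.match(locale_name)
    if m is None:
        return locale_name
    return f"{m.group('base')}.{codeset}{m.group('modifier') or ''}"


def harmonize(settings, reference="LC_CTYPE"):
    """Give every LC_ setting the codeset of settings[reference], if it has one."""
    target = codeset_of(settings[reference])
    if not target:
        # The reference has no codeset component, so there is nothing to impose.
        return dict(settings)
    return {cat: with_codeset(name, target) for cat, name in settings.items()}


if __name__ == "__main__":
    print(harmonize({"LC_CTYPE": "en_US.iso88591", "LC_TIME": "es_ES.utf8"}))
```

Because the codeset spelling is not standardized, a real implementation
would presumably also need to verify that setlocale() accepts the rewritten
name and fall back to the original setting if it does not.
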
        regards, tom lane

