Re: Locale + encoding combinations - Mailing list pgsql-hackers

From Trevor Talbot
Subject Re: Locale + encoding combinations
Msg-id 90bce5730710120603t1d10b20ld689ef41b201026b@mail.gmail.com
In response to Re: Locale + encoding combinations  (Dave Page <dpage@postgresql.org>)
List pgsql-hackers
On 10/12/07, Dave Page <dpage@postgresql.org> wrote:
> Tom Lane wrote:
> > That still leaves us with the problem of how to tell whether a locale
> > spec is bad on Windows.  Judging by your example, Windows checks whether
> > the code page is present but not whether it is sane for the base locale.
> > What happens when there's a mismatch --- eg, what encoding do system
> > messages come out in?
>
> I'm not sure how to test that specifically, but it seems that accented
> characters simply fall back to their undecorated equivalents if the
> encoding is not appropriate, eg:
>
> Dave@SNAKE:~$ ./setlc French_France.1252
> Locale: French_France.1252
> The date is: sam. 01 of août  2007
> Dave@SNAKE:~$ ./setlc French_France.28597
> Locale: French_France.28597
> The date is: sam. 01 of aout  2007
>
> (the encodings used there are WIN1252 and ISO8859-7 (Greek)).
>
> I'm happy to test further if you can suggest how I can figure out the
> encoding actually output.

The output encoding is the one you specified.  Keep in mind that
underneath, Windows works mostly in Unicode, so all characters exist
and the locale rules specify their behavior there.  The encoding is
just the byte stream Windows has to force the result into after the
locale processing is done.  As you've seen, it uses some sort of
best-fit mapping whose details I don't know.  (By default it will drop
accent marks and choose characters with a similar shape where
possible.)
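
To illustrate the effect seen in the setlc output above, here is a rough
sketch in Python.  It is not Windows' actual best-fit table, which I
don't have; it only approximates the "drop the accent" part by
decomposing characters the target encoding can't represent.  The
function name best_fit_encode is made up for this example.

```python
import unicodedata


def best_fit_encode(text: str, encoding: str) -> bytes:
    """Approximate a best-fit conversion (an assumption, not Windows' real table).

    Characters the target encoding can represent are encoded as-is.
    Anything else is decomposed, its combining marks (accents) are
    dropped, and whatever base character remains is encoded, falling
    back to '?' if even that does not fit.
    """
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(encoding)
            continue
        except UnicodeEncodeError:
            pass
        # Fallback path: split into base character + combining marks.
        for part in unicodedata.normalize("NFKD", ch):
            if unicodedata.combining(part):
                continue  # drop the accent
            out += part.encode(encoding, errors="replace")
    return bytes(out)


# WIN1252 has U+00FB (u with circumflex), so it survives.
print(best_fit_encode("août", "cp1252"))
# ISO8859-7 (Greek) does not, so only the base letter is kept.
print(best_fit_encode("août", "iso8859_7"))
```

On Windows itself the conversion from Unicode to a code page goes
through WideCharToMultiByte, and the WC_NO_BEST_FIT_CHARS flag turns
the best-fit substitution off, which may be one way to detect a code
page that can't represent a locale's characters.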

I think it's a bit more complex for input/transform cases, where you
operate on the byte stream directly without an intermediate conversion
to Unicode.  That is why UTF-8 doesn't work as a codepage, but again I
don't have the details at hand.  I can try to do more digging if
needed.