On Fri, Oct 12, 2007 at 06:03:52AM -0700, Trevor Talbot wrote:
> On 10/12/07, Dave Page <dpage@postgresql.org> wrote:
> > Tom Lane wrote
> > > That still leaves us with the problem of how to tell whether a locale
> > > spec is bad on Windows. Judging by your example, Windows checks whether
> > > the code page is present but not whether it is sane for the base locale.
> > > What happens when there's a mismatch --- eg, what encoding do system
> > > messages come out in?
> >
> > I'm not sure how to test that specifically, but it seems that accented
> > characters simply fall back to their undecorated equivalents if the
> > encoding is not appropriate, eg:
> >
> > Dave@SNAKE:~$ ./setlc French_France.1252
> > Locale: French_France.1252
> > The date is: sam. 01 of août 2007
> > Dave@SNAKE:~$ ./setlc French_France.28597
> > Locale: French_France.28597
> > The date is: sam. 01 of aout 2007
> >
> > (the encodings used there are WIN1252 and ISO8859-7 (Greek)).
> >
> > I'm happy to test further if you can suggest how I can figure out the
> > encoding actually output.
>
> The encoding output is the one you specified. Keep in mind,
> underneath Windows is mostly working with Unicode, so all characters
> exist and the locale rules specify their behavior there. The encoding
> is just the byte stream it needs to force them all into after doing
> whatever it does to them. As you've seen, it uses some sort of
> best-fit mapping I don't know the details of. (It will drop accent
> marks and choose characters with similar shape where possible, by
> default.)
>
> I think it's a bit more complex for input/transform cases where you
> operate on the byte stream directly without intermediate conversion to
> Unicode, which is why UTF-8 doesn't work as a codepage, but again I
> don't have the details nearby. I can try to do more digging if
> needed.
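
For anyone who wants to see the effect Dave observed without a Windows box,
here is a rough sketch in Python. It is only an approximation: Windows uses
its own best-fit tables, which I am not reproducing here. The function name
best_fit_encode and the decompose-then-drop strategy are my assumptions, not
what the Windows CRT actually does internally.

```python
import unicodedata


def best_fit_encode(text: str, codec: str) -> bytes:
    """Encode text into codec, degrading unencodable characters.

    Hypothetical helper: approximates a "best-fit" conversion by
    decomposing a character (NFKD) and dropping the pieces, such as
    combining accents, that the target encoding cannot represent.
    """
    out = bytearray()
    for ch in text:
        try:
            # Character exists in the target code page: emit it as-is.
            out += ch.encode(codec)
        except UnicodeEncodeError:
            # Split into base letter + combining marks, keep what fits.
            decomposed = unicodedata.normalize("NFKD", ch)
            out += decomposed.encode(codec, errors="ignore") or b"?"
    return bytes(out)


# WIN1252 has a slot for u-circumflex; ISO8859-7 (Greek) does not.
print(best_fit_encode("août", "cp1252"))
print(best_fit_encode("août", "iso8859_7"))
```

With cp1252 the accented character survives as a single byte, while with
iso8859_7 only the base letter is left, which matches the "aout" in Dave's
second run.
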
Just so the non-Windows-savvy people get it: when Windows documentation or
users refer to "Unicode", they mean UTF-16.
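
To make the distinction concrete, a small sketch (Python again; the helper
name windows_unicode_bytes is made up for illustration, and I am assuming
the little-endian form, which is what Windows uses on x86):

```python
def windows_unicode_bytes(text: str) -> bytes:
    """Return the UTF-16LE bytes for text.

    Hypothetical helper: this is the representation meant when Windows
    documentation says "Unicode", assuming little-endian byte order.
    """
    return text.encode("utf-16-le")


word = "août"
# Two bytes per character here, since all four are in the BMP.
print(windows_unicode_bytes(word))
# The same string in UTF-8, for comparison.
print(word.encode("utf-8"))
```

The UTF-16 form is 8 bytes for these four characters, while the UTF-8 form
is 5 bytes, so the two are not interchangeable as byte streams.
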
//Magnus