
Re: Locale + encoding combinations - Mailing list pgsql-hackers

From	Magnus Hagander
Subject	Re: Locale + encoding combinations
Date	October 12, 2007 11:45:43
Msg-id	20071012144510.GH6334@svr2.hagander.net
In response to	Re: Locale + encoding combinations ("Trevor Talbot" <quension@gmail.com>)
List	pgsql-hackers

On Fri, Oct 12, 2007 at 06:03:52AM -0700, Trevor Talbot wrote:
> On 10/12/07, Dave Page <dpage@postgresql.org> wrote:
> > Tom Lane wrote
> > > That still leaves us with the problem of how to tell whether a locale
> > > spec is bad on Windows.  Judging by your example, Windows checks whether
> > > the code page is present but not whether it is sane for the base locale.
> > > What happens when there's a mismatch --- eg, what encoding do system
> > > messages come out in?
> >
> > I'm not sure how to test that specifically, but it seems that accented
> > characters simply fall back to their undecorated equivalents if the
> > encoding is not appropriate, eg:
> >
> > Dave@SNAKE:~$ ./setlc French_France.1252
> > Locale: French_France.1252
> > The date is: sam. 01 of août  2007
> > Dave@SNAKE:~$ ./setlc French_France.28597
> > Locale: French_France.28597
> > The date is: sam. 01 of aout  2007
> >
> > (the encodings used there are WIN1252 and ISO8859-7 (Greek)).
> >
> > I'm happy to test further if you can suggest how I can figure out the
> > encoding actually output.
> 
> The encoding output is the one you specified.  Keep in mind,
> underneath Windows is mostly working with Unicode, so all characters
> exist and the locale rules specify their behavior there.  The encoding
> is just the byte stream it needs to force them all into after doing
> whatever it does to them.  As you've seen, it uses some sort of
> best-fit mapping I don't know the details of.  (It will drop accent
> marks and choose characters with similar shape where possible, by
> default.)
> 
> I think it's a bit more complex for input/transform cases where you
> operate on the byte stream directly without intermediate conversion to
> Unicode, which is why UTF-8 doesn't work as a codepage, but again I
> don't have the details nearby.  I can try to do more digging if
> needed.
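Dave's French_France.28597 result can be approximated with Python's standard codecs. The "û" in "août" exists in Windows-1252 (byte 0xFB) but has no slot in ISO 8859-7, so a strict conversion into the Greek code page fails. The helper below is hypothetical, and so is its fallback rule: decompose unencodable characters with Unicode NFKD, drop the combining accent marks, and emit "?" for anything still unencodable. This only mimics the accent-dropping Trevor describes. It is not Windows' actual best-fit table, whose details aren't given in this thread.

```python
import unicodedata


def best_fit_encode(text: str, codec: str) -> bytes:
    """Encode text, approximating a 'best fit' for unencodable characters.

    Hypothetical helper (not a Windows or PostgreSQL API): a character the
    target codec cannot represent is NFKD-decomposed and its combining
    accent marks are dropped; anything still unencodable becomes '?'.
    """
    out = []
    for ch in text:
        try:
            out.append(ch.encode(codec))
        except UnicodeEncodeError:
            # e.g. "û" -> "u" + U+0302 COMBINING CIRCUMFLEX; keep only "u"
            base = "".join(
                c for c in unicodedata.normalize("NFKD", ch)
                if not unicodedata.combining(c)
            )
            out.append(base.encode(codec, errors="replace"))
    return b"".join(out)


# "û" (U+00FB) is byte 0xFB in Windows-1252 but absent from ISO 8859-7.
print(best_fit_encode("août", "cp1252"))     # b'ao\xfbt'
print(best_fit_encode("août", "iso8859_7"))  # b'aout'
```

Running it produces the same "aout" that Dave saw in the 28597 output, and the 1252 case is unchanged.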

Just so the non-Windows-savvy people get it: when Windows documentation or
users refer to "Unicode", they mean UTF-16.

//Magnus
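Magnus's point can be made concrete as follows. Windows' wide-character ("Unicode") strings are sequences of 16-bit UTF-16 code units, stored little-endian, which is a different byte form from both UTF-8 and a single-byte code page such as 1252. The sketch below uses only Python's standard codecs to measure one string in each form. The function name is illustrative only.

```python
def encoded_sizes(text: str) -> dict:
    """Compare code points, UTF-16 code units and UTF-8 bytes for a string."""
    return {
        "code_points": len(text),
        # Windows wide strings are 16-bit UTF-16LE units; 2 bytes per unit.
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        "utf8_bytes": len(text.encode("utf-8")),
    }


print("août".encode("utf-16-le"))  # b'a\x00o\x00\xfb\x00t\x00'
print(encoded_sizes("août"))       # {'code_points': 4, 'utf16_units': 4, 'utf8_bytes': 5}
# Characters outside the Basic Multilingual Plane need a surrogate pair.
print(encoded_sizes("\U0001F600"))  # {'code_points': 1, 'utf16_units': 2, 'utf8_bytes': 4}
```

Per Trevor's description, the UTF-16 form is what Windows works with internally. The code page named in a locale string only governs the final conversion to bytes.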
