Re: Encoding, Unicode, locales, etc. - Mailing list pgsql-general

From	Carlos Moreno
Subject	Re: Encoding, Unicode, locales, etc.
Date	November 1, 2006 14:09:11
Msg-id	4548B419.1090205@mochima.com
In response to	Re: Encoding, Unicode, locales, etc. (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Encoding, Unicode, locales, etc.
List	pgsql-general

Thanks Tom, for your reply.

Tom Lane wrote:

>Carlos Moreno <moreno_pg@mochima.com> writes:
>
>
>>Why is it that the database
>>cluster is restricted to a single locale (or single set of locales) instead
>>of being configurable on a per-database basis?
>>
>>
>
>Because we depend on libc's locale support, which (on many platforms)
>isn't designed to switch between locales cheaply  [...]
>
>This stuff is certainly far from ideal, but the amount of work involved
>to fix it is daunting; see many past pg-hackers discussions.
>
>

Fair enough --- and good to know.

>>2)  By the same token (more or less), I have a test database, for which
>>I ran initdb without specifying encoding or locale;  then I created a
>>database with UTF8 encoding.
>>
>>
>
>There's no such thing as "you didn't specify a locale".  If you didn't
>specify one on the initdb command line, then it was taken from the
>environment.  Try "show lc_collate" and "show lc_ctype" to see what
>got used.
>
>

Yes, that's what I meant: I did not use the --locale or -E command-line
switches for the initdb command.  Both lc_ctype and lc_collate show
en_US.UTF-8.
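
To illustrate Tom's point that a locale is always picked up from the environment when none is given explicitly, here is a minimal Python sketch. It is not PostgreSQL code; the helper name effective_locale is made up for this example, and it only shows what the C library resolves for the current process, which is an analogy for what initdb sees rather than a description of its internals.

```python
import locale
import os


def effective_locale(category=locale.LC_CTYPE):
    """Return the locale the C library selects for `category` from the environment.

    Hypothetical helper for illustration only.
    """
    try:
        # An empty string means "take the setting from the environment"
        # (LC_ALL, then the specific LC_* variable, then LANG).
        return locale.setlocale(category, "")
    except locale.Error:
        # The environment names a locale that is not installed here;
        # fall back to the always-available "C" locale.
        return locale.setlocale(category, "C")


if __name__ == "__main__":
    for var in ("LC_ALL", "LC_CTYPE", "LC_COLLATE", "LANG"):
        print(f"{var}={os.environ.get(var, '(unset)')}")
    print("LC_CTYPE   ->", effective_locale(locale.LC_CTYPE))
    print("LC_COLLATE ->", effective_locale(locale.LC_COLLATE))
```

Running it in the same shell that ran initdb should print the same locale names that "show lc_ctype" and "show lc_collate" report, assuming the environment has not changed since.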

>>I tried lower() on a string that
>>contains characters with accents  (e.g., Spanish or French characters),
>>and it works as it should according to Spanish or French rules --- it
>>returns a string with the same characters in lowercase, with the same
>>accent.  Why did that work?  My Linux machine has all en_US.UTF-8
>>locales, and en_US is not even aware of characters with accents,
>>
>>
>
>You sure?  I'd sort of expect a UTF8 locale to know this stuff anyway.
>In any case, Postgres doesn't know anything about case conversion
>beyond what toupper/tolower tell it, so your experimental result is
>sufficient proof that that locale includes these conversions.
>
>

Are you sure the way PostgreSQL calls the C case-conversion functions
plays no part?  I ask because, as a sanity check, I repeated the tests,
this time on two machines: one has PG 8.1.4 and the other has 7.4.14,
and they behave differently.

The one that does the case conversion "correctly" (read: as I expect
under Spanish or French rules) is 8.1.4 with the en_US locale (LC_CTYPE
and LC_COLLATE both showing en_US.UTF-8).  PG 7.4.14, *even with
locale es_ES*, does not do the case conversion (characters with an
accent or tilde are left untouched).

Could someone shed some light on this little mystery?  To add to the
confusion of my informal tests, PG 8.1.4 is running on an FC4 AMD64 X2
box (the command "locale" at the shell prompt shows all en_US.utf8),
and PG 7.4.14 is running on a laptop with FC5 on an Intel Celeron M
(the command "locale" shows exactly the same there).  Could this
account for the difference?
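
One hypothesis that would produce exactly the symptom seen on 7.4.14 is case conversion applied one byte at a time to UTF-8 data, as opposed to one character at a time. This is an assumption and is not confirmed anywhere in this thread. The Python sketch below only models the two approaches; the function names are invented for the example, and bytes.lower() stands in for a per-byte tolower() that maps only ASCII letters.

```python
def lower_bytewise(text: str) -> str:
    """Lowercase UTF-8 data one byte at a time (models a per-byte tolower())."""
    raw = text.encode("utf-8")
    # bytes.lower() maps only the ASCII bytes A-Z. Accented letters are
    # encoded in UTF-8 as sequences of bytes >= 0x80, so they pass through
    # unchanged.
    return raw.lower().decode("utf-8")


def lower_charwise(text: str) -> str:
    """Lowercase whole characters (models a wide-character towlower())."""
    return text.lower()


if __name__ == "__main__":
    for sample in ("ÁRBOL", "ÉTÉ", "ÑANDÚ"):
        print(sample, "->", lower_bytewise(sample), "|", lower_charwise(sample))
```

In the byte-wise version the unaccented letters are lowercased while the characters with an accent or tilde are left untouched, which matches what I see on the 7.4.14 machine; the character-wise version matches the 8.1.4 result.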

Thanks,

Carlos