Re: Encoding, Unicode, locales, etc. - Mailing list pgsql-general

From Carlos Moreno
Subject Re: Encoding, Unicode, locales, etc.
Msg-id 4548B419.1090205@mochima.com
In response to Re: Encoding, Unicode, locales, etc.  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Encoding, Unicode, locales, etc.
List pgsql-general
Thanks Tom, for your reply.

Tom Lane wrote:

>Carlos Moreno <moreno_pg@mochima.com> writes:
>
>
>>Why is it that the database
>>cluster is restricted to a single locale (or single set of locales) instead
>>of being configurable on a per-database basis?
>>
>>
>
>Because we depend on libc's locale support, which (on many platforms)
>isn't designed to switch between locales cheaply  [...]
>
>This stuff is certainly far from ideal, but the amount of work involved
>to fix it is daunting; see many past pg-hackers discussions.
>
>

Fair enough --- and good to know.

>>2)  By the same token (more or less), I have a test database, for which
>>I ran initdb without specifying encoding or locale;  then, I create a
>>database with UTF8 encoding.
>>
>>
>
>There's no such thing as "you didn't specify a locale".  If you didn't
>specify one on the initdb command line, then it was taken from the
>environment.  Try "show lc_collate" and "show lc_ctype" to see what
>got used.
>
>

Yes, that's what I meant:  I did not use the --locale or -E command-line
switches for the initdb command.  Both lc_ctype and lc_collate show
en_US.UTF-8.
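
To illustrate what "taken from the environment" means, here is a minimal sketch of the usual POSIX lookup order for a locale category (LC_ALL first, then the category's own variable, then LANG, else "C").  The helper name effective_locale is hypothetical, and this only mimics what setlocale(category, "") is specified to do;  it is not taken from initdb itself.

```python
import os


def effective_locale(category, environ):
    """Resolve a locale category in POSIX precedence order.

    Hypothetical helper: mimics setlocale(category, ""), not initdb's code.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = environ.get(name)
        if value:  # unset or empty variables are skipped
            return value
    return "C"  # POSIX default when nothing is set


# Show what the current shell environment would select.
for category in ("LC_COLLATE", "LC_CTYPE"):
    print(category, "=", effective_locale(category, os.environ))
```

On a machine where only LANG=en_US.UTF-8 is set, both categories should resolve to en_US.UTF-8, which matches what "show lc_collate" and "show lc_ctype" report here.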

>>I try lower of a string that
>>contains characters with accents  (e.g., Spanish or French characters),
>>and it works as it should according to Spanish or French rules --- it
>>returns a string with the same characters in lowercase, with the same
>>accent.  Why did that work?  My Linux machine has all en_US.UTF-8
>>locales, and en_US is not even aware of characters with accents,
>>
>>
>
>You sure?  I'd sort of expect a UTF8 locale to know this stuff anyway.
>In any case, Postgres doesn't know anything about case conversion
>beyond what toupper/tolower tell it, so your experimental result is
>sufficient proof that that locale includes these conversions.
>
>

Are you sure that nothing in the way PostgreSQL interacts with the C
conversion functions plays a role?  I ask because, as part of a "sanity
check", I repeated the tests, now with two machines:  one has PG 8.1.4 and
the other has 7.4.14, and they behave differently.

The one that does the case conversion "correctly" (read:  as I expect it
per Spanish or French rules) is 8.1.4 with the en_US locale (LC_CTYPE and
LC_COLLATE both showing en_US.UTF-8).  PG 7.4.14, *even with locale
es_ES*, does not do the case conversion  (characters with an accent or
tilde are left untouched).

Could someone shed some light on this little mystery?  Perhaps to add more
confusion to my experimental/informal tests, PG 8.1.4 is running on an FC4
AMD64 X2 box  (the command "locale" at the shell prompt shows all
en_US.utf8), and PG 7.4.14 is running on a laptop with FC5 on an Intel
Celeron M  (the command "locale" shows exactly the same in that case).
Does this perhaps account for the difference?
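
One hypothesis that would fit these symptoms (an assumption, not something confirmed in this thread) is that the two servers differ in whether they lowercase UTF-8 data one byte at a time or one character at a time.  The sketch below uses Python rather than libc, purely as an analogy:  lower_bytewise stands in for a tolower()-style loop over single bytes, and lower_charwise for a conversion that decodes whole characters first.  Both function names are hypothetical.

```python
def lower_bytewise(data):
    """Lowercase ASCII letters only, one byte at a time.

    Bytes >= 0x80 (the pieces of multibyte UTF-8 characters) pass through
    unchanged, so accented letters are left untouched.
    """
    return bytes(b + 32 if 0x41 <= b <= 0x5A else b for b in data)


def lower_charwise(data):
    """Decode to characters, lowercase them, and re-encode as UTF-8."""
    return data.decode("utf-8").lower().encode("utf-8")


sample = "ÁRBOL ÑANDÚ ÉTÉ".encode("utf-8")
print(lower_bytewise(sample).decode("utf-8"))  # accented capitals survive
print(lower_charwise(sample).decode("utf-8"))  # everything is lowercased
```

If this is what is going on, the hardware and distribution differences would be irrelevant, and the es_ES setting would not help the byte-wise case either, since no single byte of an accented UTF-8 character is a letter on its own.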

Thanks,

Carlos
--

