Re: Encoding, Unicode, locales, etc. - Mailing list pgsql-general
From | Carlos Moreno |
---|---|
Subject | Re: Encoding, Unicode, locales, etc. |
Date | |
Msg-id | 4548B419.1090205@mochima.com |
In response to | Re: Encoding, Unicode, locales, etc. (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Encoding, Unicode, locales, etc. |
List | pgsql-general |
Thanks, Tom, for your reply.

Tom Lane wrote:

> Carlos Moreno <moreno_pg@mochima.com> writes:
>
>> Why is it that the database cluster is restricted to a single locale
>> (or single set of locales) instead of being configurable on a
>> per-database basis?
>
> Because we depend on libc's locale support, which (on many platforms)
> isn't designed to switch between locales cheaply [...]
>
> This stuff is certainly far from ideal, but the amount of work involved
> to fix it is daunting; see many past pg-hackers discussions.

Fair enough --- and good to know.

>> 2) By the same token (more or less), I have a test database, for which
>> I ran initdb without specifying encoding or locale; then, I created a
>> database with UTF8 encoding.
>
> There's no such thing as "you didn't specify a locale". If you didn't
> specify one on the initdb command line, then it was taken from the
> environment. Try "show lc_collate" and "show lc_ctype" to see what
> got used.

Yes, that's what I meant --- I meant that I did not use the --locale or -E command-line switches for the initdb command. Both lc_ctype and lc_collate show en_US.UTF-8.

>> I tried lower() on a string that contains characters with accents
>> (e.g., Spanish or French characters), and it works as it should
>> according to Spanish or French rules --- it returns a string with the
>> same characters in lowercase, with the same accent. Why did that work?
>> My Linux machine has all en_US.UTF-8 locales, and en_US is not even
>> aware of characters with accents,
>
> You sure? I'd sort of expect a UTF8 locale to know this stuff anyway.
> In any case, Postgres doesn't know anything about case conversion
> beyond what toupper/tolower tell it, so your experimental result is
> sufficient proof that that locale includes these conversions.

Are you sure there's nothing in the way PostgreSQL interacts with the C conversion functions that affects this?
I ask because, as part of a "sanity check", I repeated the tests, now with two machines, one running PG 8.1.4 and the other 7.4.14, and they behave differently.

The one that does the case conversion "correctly" (read: as I expect it as per Spanish or French rules) is 8.1.4 with the en_US locale (LC_CTYPE and LC_COLLATE both showing en_US.UTF-8). PG 7.4.14, *even with locale es_ES*, does not do the case conversion (characters with an accent or tilde are left untouched).

I wonder if someone could shed some light on this little mystery?

Perhaps to add more confusion to my experimental/informal tests: PG 8.1.4 is running on an FC4 AMD64 X2 box (the command "locale" at the shell prompt shows all en_US.utf8), and PG 7.4.14 is running on a laptop with FC5 on an Intel Celeron M (the command "locale" shows exactly the same in that case). Does this perhaps account for the difference?

Thanks,

Carlos
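
To make the symptom concrete, here is a minimal sketch of how the two results could arise. It is only an analogy in Python, not PostgreSQL code, and it rests on an assumption this thread does not confirm: that one server lowercases a UTF-8 string one byte at a time, while the other converts whole characters. The function names `lower_bytewise` and `lower_charwise` are made up for illustration.

```python
# Illustrative analogy only; this is not PostgreSQL source code.
# Assumption: the difference seen between the two servers comes from
# byte-at-a-time versus character-at-a-time case conversion.


def lower_bytewise(text: str) -> str:
    """Lowercase by treating each UTF-8 byte separately (ASCII letters only)."""
    raw = text.encode("utf-8")
    # bytes.lower() changes only the ASCII bytes A-Z. The bytes that make up
    # a multibyte character such as "Á" (0xC3 0x81) are left untouched.
    return raw.lower().decode("utf-8")


def lower_charwise(text: str) -> str:
    """Lowercase whole characters, as a wide-character routine would."""
    return text.lower()


if __name__ == "__main__":
    sample = "ÁRBOL ÑANDÚ"
    print(lower_bytewise(sample))  # accented capitals survive unchanged
    print(lower_charwise(sample))  # every letter is lowercased
```

In the byte-wise version the plain ASCII letters are lowercased but the accented capitals are left as they were, which matches what I see on 7.4.14; the character-wise version matches what I see on 8.1.4.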