Re: Rough draft for Unicode-aware UPPER()/LOWER()/INITCAP() - Mailing list pgsql-hackers

From Marko Karppinen
Subject Re: Rough draft for Unicode-aware UPPER()/LOWER()/INITCAP()
Msg-id ADBE8746-A78D-11D8-9207-000A95C56374@karppinen.fi
In response to Re: Rough draft for Unicode-aware UPPER()/LOWER()/INITCAP()  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers
> Marko Karppinen wrote:
>> I think this interaction between the locale and server_encoding is
>> confusing. Is there any use case for running an incompatible mix?
>> If not, would it not make sense to fetch initdb's default database
>> encoding with nl_langinfo(CODESET) instead of using SQL_ASCII?

Peter Eisentraut wrote:
> This would be fine and dandy if we had any sort of idea about what sort
> of strings nl_langinfo(CODESET) returns and how to map them to our
> encoding names.

Karel Zak posted an answer to this last year, here on pgsql-hackers:
http://archives.postgresql.org/pgsql-hackers/2003-05/msg00744.php
It's not complete, but it's sort of an idea.

The code is under the LGPL, but copyright doesn't extend to the
underlying facts about which encoding strings the various operating
systems use, so that information can be reused. I'd imagine
that it covers many, if not most, of the likely cases.
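
To make the idea concrete, here is a minimal sketch of such a mapping from nl_langinfo(CODESET) strings to encoding names. Everything in it is an assumption for illustration: the table entries, the struct, and the function names (codeset_to_pg_encoding, default_encoding_from_environment) are hypothetical and are not taken from Karel's code or from the PostgreSQL tree. The encoding names on the right are the ones I believe the server accepts, and the table is far from complete.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical table: codeset strings as some platforms report them,
 * paired with a server encoding name. Illustrative, not exhaustive. */
struct codeset_map
{
    const char *codeset;
    const char *pg_encoding;
};

static const struct codeset_map codeset_table[] = {
    {"UTF-8", "UNICODE"},
    {"utf8", "UNICODE"},
    {"ISO-8859-1", "LATIN1"},
    {"ISO8859-1", "LATIN1"},
    {"ISO-8859-15", "LATIN9"},
    {"eucJP", "EUC_JP"},
    {"EUC-JP", "EUC_JP"},
    {"ANSI_X3.4-1968", "SQL_ASCII"},
    {"US-ASCII", "SQL_ASCII"},
};

/* Return the encoding name for a codeset string, or NULL if unknown.
 * The comparison ignores case because platforms differ in spelling. */
const char *
codeset_to_pg_encoding(const char *codeset)
{
    size_t i;

    if (codeset == NULL)
        return NULL;
    for (i = 0; i < sizeof(codeset_table) / sizeof(codeset_table[0]); i++)
    {
        if (strcasecmp(codeset, codeset_table[i].codeset) == 0)
            return codeset_table[i].pg_encoding;
    }
    return NULL;
}

/* Look up the codeset of the environment's LC_CTYPE locale.
 * Returns NULL when the locale cannot be set or the codeset is unmapped. */
const char *
default_encoding_from_environment(void)
{
    if (setlocale(LC_CTYPE, "") == NULL)
        return NULL;
    return codeset_to_pg_encoding(nl_langinfo(CODESET));
}
```

An unknown codeset yields NULL rather than a guess, so the caller can fall back to whatever the current default is.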

The current situation of upper/lower/collating/etc just being
broken by default on many non-C locales is bad enough to warrant
bailing out during initdb when this situation is detected
(with a reasonably cautious heuristic).
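
A cautious check of that kind might look roughly like the sketch below. It is only an illustration under stated assumptions: the function names (codeset_is_ascii, initdb_should_bail_out) and the list of ASCII codeset spellings are hypothetical, and nothing here reflects what initdb actually does. It complains only when the locale is non-C, the locale's codeset is known and not plain ASCII, and the database encoding would still be SQL_ASCII.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: true for codeset spellings that mean plain ASCII. */
static bool
codeset_is_ascii(const char *codeset)
{
    return strcasecmp(codeset, "ANSI_X3.4-1968") == 0 ||
           strcasecmp(codeset, "US-ASCII") == 0 ||
           strcasecmp(codeset, "ASCII") == 0;
}

/* Decide whether to refuse to continue. Errs on the side of NOT
 * bailing out: any missing or ambiguous input returns false. */
bool
initdb_should_bail_out(const char *locale_name,
                       const char *codeset,
                       const char *encoding)
{
    if (locale_name == NULL || codeset == NULL || encoding == NULL)
        return false;

    /* The C/POSIX locale has always worked with SQL_ASCII. */
    if (strcmp(locale_name, "C") == 0 || strcmp(locale_name, "POSIX") == 0)
        return false;

    /* An ASCII codeset cannot conflict with SQL_ASCII. */
    if (codeset_is_ascii(codeset))
        return false;

    /* Non-ASCII locale codeset, but the encoding was left at SQL_ASCII. */
    return strcasecmp(encoding, "SQL_ASCII") == 0;
}
```

With these rules, a locale such as fi_FI.UTF-8 combined with SQL_ASCII is rejected, while the same locale with an explicitly chosen encoding passes through untouched.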

It used to be that you got what you deserved if you were stupid
enough to define a non-C, non-ASCII-based locale. You had only
yourself to blame for everything breaking. These days, however,
millions of systems get shipped and installed with UTF-8 locales
on by default, so it's not possible to portray this as a user error.

Requiring every one of these people to configure initdb's encoding
manually would be harsh, however, so I think that a heuristic
that'd work with most modern systems would strike an appropriate
balance of correctness and path-of-least-surprise.

mk


