Re: Rough draft for Unicode-aware UPPER()/LOWER()/INITCAP() - Mailing list pgsql-hackers

From Marko Karppinen
Subject Re: Rough draft for Unicode-aware UPPER()/LOWER()/INITCAP()
Date
Msg-id B5E5BD42-A76A-11D8-9207-000A95C56374@karppinen.fi
In response to Rough draft for Unicode-aware UPPER()/LOWER()/INITCAP()  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Rough draft for Unicode-aware UPPER()/LOWER()/INITCAP()  (Peter Eisentraut <peter_e@gmx.net>)
Re: Rough draft for Unicode-aware  (Tatsuo Ishii <t-ishii@sra.co.jp>)
Re: Rough draft for Unicode-aware UPPER()/LOWER()/INITCAP()  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> This code will only work if the database is running under an LC_CTYPE
> setting that implies the same encoding specified by server_encoding.
> However, I don't see that as a fatal objection, because in point of fact
> the existing upper/lower code assumes the same thing.

I think this interaction between the locale and server_encoding is
confusing. Is there any use case for running an incompatible mix?
If not, would it not make sense for initdb to take its default
database encoding from nl_langinfo(CODESET) instead of defaulting
to SQL_ASCII?
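
For illustration, here is a minimal sketch of the lookup I have in mind.
The helper name codeset_for_locale() is made up for this mail and is not
existing initdb code. Mapping the returned codeset name (which differs
between platforms) to a PostgreSQL encoding name is a separate step that
is not shown.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of initdb: return the codeset name
 * implied by the given LC_CTYPE locale, or NULL if that locale cannot
 * be selected.  Passing "" takes the locale from the environment
 * (LC_ALL, LC_CTYPE, LANG).
 *
 * The returned string is owned by the C library and may be overwritten
 * by a later setlocale() or nl_langinfo() call, so copy it if it has
 * to be kept.
 */
static const char *
codeset_for_locale(const char *locale_name)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return NULL;            /* unknown or unsupported locale */

    return nl_langinfo(CODESET);
}
```

Calling codeset_for_locale("") in initdb would give the codeset of the
locale the user is actually running under; the exact spelling of the
result is platform-dependent, so it would need to be normalized before
being compared with the --encoding value.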

initdb could even emit a warning if the --encoding option was
used without also specifying --no-locale.

Using nl_langinfo(CODESET) was discussed and quietly dismissed a
year ago (although the topic was the client encoding back then).
But I think that the idea is worth revisiting because it would
allow UPPER() and LOWER() to work correctly with international
alphabets -- out of the box and without configuration -- on a
wide variety of modern systems.

mk