Re: multibyte-character aware support for function "downcase_truncate_identifier()" - Mailing list pgsql-hackers

From Tom Lane
Subject Re: multibyte-character aware support for function "downcase_truncate_identifier()"
Msg-id 26799.1290375695@sss.pgh.pa.us
In response to Re: multibyte-character aware support for function "downcase_truncate_identifier()"  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: multibyte-character aware support for function "downcase_truncate_identifier()"
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Jul 7, 2010 at 10:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> IIRC this is intentional.  Please consult the archives for previous
>> discussions.

> Why would this be intentional?

Well, it's intentional for lack of any infrastructure that would allow
a more spec-compliant approach.  As you say, calling str_tolower here
is probably a non-starter for performance reasons.  Another big problem
is that str_tolower produces a locale-specific downcasing conversion.
This (a) is going to create portability headaches of the first magnitude,
and (b) is not really an advance in terms of spec compliance.  The SQL
spec says that identifier case folding should be done according to the
Unicode standard, but it's not safe to assume that any random
platform-specific locale is going to act that way.  A specific example
of a locale that is known to NOT behave acceptably is Turkish: they have
weird ideas about i versus I, which in fact broke things back when we
used to use tolower for this purpose.  See the archives from early 2004,
and in particular commit 59f9a0b9df0d224bb62ff8ec5b65e0b187655742, which
removed the exact same logic (though not wide-character-aware) that this
patch proposes to put back.
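
To make the locale problem concrete: under a Turkish locale, tolower('I')
yields the dotless lowercase i rather than 'i', so an identifier such as
TITLE would no longer fold to title. The following is a minimal sketch of
locale-independent folding that touches only ASCII A-Z and leaves every
other byte (including all bytes of multibyte sequences) alone. The name
ascii_downcase is hypothetical, chosen for illustration; this is not the
PostgreSQL source, and it is not the Unicode-based folding the SQL spec
asks for.

```c
#include <stddef.h>

/*
 * Hypothetical helper (an assumption for illustration, not PostgreSQL code):
 * fold ASCII 'A'..'Z' to lowercase in place without consulting the locale.
 */
static void
ascii_downcase(char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        unsigned char ch = (unsigned char) s[i];

        /* Only plain ASCII uppercase letters are changed. */
        if (ch >= 'A' && ch <= 'Z')
            s[i] = (char) (ch + ('a' - 'A'));

        /*
         * Bytes with the high bit set are left untouched, so multibyte
         * characters pass through unchanged and no locale-specific
         * mapping can affect the result.
         */
    }
}
```

Because the result depends only on byte values, it is the same on every
platform and in every locale; the cost is that non-ASCII uppercase letters
are not folded at all.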

I think the given patch can be rejected out of hand.  If the OP has any
ideas about doing non-locale-dependent case folding at an acceptable
speed, I'm happy to listen.
        regards, tom lane

