Re: multibyte-character aware support for function "downcase_truncate_identifier()" - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: multibyte-character aware support for function "downcase_truncate_identifier()"
Date	November 21, 2010 19:48:12
Msg-id	29502.1290383281@sss.pgh.pa.us
In response to	Re: multibyte-character aware support for function "downcase_truncate_identifier()" (Andrew Dunstan <andrew@dunslane.net>)
Responses	Re: multibyte-character aware support for function "downcase_truncate_identifier()" Re: multibyte-character aware support for function "downcase_truncate_identifier()"
List	pgsql-hackers

Andrew Dunstan <andrew@dunslane.net> writes:
> On 11/21/2010 06:09 PM, Robert Haas wrote:
>> I think that's fair.  It actually doesn't seem like it should be that
>> hard if we knew that the server encoding were UTF8 - it's just a big
>> translation table somewhere, no?

> No, it's far more complex. See for example 
> <http://unicode.org/reports/tr21/tr21-3.html>, which says:

Yeah.  I'm actually not sure that the SQL committee has thought very
hard about this, because the spec is worded as though they think that
"Unicode case normalization" is all they have to say to uniquely define
what to do.  The Unicode guys recognize that case mapping is
locale-specific, which puts us right back at square one.  But leaving
spec compliance aside, we know from bitter experience that we cannot use
a definition that lets the Turkish locale fool with the mapping of i/I.
I suspect that locale-dependent mappings of any other characters are
just as bad; we simply haven't had enough users burnt by such cases to
have an institutional memory of it.  But, for example, do you really
think it's a good idea if a pg_dump and reload into a DB with a
different locale results in changing the normalized form of SQL
identifiers?
        regards, tom lane
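
To make the i/I concern concrete, here is a minimal sketch. It is not taken from the PostgreSQL source: both function names are hypothetical, and the Turkish rule is hand-coded for illustration rather than obtained from a real locale library. It contrasts an ASCII-only fold, which gives the same answer in every locale, with a locale-aware fold that changes the normalized identifier.

```python
def fold_ascii(ident: str) -> str:
    """Lower-case only A-Z; every other character is left untouched.

    Hypothetical helper: the result does not depend on any locale setting.
    """
    return "".join(
        chr(ord(c) + 32) if "A" <= c <= "Z" else c
        for c in ident
    )


def fold_turkish(ident: str) -> str:
    """Hypothetical locale-aware fold using the Turkish i/I rules.

    Assumption for illustration: dotless I (U+0049) maps to dotless
    i (U+0131), and dotted capital I (U+0130) maps to plain i.
    All other characters use Python's default lower-casing.
    """
    special = {"I": "\u0131", "\u0130": "i"}
    return "".join(special.get(c, c.lower()) for c in ident)


if __name__ == "__main__":
    name = "TITLE"
    print(fold_ascii(name))    # same in every locale
    print(fold_turkish(name))  # differs because of the I mapping
```

Under the ASCII-only rule the unquoted identifier TITLE always normalizes to title. Under the hand-coded Turkish rule it becomes tıtle (with a dotless ı), so a dump written under one rule would name a different object after a reload under the other.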
