Re: [COMMITTERS] pgsql: Don't downcase non-ascii identifier chars in multi-byte encoding - Mailing list pgsql-hackers

From	Andrew Dunstan
Subject	Re: [COMMITTERS] pgsql: Don't downcase non-ascii identifier chars in multi-byte encoding
Date	June 9, 2013 13:51:45
Msg-id	51B4884A.4080206@dunslane.net
In response to	Re: [COMMITTERS] pgsql: Don't downcase non-ascii identifier chars in multi-byte encoding (Noah Misch <noah@leadboat.com>)
Responses	Re: Re: [COMMITTERS] pgsql: Don't downcase non-ascii identifier chars in multi-byte encoding
List	pgsql-hackers

On 06/09/2013 12:38 AM, Noah Misch wrote:
> On Sat, Jun 08, 2013 at 11:50:53PM -0400, Andrew Dunstan wrote:
>> On 06/08/2013 10:52 PM, Noah Misch wrote:
>>> Let's return to the drawing board on this one.  I would be inclined to keep
>>> the current bad behavior until we implement the i18n-aware case folding
>>> required by SQL.  If I'm alone in thinking that, perhaps switch to downcasing
>>> only ASCII characters regardless of the encoding.  That at least gives
>>> consistent application behavior.
>>>
>>> I apologize for not noticing to comment on this week's thread.
>>>
>> The behaviour which this fixes is an unambiguous bug. Calling tolower()
>> on the individual bytes of a multi-byte character can't possibly produce
>> any sort of correct result. A database that contains such corrupted
>> names, probably not valid in any encoding at all, is almost certainly
>> not restorable, and I'm not sure if it's dumpable either.
> I agree with each of those points.  However, since any change here breaks
> compatibility, we should fix it right the first time.  A second compatibility
> break would be all the more onerous once this intermediate step helps more
> users to start using unquoted, non-ASCII object names.
>
>> It's already
>> produced several complaints in recent months, so ISTM that returning to
>> it for any period of time is unthinkable.
> PostgreSQL has lived with this wrong behavior since ... the beginning?  It's a
> problem, certainly, but a bandage fix brings its own trouble.


If you have a better fix, I am all ears. I can recall at least one
discussion of this area (concerning the Turkish I, quite a few years ago)
where we failed to come up with anything.

I have a fairly hard time believing in your "relies on this and somehow 
works" scenario.

cheers

andrew
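
The byte-wise tolower() problem described in the quoted text above can be
illustrated with a small sketch. This is not PostgreSQL's code: the helper
names are hypothetical, and latin1_tolower_byte() only mimics what tolower()
might return for a single byte under a Latin-1 style locale, which is an
assumption made to keep the example deterministic. It contrasts that
behaviour with the "downcase only ASCII" alternative mentioned in the thread.

```python
def latin1_tolower_byte(b: int) -> int:
    """Assumed stand-in for tolower() on one byte in a Latin-1 style locale."""
    if 0x41 <= b <= 0x5A:                    # ASCII 'A'..'Z'
        return b + 0x20
    if 0xC0 <= b <= 0xDE and b != 0xD7:      # Latin-1 upper-case letters
        return b + 0x20
    return b


def downcase_bytewise(name: bytes) -> bytes:
    """Downcase every byte independently, ignoring the encoding."""
    return bytes(latin1_tolower_byte(b) for b in name)


def downcase_ascii_only(name: bytes) -> bytes:
    """Downcase only ASCII 'A'..'Z'; leave all high-bit bytes untouched."""
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in name)


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# "É" is the two-byte sequence 0xC3 0x89 in UTF-8.
name = "ÉCOLE".encode("utf-8")

broken = downcase_bytewise(name)     # lead byte 0xC3 is rewritten to 0xE3
safe = downcase_ascii_only(name)     # multi-byte character is left intact

print(is_valid_utf8(broken))
print(is_valid_utf8(safe), safe.decode("utf-8"))
```

In this sketch the byte-wise version rewrites the lead byte of the multi-byte
character, so the result is no longer valid UTF-8, while the ASCII-only
version leaves the non-ASCII character in its original case and keeps the
name well-formed.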
