Re: is this a bug or I am blind? - Mailing list pgsql-general

From Lincoln Yeoh
Subject Re: is this a bug or I am blind?
Msg-id 5.2.1.1.1.20051217104647.02d1eef0@localhost
In response to Re: is this a bug or I am blind?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
At 01:40 PM 12/16/2005 -0500, Tom Lane wrote:

>Nobody's said anything about giving up locale-sensitive sorting.  The
>question is about locale-sensitive equality: does it really make sense
>that 'tty' = 'tyty'?  Would your answer change in the context
>'/dev/tty' = '/dev/tyty'?  Are you willing to *not have access* to a
>text comparison operator that will make the distinction?
>
>I'm inclined to think that this is more like the occasional need for
>accent-insensitive comparisons.  It seems generally agreed that you want
>something like smash('ab') = smash('áb') rather than making the
>strings equal in all contexts.

I agree.
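
To make the smash() idea concrete, here is a minimal sketch in Python. The
name smash comes from Tom's example; the implementation (Unicode
decomposition followed by dropping combining marks) is only my assumption
of one way such a function could work, not an existing PostgreSQL function.

```python
import unicodedata


def smash(text: str) -> str:
    """Hypothetical accent-stripping helper, named after the smash() above.

    Decomposes each character (NFD) and drops the combining marks, so an
    accented letter collapses to its base letter.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Plain equality still tells the two strings apart ...
print("ab" == "áb")                # False
# ... and the caller opts in to accent-insensitive matching explicitly.
print(smash("ab") == smash("áb"))  # True
```

The point is that the default "=" stays exact and the fuzzier comparison is
something you ask for by name.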

I would prefer everything to be compared without any collation (or
corruption) by default, with a function available to pick the desired
comparison behaviour. (Can all of that functionality be provided by the
COLLATE clause?)

This is because most databases are multi-locale, whether or not the humans
are aware of it:

the computer "locale", human locale #1, an unknown/international locale,
human locale #2, ...

In a column for license keys, "tty" should rarely be the same as "tyty".
In a column for base64 data (crypto hashes, etc.), "tty" should NEVER be the
same as "tyty".
In a column for domain names, I doubt it is clear whether you want to match
tty.ibm.hu just because tyty.ibm.hu exists.

But in a column for license owner names, one might want "tty" and "tyty" to
be the same - one might have to have a multicolumn index depending on the
owner's locale of choice.

For these reasons I recommend that initdb always pick "unmangled" text by
default, no matter what the locale setting is, and that users be advised
of the potential consequences of mangling (I would even say corrupting)
all text in their databases by default.

Regards,
Link.

