Re: is this a bug or I am blind? - Mailing list pgsql-general

From Martijn van Oosterhout
Subject Re: is this a bug or I am blind?
Msg-id 20051216175411.GA11985@svana.org
In response to Re: is this a bug or I am blind?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On Fri, Dec 16, 2005 at 12:12:08PM -0500, Tom Lane wrote:
> Perhaps the fast-path check is a bad idea, but fixing this is not just
> a matter of removing that.  If we subscribe to strcoll's worldview then
> we have to conclude that *text strings are not hashable*, because
> strings that should be "equal" may have different hash codes.  And at
> least in the current PG code, that's not something we can flip on and off
> depending on the locale --- texteq would have to be marked non hashable
> in the system catalogs, meaning a big performance hit for *everybody*
> even if their locale is not this weird.

That's true, in the sense that unconverted strings are not hashable.
This is what strxfrm() was created for: it returns a sort key for a
string, so that comparing two keys bytewise gives the same ordering as
strcoll() on the original strings. A quick C program demonstrates that
in hu_HU these two strings do get identical sort keys, whereas in en_AU
they do not.

$ LC_ALL=hu_HU ./strxfrm potyty potty
String  1: potyty
Strxfrm 1: " ((\x01\x02\x02\x02\x02\x01\x02\x02\x02\x02
String  2: potty
Strxfrm 2: " ((\x01\x02\x02\x02\x02\x01\x02\x02\x02\x02
$ LC_ALL=en_AU ./strxfrm potyty potty
String  1: potyty
Strxfrm 1: \x1B\x1A\x1F$\x1F$\x01\x02\x02\x02\x02\x02\x02\x01\x02\x02\x02\x02\x02\x02
String  2: potty
Strxfrm 2: \x1B\x1A\x1F\x1F$\x01\x02\x02\x02\x02\x02\x01\x02\x02\x02\x02\x02
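
For anyone who wants to try this on their own system, here is a minimal
sketch of how such a comparison could be written. It is not the program
that produced the output above; the function names (sort_key, print_key,
same_sort_key) are made up for illustration, and the result depends
entirely on which locales are installed and on the libc version.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd strxfrm() sort key for s in the current LC_COLLATE
 * locale, or NULL if allocation fails. The caller frees the result. */
static char *sort_key(const char *s)
{
    /* With n == 0, strxfrm() only reports the length the key needs. */
    size_t need = strxfrm(NULL, s, 0) + 1;
    char *key = malloc(need);

    if (key != NULL)
        strxfrm(key, s, need);
    return key;
}

/* Print a key, escaping non-printable bytes as \xNN. */
static void print_key(const char *label, const char *key)
{
    const unsigned char *p;

    printf("%s: ", label);
    for (p = (const unsigned char *) key; *p != '\0'; p++) {
        if (*p >= 0x20 && *p < 0x7F)
            putchar(*p);
        else
            printf("\\x%02X", (unsigned int) *p);
    }
    putchar('\n');
}

/* Returns 1 if a and b have identical sort keys in the named locale,
 * 0 if the keys differ, and -1 if the locale is not available or
 * memory runs out. */
static int same_sort_key(const char *locale, const char *a, const char *b)
{
    char *ka, *kb;
    int result = -1;

    if (setlocale(LC_COLLATE, locale) == NULL)
        return -1;

    ka = sort_key(a);
    kb = sort_key(b);
    if (ka != NULL && kb != NULL) {
        print_key(a, ka);
        print_key(b, kb);
        result = (strcmp(ka, kb) == 0);
    }
    free(ka);
    free(kb);
    return result;
}
```

Calling same_sort_key("hu_HU", "potyty", "potty") and then the same with
"en_AU" would be the equivalent of the two runs above, assuming both
locales exist on the machine. A return of -1 means the locale could not
be set, which is worth checking before drawing any conclusions.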

I think the only way to make indexes properly locale-sensitive would be
to either use strcoll() in all cases, or store the result from
strxfrm() in the index. Anything else will break somewhere.
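
To illustrate the strxfrm() option as it applies to hashing: if the hash
is computed over the sort key instead of the raw bytes, then any two
strings the locale considers equal get the same hash code. The sketch
below is only an illustration of that idea. collation_hash is a
hypothetical name, FNV-1a is just a convenient stand-in hash, and none
of this is what the PG code does today.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: hash the strxfrm() sort key of s (32-bit FNV-1a)
 * under the current LC_COLLATE locale. Stores the hash in *out and
 * returns 0, or returns -1 if memory for the key cannot be allocated. */
static int collation_hash(const char *s, uint32_t *out)
{
    size_t need = strxfrm(NULL, s, 0) + 1;   /* key length plus NUL */
    char *key = malloc(need);
    const unsigned char *p;
    uint32_t h = 2166136261u;                /* FNV offset basis */

    if (key == NULL)
        return -1;
    strxfrm(key, s, need);

    /* Strings with identical sort keys necessarily hash identically. */
    for (p = (const unsigned char *) key; *p != '\0'; p++) {
        h ^= *p;
        h *= 16777619u;                      /* FNV prime */
    }

    free(key);
    *out = h;
    return 0;
}
```

The cost is an extra strxfrm() call and allocation per hashed value,
which is the kind of overhead every user would pay whether or not their
locale needs it.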

In any case, we first need to determine which answer is correct, before
we run off trying to fix it.

This is Glibc 2.3.2 on a Debian Linux system.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
