Re: sortsupport for text - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: sortsupport for text
Date
Msg-id CAEYLb_VMKveD8kqYya9sA0reR70bfxTZWgBn3g86sRThmjpRUw@mail.gmail.com
In response to Re: sortsupport for text  (Peter Geoghegan <peter@2ndquadrant.com>)
Responses Re: sortsupport for text
List pgsql-hackers
The fly in the ointment for strxfrm() adoption may be the need to be
consistent with this earlier behaviour:

commit 656beff59033ccc5261a615802e1a85da68e8fad
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Thu Dec 22 22:50:00 2005 +0000
    Adjust string comparison so that only bitwise-equal strings are considered
    equal: if strcoll claims two strings are equal, check it with strcmp, and
    sort according to strcmp if not identical.  This fixes inconsistent
    behavior under glibc's hu_HU locale, and probably under some other locales
    as well.  Also, take advantage of the now-well-defined behavior to speed up
    texteq, textne, bpchareq, bpcharne: they may as well just do a bitwise
    comparison and not bother with strcoll at all.

    NOTE: affected databases may need to REINDEX indexes on text columns to be
    sure they are self-consistent.

Here is the relevant code:
    /*
     * In some locales strcoll() can claim that nonidentical strings are
     * equal.  Believing that would be bad news for a number of reasons,
     * so we follow Perl's lead and sort "equal" strings according to
     * strcmp().
     */
    if (result == 0)
        result = strcmp(a1p, a2p);
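
For anyone reading along without the source handy, here is a minimal
standalone sketch of that tie-break. The name text_cmp_tiebreak is
hypothetical, and the sketch assumes plain NUL-terminated C strings; it is
not the PostgreSQL function itself, only the same idea in isolation.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not the PostgreSQL source: compare two
 * NUL-terminated strings with strcoll(), then fall back to strcmp()
 * when strcoll() reports equality, so that only bitwise-identical
 * strings ever compare as equal.
 */
static int
text_cmp_tiebreak(const char *a, const char *b)
{
    int         result;

    assert(a != NULL && b != NULL);

    /* Locale-aware comparison first (uses the current LC_COLLATE). */
    result = strcoll(a, b);

    /* The locale may call distinct strings "equal"; break the tie bytewise. */
    if (result == 0)
        result = strcmp(a, b);

    return result;
}
```

With this shape, a return value of 0 implies the two strings are
byte-for-byte identical, which is the property the commit message relies on
when it lets texteq and friends skip strcoll entirely.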

I'm not sure I agree with this decision; why should we presume to know
better than the glibc locale what constitutes equality? What is the
"number of reasons" referred to? It seems very likely that the main
one was the then-need to guard against poor-quality qsort()
implementations that went quadratic in the face of lots of duplicates,
but we have already removed a bunch of other such hacks, because of
course we now control the qsort implementation used, and have since
2006, the year after this commit was made.

Obviously this decision was made a number of years ago now, and at
least one person went on to rely on this behaviour, so it can only be
revisited with that in mind. However, provided we are able to say
"here is a compatibility ordering operator" to those that complain
about this, and provided it is appropriately listed as a compatibility
issue in the 9.3 release notes, I think it would be worth reverting
this commit to facilitate strxfrm().
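
To spell out why the tie-break gets in the way: strcmp() on two strxfrm()
keys can only tell you what strcoll() would have said about the originals.
If strcoll() calls two non-identical strings equal, their keys compare equal
too, so reproducing the current ordering means keeping the original strings
around next to the keys. A rough sketch of that, with hypothetical helper
names (make_sort_key, key_cmp_tiebreak) and NUL-terminated strings assumed;
this is an illustration, not a patch and not something proposed in this
thread:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: build a malloc'd strxfrm() key for s.
 * Returns NULL if allocation fails.  Caller frees the result.
 */
static char *
make_sort_key(const char *s)
{
    size_t      needed;
    char       *key;

    assert(s != NULL);

    /* With n == 0 the destination may be NULL; returns the key length. */
    needed = strxfrm(NULL, s, 0) + 1;

    key = malloc(needed);
    if (key != NULL)
        strxfrm(key, s, needed);

    return key;
}

/*
 * Hypothetical comparator: compare precomputed keys bytewise, and only
 * when they tie fall back to strcmp() on the original strings.  The
 * fallback is the part that forces the originals to be retained.
 */
static int
key_cmp_tiebreak(const char *key_a, const char *orig_a,
                 const char *key_b, const char *orig_b)
{
    int         result = strcmp(key_a, key_b);

    if (result == 0)
        result = strcmp(orig_a, orig_b);

    return result;
}
```

If the strcmp() fallback were dropped, the comparator would need nothing but
the keys, which is the simplification that reverting the commit would buy.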

How many people:

A) are using hu_HU or some other locale where this can happen?

and

B) will care?

Now, I'm sure that there is another workaround too, so this doesn't
need to be a blocker even if it is absolutely unacceptable to revert -
but I have to wonder if that's worth it. People don't have any
business relying on a sort order that is consistent in any way other
than the one they actually asked for. A few people still do even as we
go blue in the face telling them not to of course, but that's fairly
normal.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

