Re: sortsupport for text - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: sortsupport for text |
Date | |
Msg-id | CAEYLb_VMKveD8kqYya9sA0reR70bfxTZWgBn3g86sRThmjpRUw@mail.gmail.com |
In response to | Re: sortsupport for text (Peter Geoghegan <peter@2ndquadrant.com>) |
Responses | Re: sortsupport for text |
List | pgsql-hackers |
The fly in the ointment for strxfrm() adoption may be the need to be consistent with this earlier behaviour:

commit 656beff59033ccc5261a615802e1a85da68e8fad
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Thu Dec 22 22:50:00 2005 +0000

    Adjust string comparison so that only bitwise-equal strings are considered
    equal: if strcoll claims two strings are equal, check it with strcmp, and
    sort according to strcmp if not identical.  This fixes inconsistent
    behavior under glibc's hu_HU locale, and probably under some other locales
    as well.  Also, take advantage of the now-well-defined behavior to speed up
    texteq, textne, bpchareq, bpcharne: they may as well just do a bitwise
    comparison and not bother with strcoll at all.

    NOTE: affected databases may need to REINDEX indexes on text columns to be
    sure they are self-consistent.

Here is the relevant code:

    /*
     * In some locales strcoll() can claim that nonidentical strings are
     * equal.  Believing that would be bad news for a number of reasons,
     * so we follow Perl's lead and sort "equal" strings according to
     * strcmp().
     */
    if (result == 0)
        result = strcmp(a1p, a2p);

I'm not sure I agree with this decision; why should we presume to know better than the glibc locale what constitutes equality? What are the "number of reasons" referred to? It seems very likely that the main one was the then-need to guard against poor quality qsort() implementations that went quadratic in the face of lots of duplicates. But we have already removed a bunch of other such hacks, because of course we now control the qsort implementation used, and have since 2006, the year after this commit was made.

Obviously this decision was made a number of years ago now, and at least one person went on to rely on this behaviour, so it can only be revisited with that in mind.
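To make the conflict concrete: a strxfrm()-based sort compares transformed keys rather than the strings themselves, so preserving the 2005 behaviour would seem to require keeping the original string around for the tie-break as well. What follows is only a minimal sketch in standard C, not code from the PostgreSQL tree; the names cmp_strcoll_tiebreak, make_sort_key, SortItem and cmp_key_tiebreak are hypothetical and chosen purely for illustration.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the existing behaviour: compare with
 * strcoll(), and if the locale claims equality, fall back to strcmp().
 */
static int
cmp_strcoll_tiebreak(const char *a, const char *b)
{
    int result = strcoll(a, b);

    if (result == 0)
        result = strcmp(a, b);
    return result;
}

/*
 * Hypothetical helper: build a strxfrm() sort key for s.
 * Returns a malloc'd buffer the caller must free, or NULL on allocation
 * failure.  Calling strxfrm() with a zero size and a NULL destination to
 * learn the required length is permitted by the C standard.
 */
static char *
make_sort_key(const char *s)
{
    size_t need = strxfrm(NULL, s, 0) + 1;   /* +1 for the terminating NUL */
    char  *key = malloc(need);

    if (key != NULL)
        strxfrm(key, s, need);
    return key;
}

/* Illustrative pairing of an original string with its transformed key. */
typedef struct
{
    const char *orig;   /* original string, needed only for the tie-break */
    char       *key;    /* strxfrm() output */
} SortItem;

/*
 * Compare transformed keys with strcmp(), which by definition orders them
 * the way strcoll() orders the originals.  The extra strcmp() on the
 * originals is the part that the 2005 commit's semantics would force a
 * strxfrm()-based sort to retain.
 */
static int
cmp_key_tiebreak(const SortItem *x, const SortItem *y)
{
    int result = strcmp(x->key, y->key);

    if (result == 0)
        result = strcmp(x->orig, y->orig);
    return result;
}
```

Under these assumptions, the second comparator is only as cheap as a plain key comparison when the tie-break can be dropped; with it, every item has to carry both the key and the original string.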
However, provided we are able to say "here is a compatibility ordering operator" to those who complain about this, and provided it is appropriately listed as a compatibility issue in the 9.3 release notes, I think it would be worth reverting this commit to facilitate strxfrm(). How many people:

A) are using hu_HU or some other locale where this can happen? and
B) will care?

Now, I'm sure that there is another workaround too, so this doesn't need to be a blocker even if it is absolutely unacceptable to revert - but I have to wonder if that's worth it. People don't have any business relying on a sort order that is consistent in any way other than the one they actually asked for. A few people still do, of course, even as we go blue in the face telling them not to, but that's fairly normal.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services