Re: sortsupport for text - Mailing list pgsql-hackers

From Greg Stark
Subject Re: sortsupport for text
Msg-id CAM-w4HPF29Q=z8uE18Avtkckr5xnoLeAPjE1Tfn7p+Q0_EgbdQ@mail.gmail.com
In response to Re: sortsupport for text  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: sortsupport for text
List pgsql-hackers
On Sun, Jun 17, 2012 at 9:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The trick for hashing such datatypes is to be able to guarantee that
> "equal" values hash to the same hash code, which is typically possible
> as long as you know the equality rules well enough.  We could possibly
> do that for text with pure-strcoll equality if we knew all the details
> of what strcoll would consider "equal", but we do not.

It occurs to me that strxfrm would answer this question. If we made
the hash function hash the result of strxfrm then we could make
equality use strcoll and not fall back to strcmp.
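A minimal C sketch of the idea follows. It is not PostgreSQL's hashing code. The helper names (xfrm_copy, text_hash_xfrm, text_eq_coll) and the choice of FNV-1a as the byte hash are assumptions made for illustration. The sketch relies on the C standard's guarantee that strcmp() applied to two strxfrm() results has the same sign as strcoll() applied to the original strings, assuming the C library actually honours it. Under that guarantee, any two strings that strcoll() calls equal produce identical transformed strings, and therefore identical hash codes.

```c
#include <assert.h>
#include <locale.h>   /* collation follows LC_COLLATE, set via setlocale() */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of strxfrm(s), or NULL if
 * allocation fails. The result depends on the current LC_COLLATE. */
static char *xfrm_copy(const char *s)
{
    /* With n == 0 the destination may be NULL; the return value is the
     * length of the transformed string, excluding the terminating NUL. */
    size_t need = strxfrm(NULL, s, 0);
    char *buf = malloc(need + 1);
    if (buf == NULL)
        return NULL;
    size_t got = strxfrm(buf, s, need + 1);
    assert(got == need);   /* buffer was big enough, so output is complete */
    (void) got;
    return buf;
}

/* 32-bit FNV-1a over a byte buffer (an arbitrary choice for this sketch). */
static uint32_t fnv1a(const unsigned char *p, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical hash: hash the strxfrm() image instead of the raw bytes, so
 * strings that strcoll() reports as equal get the same hash code. */
static uint32_t text_hash_xfrm(const char *s)
{
    char *x = xfrm_copy(s);
    if (x == NULL)
        abort();
    uint32_t h = fnv1a((const unsigned char *) x, strlen(x));
    free(x);
    return h;
}

/* Hypothetical equality: pure strcoll(), with no strcmp() fallback. */
static int text_eq_coll(const char *a, const char *b)
{
    return strcoll(a, b) == 0;
}
```

Under the "C" locale strxfrm() amounts to a byte copy, so the sketch degenerates to hashing the raw string. The difference only shows up in a locale where strcoll() treats distinct byte sequences as equal. Each hash call also pays for two strxfrm() passes and an allocation, which is the kind of CPU cost that would need weighing.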

I suspect that in a green field that's what we would do, though the CPU
cost might be enough to make us think hard about it. I'm not sure it's
worth considering switching, though.

Incidentally, the case where it matters to users is when you have a
multi-column sort order and values that are supposed to sort equal in
the first column but print differently. Given that there seems to be
some controversy in the locale definitions -- most locales
seem to use "insignificant" factors like accents or ligatures as
tie-breakers and avoid claiming different sequences are equal even
when the language usually treats them as equivalent -- it doesn't seem
super important to maintain the property for the few locales that fall
the other way. Unless my impression is wrong and there's a good
principled reason why some locales treat nearly equivalent strings one
way and some treat them the other.
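The multi-column case can be illustrated with a small C sketch of ORDER BY c1, c2 over two text columns. It contrasts a comparator that breaks strcoll() ties with strcmp(), which is the fallback the proposal would drop, against one that uses pure strcoll() equality. The names (struct row, sort_rows and the comparators) are hypothetical, and this is not PostgreSQL's actual comparison code. Under the "C" locale both rules give the same order. They diverge only in a locale where strcoll() reports two differently spelled first-column values as equal, and in that case only the pure rule lets the second column decide their relative order.

```c
#include <assert.h>
#include <locale.h>   /* collation follows LC_COLLATE, set via setlocale() */
#include <stdlib.h>
#include <string.h>

/* A two-column row; both columns are text. */
struct row {
    const char *c1;
    const char *c2;
};

/* strcoll() with a strcmp() tie-break: strings that strcoll() calls equal
 * are still ordered by their bytes, so only byte-identical strings tie. */
static int text_cmp_tiebreak(const char *a, const char *b)
{
    int r = strcoll(a, b);
    return r != 0 ? r : strcmp(a, b);
}

/* Pure strcoll(): locale-equal strings compare equal. */
static int text_cmp_pure(const char *a, const char *b)
{
    return strcoll(a, b);
}

/* ORDER BY c1, c2 with the tie-break: c2 is consulted only when the
 * c1 values are byte-identical. */
static int row_cmp_tiebreak(const void *x, const void *y)
{
    const struct row *a = x, *b = y;
    int r = text_cmp_tiebreak(a->c1, b->c1);
    return r != 0 ? r : text_cmp_tiebreak(a->c2, b->c2);
}

/* ORDER BY c1, c2 with pure strcoll(): c2 decides whenever the locale
 * calls the c1 values equal, even if they print differently. */
static int row_cmp_pure(const void *x, const void *y)
{
    const struct row *a = x, *b = y;
    int r = text_cmp_pure(a->c1, b->c1);
    return r != 0 ? r : text_cmp_pure(a->c2, b->c2);
}

/* Hypothetical helper: sort rows by (c1, c2) under the chosen rule. */
static void sort_rows(struct row *rows, size_t n, int pure)
{
    assert(rows != NULL || n == 0);
    qsort(rows, n, sizeof rows[0], pure ? row_cmp_pure : row_cmp_tiebreak);
}
```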

--
greg

