Greg Stark <gsstark@mit.edu> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> If that does change the results, it indicates you've got strings which
>> are bytewise different but compare equal according to strcoll(). We've
>> seen this and other misbehaviors from some locale definitions when faced
>> with data that is invalid per the encoding the locale expects.
> There are plenty of non-bytewise-identical strings that do legitimately
> compare equal in various locales. Does the hash code hash strxfrm or the
> original bytes?
I think you are jumping to conclusions. I have not yet seen it
demonstrated that any locale definition in use in the wild intends to
compare nonidentical strings as equal. On the other hand, we have seen
plenty of cases of strcoll simply failing (delivering results that are
not even self-consistent) when faced with data it considers invalid.
I notice that the Single Unix Specification (SUS) permits strcoll to set
errno if given invalid data:
http://www.opengroup.org/onlinepubs/007908799/xsh/strcoll.html
We are not currently checking for that, but probably we should be.
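A minimal sketch of what such a check could look like, assuming only
the behavior described on that page: strcoll has no reserved error
return value, so the caller has to clear errno before the call and
inspect it afterwards. The name checked_strcoll is hypothetical and is
not an existing function in the backend.

```c
#include <errno.h>
#include <string.h>

/*
 * checked_strcoll: hypothetical wrapper for illustration only.
 *
 * Stores the comparison result in *result. Returns 0 if strcoll() did
 * not report an error, otherwise the errno value it set (for example
 * EINVAL for data the locale considers invalid). When an error is
 * reported, *result should not be trusted.
 */
int
checked_strcoll(const char *a, const char *b, int *result)
{
    int saved_errno = errno;    /* preserve the caller's errno */
    int err;

    errno = 0;                  /* the only way to detect a failure */
    *result = strcoll(a, b);
    err = errno;

    errno = saved_errno;
    return err;
}
```

A caller would treat a nonzero return as "this data is not valid in the
current locale" and raise an error rather than use the comparison
result. Whether a given platform's strcoll actually sets errno is up to
the implementation, since the spec only says it may.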
regards, tom lane