On Thu, Jan 22, 2015 at 12:34 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> This isn't really Windows-specific. The root of the problem is that
> when LC_COLLATE=C you were trying to use strxfrm() for the abbreviated
> key even though memcmp() is the authoritative comparator in that case.
> Exactly which platforms happened to blow up as a result of that is
> kind of beside the point.
I don't see how that could be a problem. Even if the strxfrm() blob
wasn't identical to the original string (I think it should always be,
accept maybe on MacOSX), it's still reasonable to assume that there is
equivalent behavior across the C locale and locales with collations
like en_US.UTF-8. It wasn't as if I was using strxfrm() within
bttextfastcmp_c() -- just within bttext_abbrev_convert(), to form an
abbreviated key for text that uses the C locale.
The C locale is just another locale - AFAICT, the only reason we have
treated it differently in PostgreSQL is because that tended to avoid
needing to copy and NUL-terminate, which strcoll() requires. It might
be that that doesn't work out for some reason (but not because
strxfrm() will not have behavior equivalent to memcpy() with the C
locale -- I was *not* relying on that).
That having been said, it's clearer to continue to handle each case (C
locale vs other locales) separately within the new
bttext_abbrev_convert() function, just to be consistent, but also to
avoid NUL-terminating the text strings to pass to strxfrm(), which as
you point out is an avoidable cost. So, I'm not asking you to restore
the terser uniform use of strxfrm() we had before, but, for what it's
worth, that was not the real issue. The real issue was that strxfrm()
spuriously used the wrong locale, as you said. Provided strxfrm() had
the correct locale (the C locale), then AFAICT it ought to have been
fine, regardless of whether or not it then behave exactly equivalently
to memcpy().
--
Peter Geoghegan