Re: Improved ICU patch - WAS: Implementing full UTF-8 support (aka supporting 0x00) - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Improved ICU patch - WAS: Implementing full UTF-8 support (aka supporting 0x00)
Date
Msg-id CAM3SWZRpzdphRojzOQpjq1cm4nk-5Kf9P0W5T1rzJFih=2AOig@mail.gmail.com
In response to Improved ICU patch - WAS: Implementing full UTF-8 support (aka supporting 0x00)  (Palle Girgensohn <girgen@pingpong.net>)
Responses Re: Improved ICU patch - WAS: Implementing full UTF-8 support (aka supporting 0x00)  (Palle Girgensohn <girgen@pingpong.net>)
List pgsql-hackers
On Wed, Aug 10, 2016 at 1:42 PM, Palle Girgensohn <girgen@pingpong.net> wrote:
> They've been used for the FreeBSD ports since 2005, and have served us
> well. I have of course updated them regularly. In this latest version,
> I've removed support for other encodings beside UTF-8, mostly since I
> don't know how to test them, but also, I see little point in supporting
> anything else using ICU.

Looks like you're not using the ICU equivalent of strxfrm(). While 9.5
is not the release that introduced its use, it did expand it
significantly. I think you need to fix this, even though it isn't
actually used to sort text at present, since presumably FreeBSD builds
of 9.5 don't TRUST_STRXFRM. Since you're using ICU, though, you could
reasonably trust the ICU equivalent of strxfrm(), so that's a missed
opportunity. (See the wiki page on the abbreviated keys issue [1] if
you don't know what I'm talking about.)
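
To make concrete what "trusting" strxfrm() means, here is a minimal sketch in
plain C. It checks the contract that abbreviated keys rely on: comparing two
strxfrm() blobs with strcmp() must give the same ordering as strcoll() on the
original strings. The helper name strxfrm_agrees_with_strcoll() is
hypothetical and is not PostgreSQL code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Reduce any comparison result to -1, 0, or +1. */
static int
sign(int v)
{
    return (v > 0) - (v < 0);
}

/*
 * Hypothetical helper, for illustration only.
 *
 * Returns 1 if strcmp() on the strxfrm() blobs of a and b orders them the
 * same way strcoll() does under the current LC_COLLATE, 0 if the two
 * disagree, and -1 on allocation failure.
 */
int
strxfrm_agrees_with_strcoll(const char *a, const char *b)
{
    size_t  la, lb;
    char   *xa, *xb;
    int     ok;

    assert(a != NULL && b != NULL);

    /* With a zero-length buffer, strxfrm() only reports the size needed. */
    la = strxfrm(NULL, a, 0) + 1;
    lb = strxfrm(NULL, b, 0) + 1;

    xa = malloc(la);
    xb = malloc(lb);
    if (xa == NULL || xb == NULL)
    {
        free(xa);
        free(xb);
        return -1;
    }

    strxfrm(xa, a, la);
    strxfrm(xb, b, lb);

    ok = (sign(strcmp(xa, xb)) == sign(strcoll(a, b)));

    free(xa);
    free(xb);
    return ok;
}
```

A result of 0 for any pair of strings is the kind of disagreement described
on the wiki page [1]; a platform where that can happen is one where the
transformed blobs cannot be used to sort text.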

Shouldn't you really have a strxfrm() wrapper, used across the board,
including for callers outside of varlena.c? convert_string_datum() has
been calling strxfrm() for many releases now. These calls are still
used in FreeBSD builds, I would think, which seems like a bug that is
not dodged by simply not defining TRUST_STRXFRM. Doesn't its assumption
that strxfrm() matches the ordering used elsewhere fail to hold on
FreeBSD builds?
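
A rough sketch of the kind of wrapper I have in mind follows. It is not taken
from the patch. The names sketch_strxfrm() and SKETCH_USE_ICU are made up for
illustration, as is the (size_t) -1 error convention. The ICU branch assumes
UTF-8 input and opens the root collator on every call, which real code would
cache. Every caller, convert_string_datum() included, would go through the one
entry point, so the blobs always come from the same collation provider as the
comparisons.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#ifdef SKETCH_USE_ICU           /* hypothetical macro, not a PostgreSQL symbol */
#include <unicode/ucol.h>
#include <unicode/ustring.h>
#endif

/*
 * sketch_strxfrm: hypothetical single entry point with strxfrm()-like
 * semantics.  Writes at most destsize bytes to dest and returns the length
 * needed, not counting the terminating zero byte.  Returns (size_t) -1 on
 * failure in the ICU branch (an assumption of this sketch).
 */
size_t
sketch_strxfrm(char *dest, const char *src, size_t destsize)
{
#ifdef SKETCH_USE_ICU
    UErrorCode  status = U_ZERO_ERROR;
    UCollator  *coll;
    UChar      *ubuf;
    int32_t     ulen = 0;
    int32_t     keylen;

    assert(src != NULL);

    /* Root-locale collator; a real build would open this once and reuse it. */
    coll = ucol_open("", &status);
    if (U_FAILURE(status))
        return (size_t) -1;

    /* Preflight: ask ICU how many UTF-16 units the UTF-8 input needs. */
    u_strFromUTF8(NULL, 0, &ulen, src, -1, &status);
    if (U_FAILURE(status) && status != U_BUFFER_OVERFLOW_ERROR)
    {
        ucol_close(coll);
        return (size_t) -1;
    }
    status = U_ZERO_ERROR;

    ubuf = malloc(((size_t) ulen + 1) * sizeof(UChar));
    if (ubuf == NULL)
    {
        ucol_close(coll);
        return (size_t) -1;
    }

    u_strFromUTF8(ubuf, ulen + 1, &ulen, src, -1, &status);
    if (U_FAILURE(status))
    {
        free(ubuf);
        ucol_close(coll);
        return (size_t) -1;
    }

    /* ucol_getSortKey() counts the terminating zero byte in its result. */
    keylen = ucol_getSortKey(coll, ubuf, ulen,
                             (uint8_t *) dest, (int32_t) destsize);

    free(ubuf);
    ucol_close(coll);
    return keylen > 0 ? (size_t) keylen - 1 : 0;
#else
    assert(src != NULL);

    /* No ICU: fall through to the C library. */
    return strxfrm(dest, src, destsize);
#endif
}
```

Without SKETCH_USE_ICU defined this is just strxfrm() under another name,
which is the point: callers do not need to know which provider is behind it.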

[1] https://wiki.postgresql.org/wiki/Abbreviated_keys_glibc_issue
--
Peter Geoghegan


