Re: Improved ICU patch - WAS: Implementing full UTF-8 support (aka supporting 0x00) - Mailing list pgsql-hackers

From Palle Girgensohn
Subject Re: Improved ICU patch - WAS: Implementing full UTF-8 support (aka supporting 0x00)
Date
Msg-id 7D9265F9-2E7B-4136-BCA1-AAE8481561B7@pingpong.net
In response to Re: Improved ICU patch - WAS: Implementing full UTF-8 support (aka supporting 0x00)  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Improved ICU patch - WAS: Implementing full UTF-8 support (aka supporting 0x00)  (Palle Girgensohn <girgen@pingpong.net>)
List pgsql-hackers
> On 11 Aug 2016, at 03:05, Peter Geoghegan <pg@heroku.com> wrote:
>
> On Wed, Aug 10, 2016 at 1:42 PM, Palle Girgensohn <girgen@pingpong.net> wrote:
>> They've been used for the FreeBSD ports since 2005, and have served us well. I have of course updated them
>> regularly. In this latest version, I've removed support for other encodings beside UTF-8, mostly since I don't know how
>> to test them, but also, I see little point in supporting anything else using ICU.
>
> Looks like you're not using the ICU equivalent of strxfrm(). While 9.5
> is not the release that introduced its use, it did expand it
> significantly. I think you need to fix this, even though it isn't
> actually used to sort text at present, since presumably FreeBSD builds
> of 9.5 don't TRUST_STRXFRM. Since you're using ICU, though, you could
> reasonably trust the ICU equivalent of strxfrm(), so that's a missed
> opportunity. (See the wiki page on the abbreviated keys issue [1] if
> you don't know what I'm talking about.)

My plan was to get it working without TRUST_STRXFRM first, and then add that functionality. I've made some preliminary
tests using ICU's ucol_getSortKey, but I will have to test it a bit more. For now, I just expect not to trust strxfrm. It
is the first iteration wrt strxfrm; the plan is to use that code base.

>
> Shouldn't you really have a strxfrm() wrapper, used across the board,
> including for callers outside of varlena.c? convert_string_datum() has
> been calling strxfrm() for many releases now. These calls are still
> used in FreeBSD builds, I would think, which seems like a bug that is
> not dodged by simply not defining TRUST_STRXFRM. Isn't its assumption
> that that matching the ordering used elsewhere not really hold on
> FreeBSD builds?

I was not aware of convert_string_datum; I will check that, thanks! Using a wrapper across the board seems like a good
idea for refactoring.
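
As a rough illustration of the wrapper idea, here is a minimal sketch that assumes plain libc strxfrm(). The names xfrm_sort_key and xfrm_compare are hypothetical and do not come from the patch or from PostgreSQL. An ICU build would presumably produce the key with ucol_getSortKey at the marked point instead, so that every caller gets the same ordering.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical wrapper (not from the patch): return a malloc'd sort key
 * for src such that strcmp() on two keys orders them the same way
 * strcoll() orders the original strings in the current locale.
 * Returns NULL on allocation failure; the caller frees the result.
 */
char *
xfrm_sort_key(const char *src)
{
    size_t  needed;
    char   *key;

    assert(src != NULL);

    /* With a zero size, strxfrm() only reports the required length. */
    needed = strxfrm(NULL, src, 0);

    key = malloc(needed + 1);
    if (key == NULL)
        return NULL;

    /*
     * Fill the buffer, including the terminating NUL.
     * Assumption: an ICU-backed build would generate the key here
     * with its own collation API rather than libc.
     */
    strxfrm(key, src, needed + 1);
    return key;
}

/*
 * Compare two strings through their transformed keys, falling back to
 * strcoll() if a key could not be allocated.
 */
int
xfrm_compare(const char *a, const char *b)
{
    char   *ka = xfrm_sort_key(a);
    char   *kb = xfrm_sort_key(b);
    int     result;

    if (ka == NULL || kb == NULL)
        result = strcoll(a, b);
    else
        result = strcmp(ka, kb);

    free(ka);
    free(kb);
    return result;
}
```

Routing every comparison through one function like this is what would keep callers outside varlena.c, such as convert_string_datum(), consistent with the ordering used for sorting.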

>
> [1] https://wiki.postgresql.org/wiki/Abbreviated_keys_glibc_issue
> --
> Peter Geoghegan

