Thread: using Core Foundation locale functions

using Core Foundation locale functions

From

Peter Eisentraut

Date:

28 November 2014, 19:43:36

In light of the recent discussions about using ICU on OS X, I looked
into the Core Foundation locale functions (Core Foundation = traditional
Mac API in OS X, as opposed to the Unix/POSIX APIs).

Attached is a proof of concept patch that just about works for the
sorting aspects.  (The ctype aspects aren't there yet and will crash,
but they could be done similarly.)  It passes an appropriately adjusted
collate.linux.utf8 test, meaning that it does produce language-aware
sort orders that are equivalent to what glibc produces.

At the moment, this is probably just an experiment that shows where
refactoring and better abstractions might be suitable if we want to
support multiple locale libraries.  If we want to pursue ICU, I think
this could be a useful third option.

Attachment

cf-locale.patch

Re: using Core Foundation locale functions

From

"David E. Wheeler"

Date:

01 December 2014, 19:53:08

On Nov 28, 2014, at 8:43 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
>
> At the moment, this is probably just an experiment that shows where
> refactoring and better abstractions might be suitable if we want to
> support multiple locale libraries.  If we want to pursue ICU, I think
> this could be a useful third option.

Gotta say, I’m thrilled to see movement on this front, and especially pleased to see how consensus seems to be building
around an abstracted interface to keep options open. This platform-specific example really highlights the need for it (I
had no idea that there was separate and more up-to-date collation support in Core Foundation than in the UNIX layer of
OS X).

Really looking forward to seeing where we end up.

Best,

David

Re: using Core Foundation locale functions

From

Peter Geoghegan

Date:

02 December 2014, 11:18:08

On Fri, Nov 28, 2014 at 8:43 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
> At the moment, this is probably just an experiment that shows where
> refactoring and better abstractions might be suitable if we want to
> support multiple locale libraries.  If we want to pursue ICU, I think
> this could be a useful third option.

FWIW, I think that the richer API that ICU provides for string
transformations could be handy in optimizing sorting using abbreviated
keys. For example, ICU will happily only produce parts of sort keys
(the equivalent of strxfrm() blobs) if that is all that is required
[1].

I think that ICU also allows clients to parse individual primary
weights in a principled way (primary weights tend to be isomorphic to
the Unicode code points in the original string). I think that this
will enable order-preserving compression of the type anticipated by
the Unicode collation algorithm [2]. That could be useful for certain
languages, like Russian, where the primary weight level usually
contains multi-byte code points with glibc's strxfrm() (this is
generally not true of languages that use the Latin alphabet, or of
East Asian languages).

Note that there is already naturally a form of what you might call
compression with strxfrm() [3]. This is very useful for abbreviated
keys.
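
(Editorial illustration, not part of the original message.) To make the
abbreviated-key idea concrete, here is a minimal sketch in portable C. It
assumes nothing about PostgreSQL's actual sortsupport code: the names
ABBREV_LEN, SortItem, make_abbrev_key, item_cmp and sort_strings are all
hypothetical. Each string is run through strxfrm() once, the first few
bytes of the blob are kept as a fixed-size key, and the comparator uses
memcmp() on those keys, falling back to strcoll() only when the keys tie.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical choice: how many leading bytes of the strxfrm() blob to keep. */
#define ABBREV_LEN 8

typedef struct
{
    unsigned char key[ABBREV_LEN];  /* zero-padded prefix of the strxfrm() blob */
    const char   *str;              /* original string, used to break ties */
} SortItem;

/* Fill key with the first ABBREV_LEN bytes of strxfrm(s); 0 on success. */
static int
make_abbrev_key(const char *s, unsigned char key[ABBREV_LEN])
{
    size_t need = strxfrm(NULL, s, 0);  /* blob length, excluding the NUL */
    char  *blob = malloc(need + 1);

    if (blob == NULL)
        return -1;
    strxfrm(blob, s, need + 1);

    /* The blob has no embedded NUL, so zero padding keeps short keys first. */
    memset(key, 0, ABBREV_LEN);
    memcpy(key, blob, need < ABBREV_LEN ? need : ABBREV_LEN);
    free(blob);
    return 0;
}

/* qsort() comparator: cheap memcmp() first, full strcoll() only on a tie. */
static int
item_cmp(const void *pa, const void *pb)
{
    const SortItem *a = pa;
    const SortItem *b = pb;
    int             c = memcmp(a->key, b->key, ABBREV_LEN);

    return c != 0 ? c : strcoll(a->str, b->str);
}

/* Sort strs[0..n-1] in place in the current LC_COLLATE order; 0 on success. */
int
sort_strings(const char **strs, size_t n)
{
    SortItem *items;

    assert(strs != NULL || n == 0);
    if (n == 0)
        return 0;

    items = malloc(n * sizeof(SortItem));
    if (items == NULL)
        return -1;

    for (size_t i = 0; i < n; i++)
    {
        items[i].str = strs[i];
        if (make_abbrev_key(strs[i], items[i].key) != 0)
        {
            free(items);
            return -1;
        }
    }

    qsort(items, n, sizeof(SortItem), item_cmp);

    for (size_t i = 0; i < n; i++)
        strs[i] = items[i].str;
    free(items);
    return 0;
}
```

A caller would invoke setlocale(LC_COLLATE, ...) first and then pass an
array of C strings to sort_strings(). How often the memcmp() alone decides
the comparison depends on how much distinguishing information the locale's
strxfrm() packs into the leading bytes, which is why a library that can
emit partial or denser sort keys would be attractive here.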

[1] http://userguide.icu-project.org/collation/architecture
[2] http://www.unicode.org/reports/tr10/#Run-length_Compression
[3] http://www.postgresql.org/message-id/CAM3SWZTyWe5J69TaPvZf2CM7mhSKKE3UhHnK9gLuQckkWqoL5w@mail.gmail.com
-- 
Peter Geoghegan

Re: using Core Foundation locale functions

From

Noah Misch

Date:

03 December 2014, 08:52:12

On Fri, Nov 28, 2014 at 11:43:28AM -0500, Peter Eisentraut wrote:
> In light of the recent discussions about using ICU on OS X, I looked
> into the Core Foundation locale functions (Core Foundation = traditional
> Mac API in OS X, as opposed to the Unix/POSIX APIs).
> 
> Attached is a proof of concept patch that just about works for the
> sorting aspects.  (The ctype aspects aren't there yet and will crash,
> but they could be done similarly.)  It passes an appropriately adjusted
> collate.linux.utf8 test, meaning that it does produce language-aware
> sort orders that are equivalent to what glibc produces.
> 
> At the moment, this is probably just an experiment that shows where
> refactoring and better abstractions might be suitable if we want to
> support multiple locale libraries.  If we want to pursue ICU, I think
> this could be a useful third option.

Does this make the backend multi-threaded?

Re: using Core Foundation locale functions

From

Craig Ringer

Date:

03 December 2014, 09:07:27

On 12/02/2014 12:52 AM, David E. Wheeler wrote:
> Gotta say, I’m thrilled to see movement on this front, and especially pleased to see how consensus seems to be
> building around an abstracted interface to keep options open. This platform-specific example really highlights the need
> for it (I had no idea that there was separate and more up-to-date collation support in Core Foundation than in the UNIX
> layer of OS X).

It'd also potentially let us make use of Windows' native locale APIs,
which AFAIK receive considerably more love on that platform than their
POSIX back-country cousins.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: using Core Foundation locale functions

From

Peter Geoghegan

Date:

03 December 2014, 09:13:17

On Tue, Dec 2, 2014 at 10:07 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 12/02/2014 12:52 AM, David E. Wheeler wrote:
>> Gotta say, I’m thrilled to see movement on this front, and especially pleased to see how consensus seems to be
>> building around an abstracted interface to keep options open. This platform-specific example really highlights the need
>> for it (I had no idea that there was separate and more up-to-date collation support in Core Foundation than in the UNIX
>> layer of OS X).
>
> It'd also potentially let us make use of Windows' native locale APIs,
> which AFAIK receive considerably more love on that platform than their
> POSIX back-country cousins.

Not to mention the fact that a MultiByteToWideChar() call could be
saved, and sortsupport for text would just work on Windows.

--
Peter Geoghegan