Re: [pgsql-packagers] Palle Girgensohn's ICU patch - Mailing list pgsql-hackers
From | Palle Girgensohn |
---|---|
Subject | Re: [pgsql-packagers] Palle Girgensohn's ICU patch |
Date | |
Msg-id | 0ECFF0FA-2D9C-46D4-BEF8-34C7A5215FDC@pingpong.net |
In response to | Re: [pgsql-packagers] Palle Girgensohn's ICU patch (Dave Page <dpage@pgadmin.org>) |
Responses | Re: [pgsql-packagers] Palle Girgensohn's ICU patch |
List | pgsql-hackers |
> On 27 Nov 2014, at 10:15, Dave Page <dpage@pgadmin.org> wrote:
>
> On Thu, Nov 27, 2014 at 9:09 AM, Jakob Egger <jakob@eggerapps.at> wrote:
> On 26.11.2014, at 17:46, Geoff Montee <geoff.montee@gmail.com> wrote:
> > This topic reminds me of a thread from a couple months ago:
> >
> > http://www.postgresql.org/message-id/F8268DB6-B50F-429F-8289-DA8FFA5F22BA@tripadvisor.com
> >
> > It sounds like adding ICU support to core may also allow for adding
> > collation versioning to indexes.
>
> Reading through this thread it becomes clear to me that adding support for ICU is more important than I thought, and the only problem is that no one has yet volunteered for it :)
>
> I've started looking through the PostgreSQL source and Palle's patch to estimate what needs to be done.
>
> MINIMUM TODO
> ============
>
> * Add support for per-column collations in varstr_cmp() in varlena.c. Currently the patch creates a single ICU collator for the default collation and stores it in a static variable. We would need to change this to create collators for each collation and store them in a hash table similar to pg_newlocale_from_collation() / lookup_collation_cache()
>
> * There's a new feature in trunk for faster sorting using SortSupport, so we would also need to patch bttextfastcmp_locale() in varlena.c
>
> These two changes would allow using ICU for collation. This has two major advantages:
> 1) Systems with broken strcoll like OS X and FreeBSD can take advantage of ICU to offer proper text sorting
> 2) You can link with a specific version of ICU to avoid index corruption and duplicate keys caused by changing implementations of the glibc strcoll function
>
>
> NEXT STEPS: Support for more collations
> =======================================
>
> ICU offers a lot more collations than the OS. For example, besides "de_CH" it also offers "de_CH@collation=phonebook". Adding support for these is a bit more involved.
>
> * initdb would need to be extended to also look for collations offered by ICU and add them to the pg_collation catalog.
>
> * A special case for LC_COLLATE must be added to check_locale() in the backend, get_canonical_locale_name() in pg_upgrade, and check_locale_name() in initdb to support collations provided by ICU
>
> * pg_perm_setlocale() must get a special case to handle ICU collations
>
> * the locale handling code in plperl must be modified (when using an ICU collation as default collation, we must decide what collation to send to perl)
>
> * convert_string_datum() in selfuncs.c could be patched to use ICU instead of strxfrm. However, as far as I understand, this is not absolutely required as this is only used by the query planner and would in the worst case prevent some optimisation in corner cases
>
> These changes would probably have an even bigger impact, because then people would no longer be limited to the collations supported by the locales installed on their OS.
>
> NEXT STEPS: Collation versioning in indices
> ===========================================
>
> Since ICU provides reliable versioning of collations, this would allow us to finally prevent index corruption caused by changing implementations of strcoll. I haven't looked at this in detail, but I assume that this would be a small change with potentially big impact.
>
> Ideally, PostgreSQL would detect when the collation is a different version than the one used to create the index, and stop using the index until it is rebuilt.
>
>
> I'll take a shot at the MINIMUM TODO as outlined above.
>
>
> We've already included ICU support in our Postgres Plus Advanced Server product. Before you spend too much time on this, give me a few days to see if we can get that change contributed back. The people I need to speak to are OOO for Thanksgiving at the moment though, so it may be a few days.
>
> --

Hi,

Just poking this old thread again. What happened here? Is anyone putting work into this area at the moment?

Palle
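
For anyone picking this up, the "stop using the index until it is rebuilt" idea quoted above could look roughly like the sketch below. It is only an illustration, not code from the patch or from PostgreSQL: `IndexMeta`, `partition_indexes`, and the version strings are made-up names and values, and a real implementation would presumably ask ICU for the collator version (for example via `ucol_getVersion()` in its C API) instead of reading it from a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexMeta:
    """Hypothetical catalog entry for an index on a text column."""
    name: str
    collation: str          # e.g. "de_CH@collation=phonebook"
    collation_version: str  # collator version recorded when the index was built


def partition_indexes(indexes, current_versions):
    """Split indexes into (usable, needs_rebuild).

    An index is usable only if the collation version recorded at build time
    equals the version reported now. `current_versions` maps collation name
    to the current version string; a collation missing from the map is
    treated as a mismatch, so the index is not trusted.
    """
    usable, needs_rebuild = [], []
    for ix in indexes:
        if current_versions.get(ix.collation) == ix.collation_version:
            usable.append(ix.name)
        else:
            needs_rebuild.append(ix.name)
    return usable, needs_rebuild


if __name__ == "__main__":
    indexes = [
        IndexMeta("users_name_idx", "de_CH", "58.0"),
        IndexMeta("city_name_idx", "de_CH@collation=phonebook", "57.1"),
    ]
    # Made-up version strings standing in for whatever the library reports.
    current = {"de_CH": "58.0", "de_CH@collation=phonebook": "58.0"}
    usable, stale = partition_indexes(indexes, current)
    print("usable:", usable)
    print("needs rebuild:", stale)
```

The point of the comparison is that a mismatch only disables the index for lookups; the data stays intact and a rebuild against the new collator version makes the index usable again.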