Thread: [HACKERS] Can ICU be used for a database's default sort order?

[HACKERS] Can ICU be used for a database's default sort order?

From

Tom Lane

Date:

23 June 2017, 05:10:54

I tried to arrange $subject via

create database icu encoding 'utf8' lc_ctype "en-US-x-icu" lc_collate "en-US-x-icu" template template0;

and got only

ERROR:  invalid locale name: "en-US-x-icu"

which is unsurprising after looking into the code, because createdb()
checks those parameters with check_locale() which only knows about
libc-defined locale names.

Is there some way I'm missing, or is this just a not-done-yet feature?
        regards, tom lane

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

Peter Geoghegan

Date:

23 June 2017, 06:10:38

On Thu, Jun 22, 2017 at 7:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Is there some way I'm missing, or is this just a not-done-yet feature?

It's a not-done-yet feature.


-- 
Peter Geoghegan

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

Peter Eisentraut

Date:

23 June 2017, 21:32:38

On 6/22/17 23:10, Peter Geoghegan wrote:
> On Thu, Jun 22, 2017 at 7:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Is there some way I'm missing, or is this just a not-done-yet feature?
> 
> It's a not-done-yet feature.

It's something I hope to address soon.

The main definitional challenge is how to associate a pg_database entry
with a collation.

What we currently effectively do is duplicate the fields of pg_collation
in pg_database.  But I imagine over time we'll add more properties in
pg_collation, along with additional ALTER COLLATION commands etc., so
duplicating all of that would be a significant amount of code
complication and result in a puzzling user interface.

Ideally, I'd like to see CREATE DATABASE ... COLLATION "foo".  But the
problem is of course that collations are per-database objects.  Possible
solutions:

1) Associate by name only.  That is, you can create a database with any
COLLATION "foo" that you want, and it's only checked when you first
connect to or do anything in the database.

2) Create shared collations.  Then we'd need a way to manage having a
mix of shared and non-shared collations around.

There are significant pros and cons to all of these ideas.  Some people
I talked to appeared to prefer the shared collations approach.

Other ideas?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

Peter Geoghegan

Date:

24 June 2017, 01:54:36

On Fri, Jun 23, 2017 at 11:32 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> It's something I hope to address soon.

I hope you do. I think that we'd realize significant benefits by
having ICU become the defacto standard collation provider, that most
users get without even realizing it. As things stand, you have to make
a point of specifying an ICU collation as your per-column collation
within every CREATE TABLE. That's a significant barrier to adoption.

> 1) Associate by name only.  That is, you can create a database with any
> COLLATION "foo" that you want, and it's only checked when you first
> connect to or do anything in the database.
>
> 2) Create shared collations.  Then we'd need a way to manage having a
> mix of shared and non-shared collations around.
>
> There are significant pros and cons to all of these ideas.  Some people
> I talked to appeared to prefer the shared collations approach.

I strongly prefer the second approach. The only downside that occurs
to me is that that approach requires more code. Is there something
that I've missed?

-- 
Peter Geoghegan

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

Tom Lane

Date:

24 June 2017, 02:14:13

Peter Geoghegan <pg@bowt.ie> writes:
> On Fri, Jun 23, 2017 at 11:32 AM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> 1) Associate by name only.  That is, you can create a database with any
>> COLLATION "foo" that you want, and it's only checked when you first
>> connect to or do anything in the database.
>> 
>> 2) Create shared collations.  Then we'd need a way to manage having a
>> mix of shared and non-shared collations around.
>> 
>> There are significant pros and cons to all of these ideas.  Some people
>> I talked to appeared to prefer the shared collations approach.

> I strongly prefer the second approach. The only downside that occurs
> to me is that that approach requires more code. Is there something
> that I've missed?

I'm not very clear on how you'd bootstrap template1 into anything
other than C locale in the second approach.  With our existing
libc-based stuff, it's possible to define what the database's locale
is before there are any catalogs.  It's not apparent how to do that with
a collation-based solution.

In my mind, collations are just a SQL-syntax wrapper for locales that
are really defined one level down.  I think we'd be well advised to
carry that same approach into the database properties, because otherwise
we have circularities to deal with. So I'm imagining something more like

create database encoding 'utf8' lc_collate 'icu-en_US' lc_ctype ...

where lc_collate is just a string that we know how to interpret, the
same as now.

We could optionally reduce the amount of notation involved by merging the
lc_collate and lc_ctype parameters into one, say

create database encoding 'utf8' locale 'icu-en_US' ...

I'm not too clear on how this would play with other libc locale
functionality (lc_monetary and so on), but we'd have to deal with
that question anyway.
        regards, tom lane

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

Vladimir Borodin

Date:

31 January 2018, 19:48:28

Hi.

23 июня 2017 г., в 21:32, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> написал(а):

On 6/22/17 23:10, Peter Geoghegan wrote:
On Thu, Jun 22, 2017 at 7:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Is there some way I'm missing, or is this just a not-done-yet feature?

It's a not-done-yet feature.

It's something I hope to address soon.

Will it work only for a particular database? Or for a whole cluster during initdb also? Any chance to get this done in 11?

--
May the force be with you…
https://simply.name

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

Peter Eisentraut

Date:

01 February 2018, 17:09:34

On 1/31/18 11:48, Vladimir Borodin wrote:
> Will it work only for a particular database? Or for a whole cluster
> during initdb also? Any chance to get this done in 11?

I'm currently not working on it.

It's basically just a lot of leg work, and you need to come up with a
catalog representation.  Possible options have already been addressed in
earlier threads.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

Andrey Borodin

Date:

02 February 2018, 15:22:07

Hello!

> 1 февр. 2018 г., в 19:09, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> написал(а):
>
> On 1/31/18 11:48, Vladimir Borodin wrote:
>> Will it work only for a particular database? Or for a whole cluster
>> during initdb also? Any chance to get this done in 11?
>
> I'm currently not working on it.
>
> It's basically just a lot of leg work, and you need to come up with a
> catalog representation.  Possible options have already been addressed in
> earlier threads.


I can try to do this before next CF. But ISTM that EDB and Postgres Pro already have various flavours of similar
feature.Maybe they are planning to publish that? 

Best regards, Andrey Borodin.

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

Peter Geoghegan

Date:

02 February 2018, 19:14:19

On Fri, Feb 2, 2018 at 4:22 AM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> I can try to do this before next CF. But ISTM that EDB and Postgres Pro already have various flavours of similar
feature.Maybe they are planning to publish that?
 

I would definitely review that patch.


-- 
Peter Geoghegan

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

Andrey Borodin

Date:

10 February 2018, 18:45:40

Hi!

> 2 февр. 2018 г., в 21:14, Peter Geoghegan <pg@bowt.ie> написал(а):
>
> On Fri, Feb 2, 2018 at 4:22 AM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>> I can try to do this before next CF. But ISTM that EDB and Postgres Pro already have various flavours of similar
feature.Maybe they are planning to publish that? 
>
> I would definitely review that patch.

I've contacted Postgres Professional. Marina Polyakova had kindly provided their patch.
The patch allows to use libc locale with ICU collation as default for cluster or database.

It seems that this patch brings important long-awaited feature and deserves to be included in last v11 commitfest.
Peter, everyone, do you agree with this? Or should we better adapt this work through v12 cycle?

I'm planning to provide review asap and do necessary changes if required (this was discussed with Marina and Postgres
Professional).

Best regards, Andrey Borodin.

Attachment

0001-ICU-as-default-collation-provider.txt

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

Andrey Borodin

Date:

16 February 2018, 11:12:39

Hi everyone!

> 10 февр. 2018 г., в 20:45, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
>
> I'm planning to provide review
>

So, I was looking into the patch.
The patch adds:
1. Ability to specify collation provider (with version) in --locale for initdb and createdb.
2. Changes to locale checks
3. Sets ICU as default collation provider. For example "ru_RU@icu.153.80.32.1" is default on my machine with patch
4. Tests and necessary changes to documentation

With patch I get correct ICU ordering by default
postgres=# select unnest(array['е','ё','ж']) order by 1;
 unnest
--------
 е
 ё
 ж
(3 rows)

While libc locale provides incorrect order (I also get same ordering by default without patch)

postgres=# select c from unnest(array['е','ё','ж']) c order by c collate "ru_RU";
 c
---
 е
 ж
 ё
(3 rows)


Unfortunately, neither "ru_RU@icu.153.80.32.1" (exposed by LC_COLLATE and other places) nor "ru_RU@icu" cannot be used
bycollate SQL clause. 
Also, patch removes compatibility with MSVC 1800 (Visual Studio 2013) on Windows XP and Windows Server 2003. This is
doneto use newer locale-related functions in VS2013 build. 

If the database was initialized with default locale without this patch, one cannot connect to it anymore
psql: FATAL:  could not find out the collation provider for datcollate "ru_RU.UTF-8" of database "postgres"
This problem is mentioned in commit message of the patch. I think that this problem should be addressed somehow.
What do you think?

Overall patch looks solid and thoughtful work and adds important functionality.

Best regards, Andrey Borodin.

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

Michail Nikolaev

Date:

18 February 2018, 16:43:20

Hello.

Just want to inform:

I have run check,installcheck,plcheck,contribcheck,modulescheck,ecpgcheck,isolationcheck,upgradecheck tests on Windows 10, VC2017 with patch applied on top of 2a41507dab0f293ff241fe8ae326065998668af8 as Andrey asked me.

Everything is passing with and without $config->{icu} = 'D:\Dev\postgres\icu\';

Best regards,

Michail.

пт, 16 февр. 2018 г. в 11:13, Andrey Borodin <x4mmm@yandex-team.ru>:

Hi everyone!

> 10 февр. 2018 г., в 20:45, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
>
> I'm planning to provide review
>

So, I was looking into the patch.
The patch adds:
1. Ability to specify collation provider (with version) in --locale for initdb and createdb.
2. Changes to locale checks
3. Sets ICU as default collation provider. For example "ru_RU@icu.153.80.32.1" is default on my machine with patch
4. Tests and necessary changes to documentation

With patch I get correct ICU ordering by default
postgres=# select unnest(array['е','ё','ж']) order by 1;
unnest
--------
е
ё
ж
(3 rows)

While libc locale provides incorrect order (I also get same ordering by default without patch)

postgres=# select c from unnest(array['е','ё','ж']) c order by c collate "ru_RU";
c
---
е
ж
ё
(3 rows)

Unfortunately, neither "ru_RU@icu.153.80.32.1" (exposed by LC_COLLATE and other places) nor "ru_RU@icu" cannot be used by collate SQL clause.
Also, patch removes compatibility with MSVC 1800 (Visual Studio 2013) on Windows XP and Windows Server 2003. This is done to use newer locale-related functions in VS2013 build.

If the database was initialized with default locale without this patch, one cannot connect to it anymore
psql: FATAL: could not find out the collation provider for datcollate "ru_RU.UTF-8" of database "postgres"
This problem is mentioned in commit message of the patch. I think that this problem should be addressed somehow.
What do you think?

Overall patch looks solid and thoughtful work and adds important functionality.

Best regards, Andrey Borodin.

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

Andres Freund

Date:

02 March 2018, 09:14:03

Hi,

On 2018-02-10 20:45:40 +0500, Andrey Borodin wrote:
> I've contacted Postgres Professional. Marina Polyakova had kindly provided their patch.
> The patch allows to use libc locale with ICU collation as default for cluster or database.
> 
> It seems that this patch brings important long-awaited feature and deserves to be included in last v11 commitfest.
> Peter, everyone, do you agree with this? Or should we better adapt this work through v12 cycle?
> 
> I'm planning to provide review asap and do necessary changes if required (this was discussed with Marina and Postgres
Professional).

This patch was submitted for the last v11 commitfest, it's not a trivial
patch, and hasn't yet been reviewed. I'm afraid the policy is that large
patches shouldn't be submitted for the last commitfest...  Thus I think
this should be moved to the next one.

Greetings,

Andres Freund

Re: Re: [HACKERS] Can ICU be used for a database's default sortorder?

From

David Steele

Date:

06 March 2018, 17:54:33

On 3/2/18 1:14 AM, Andres Freund wrote:
> 
> On 2018-02-10 20:45:40 +0500, Andrey Borodin wrote:
>> I've contacted Postgres Professional. Marina Polyakova had kindly provided their patch.
>> The patch allows to use libc locale with ICU collation as default for cluster or database.
>>
>> It seems that this patch brings important long-awaited feature and deserves to be included in last v11 commitfest.
>> Peter, everyone, do you agree with this? Or should we better adapt this work through v12 cycle?
>>
>> I'm planning to provide review asap and do necessary changes if required (this was discussed with Marina and
PostgresProfessional).
 
> 
> This patch was submitted for the last v11 commitfest, it's not a trivial
> patch, and hasn't yet been reviewed. I'm afraid the policy is that large
> patches shouldn't be submitted for the last commitfest...  Thus I think
> this should be moved to the next one.

This patch has been moved to the next CF.

Regards,
-- 
-David
david@pgmasters.net

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

"Daniel Verite"

Date:

01 August 2018, 16:40:14

    Andrey Borodin wrote:

> Overall patch looks solid and thoughtful work and adds important
> functionality.

I tried the patch, with some minor changes to build with HEAD.

I was surprised by the interface, that is, the fact that a user is
not allowed to freely choose the ICU collation of a database, in
constrast with CREATE COLLATION.

AFAIU, when the "default collation provider" is ICU, CREATE DATABASE
still expects a libc locale in the lc_collate/lc_ctype arguments.
The code will automatically find an ICU equivalent by matching the
language, and it seems that the country is ignored?

So if we wanted a database with an ICU collation like, say,
"es@collation=traditional" or "es-u-co-trad" as expressed with a
BCP-47 tag, or anything that is not defined by only a language,
would it be possible? I have the impression it wouldn't.

This is not something that could be easily improved after the fact
because getting the ICU collation through a libc collation is a
user-interface choice.

I think users would rather be able to create a database with
something like:

CREATE DATABASE foo
  COLLPROVIDER='icu'
  LOCALE='icu_locale' |
      LC_COLLATE='icu_locale' | LC_CTYPE='icu_locale'
  ...
which would be in line with CREATE COLLATION.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

"Daniel Verite"

Date:

02 August 2018, 14:37:17

    I wrote:

> I tried the patch, with some minor changes to build with HEAD.

PFA a rebased version.
(for some reason http://cfbot.cputube.org/ did not pick up the initial patch;
there's an entry for it but the rightmost column is empty,
no link to patch and no green/red success/failure icons.
It seems it's the only entry in this state).


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Attachment

ICU-as-default-collation-provider-rebased-jul-2018.patch

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

Thomas Munro

Date:

12 September 2018, 04:06:12

On Sat, Jun 24, 2017 at 10:55 AM Peter Geoghegan <pg@bowt.ie> wrote:
> On Fri, Jun 23, 2017 at 11:32 AM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
> > It's something I hope to address soon.
>
> I hope you do. I think that we'd realize significant benefits by
> having ICU become the defacto standard collation provider, that most
> users get without even realizing it. As things stand, you have to make
> a point of specifying an ICU collation as your per-column collation
> within every CREATE TABLE. That's a significant barrier to adoption.
>
> > 1) Associate by name only.  That is, you can create a database with any
> > COLLATION "foo" that you want, and it's only checked when you first
> > connect to or do anything in the database.
> >
> > 2) Create shared collations.  Then we'd need a way to manage having a
> > mix of shared and non-shared collations around.
> >
> > There are significant pros and cons to all of these ideas.  Some people
> > I talked to appeared to prefer the shared collations approach.
>
> I strongly prefer the second approach. The only downside that occurs
> to me is that that approach requires more code. Is there something
> that I've missed?

Sorry to join this thread late.  I was redirected here from another one[1].

I like the shared catalog idea, but here's one objection I thought
about: it makes it a bit harder to track whether you've sorted out all
your indexes after a version change.  Say collation fr_CA's version
changes according to the provider, so that it no longer matches the
stored collversion.  Now you'll need to be careful to connect to every
database in the cluster and run REINDEX, before you run ALTER
COLLATION "fr_CA" REFRESH VERSION to update the single shared
pg_collation row's collversion.  With the non-shared pg_collation
scheme we have currently, you'd need to refresh the collation row in
each database after reindexing the whole database, which is IMHO a bit
nicer (you track which databases you've dealt with as you go through
them).

In other words, using a shared catalog moves the "scope" of the
version tracking even further away from the ideal scope, and requires
humans to actually get the cleanup right, and it's extra confusing
because you can only be connected to one database at a time so there
is no "REINDEX MY CLUSTER" and no possibility of making a command that
reindexes dependent indexes and then refreshes the collation version.

The ideal scope would be to track all referenced collation versions on
every index, and only update them at CREATE INDEX or REINDEX time
(also, as discussed in some other thread, CHECK constraints and
partition keys might be invalidated and should in theory also carry
versions that can only be updated by running a hypothetical RECHECK or
REPARTITION command).  Then a shared pg_collation catalog would make
perfect sense, and there would be no need for it to have a collversion
column at all, or an ALTER COLLATION ... REFRESH VERSION command, and
therefore there would be no way to screw it up by REFRESHing the
VERSION without having really fixed the problem.

[1] https://www.postgresql.org/message-id/242e081c-aec8-a20a-510c-f4d0f183cebd%402ndquadrant.com

-- 
Thomas Munro
http://www.enterprisedb.com

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

Michael Paquier

Date:

02 October 2018, 09:37:06

On Wed, Sep 12, 2018 at 01:06:12PM +1200, Thomas Munro wrote:
> The ideal scope would be to track all referenced collation versions on
> every index, and only update them at CREATE INDEX or REINDEX time
> (also, as discussed in some other thread, CHECK constraints and
> partition keys might be invalidated and should in theory also carry
> versions that can only be updated by running a hypothetical RECHECK or
> REPARTITION command).  Then a shared pg_collation catalog would make
> perfect sense, and there would be no need for it to have a collversion
> column at all, or an ALTER COLLATION ... REFRESH VERSION command, and
> therefore there would be no way to screw it up by REFRESHing the
> VERSION without having really fixed the problem.

Please note that the latest patch set does not apply, so this has been
switched to commit fest 2018-11, waiting on author for a rebase.
--
Michael

Attachment

signature.asc

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

Andrey Borodin

Date:

30 October 2018, 08:06:27

Hi!

> 2 окт. 2018 г., в 11:37, Michael Paquier <michael@paquier.xyz> написал(а):
>
> Please note that the latest patch set does not apply, so this has been
> switched to commit fest 2018-11, waiting on author for a rebase.

PFA rebased version. I've added LDFLAGS_INTERNAL += $(ICU_LIBS) in libpq, but I'm not entirely sure this is correct way
todeal with complaints on ICU functions from libpq linking. 


Best regards, Andrey Borodin.

Attachment

0001-ICU-as-a-default-collation-provider-rebased-oct-2018.patch

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

Dmitry Dolgov

Date:

30 November 2018, 13:51:27

> On Tue, Oct 30, 2018 at 9:07 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> Hi!
>
> > 2 окт. 2018 г., в 11:37, Michael Paquier <michael@paquier.xyz> написал(а):
> >
> > Please note that the latest patch set does not apply, so this has been
> > switched to commit fest 2018-11, waiting on author for a rebase.
>
> PFA rebased version. I've added LDFLAGS_INTERNAL += $(ICU_LIBS) in libpq,

Thanks for providing the rebased version.

> but I'm not entirely sure this is correct way to deal with complaints on ICU
> functions from libpq linking.

Well, it was enough on my own Gentoo, where patch actually compiles without
errors and pass "make check". But for cfbot it doesn't compile, I'm not sure
why.

../../../src/interfaces/libpq/libpq.so: undefined reference to `ucol_open_52'
../../../src/interfaces/libpq/libpq.so: undefined reference to
`uloc_toLanguageTag_52'
../../../src/interfaces/libpq/libpq.so: undefined reference to `u_errorName_52'
../../../src/interfaces/libpq/libpq.so: undefined reference to
`ucol_getVersion_52'
../../../src/interfaces/libpq/libpq.so: undefined reference to
`u_versionToString_52'
../../../src/interfaces/libpq/libpq.so: undefined reference to
`uloc_setDefault_52'
../../../src/interfaces/libpq/libpq.so: undefined reference to
`uloc_getDefault_52'
../../../src/interfaces/libpq/libpq.so: undefined reference to `ucol_close_52'

As a side note, I'm a bit confused, who is the original author of the proposed
patch? If it's Marina, why she isn't involved in the discussion or even
mentioned in the patch itself?

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

"Daniel Verite"

Date:

01 December 2018, 20:08:19

    Dmitry Dolgov wrote:

> As a side note, I'm a bit confused, who is the original author of
> the proposed patch? If it's Marina, why she isn't involved in the
> discussion or even mentioned in the patch itself?

The original patch [1] starts with these commit metadata:

  From e1cb130f550952d9c9c2d9ad1c52e60699a2c968 Mon Sep 17 00:00:00 2001
  From: Marina Polyakova <m.polyakova@postgrespro.ru>
  Date: Fri, 9 Feb 2018 18:57:25 +0300
  Subject: [PATCH] ICU as default collation provider
  ... commit message...

but as you note, Marina did not intervene in the discussion nor
submitted it herself, so this patch misses someone to play the
role of the author in the CF process.

There were reviews: Andrey Borodin raised issues with the patch in
[2], I spent some time trying it and asked questions about the design
in [3], but no one followed up on them within the next months.

About the status of the patch, to me it should be RWF. It's been
moved to the next CF several times with no progress besides rebases.

[1] <37A534BE-CBF7-467C-B096-0AAD25091A9F@yandex-team.ru>
[2] <92826DEB-DA8F-4AE4-9C43-03A55D18A766@yandex-team.ru>
[3] <7e86a1aa-a942-4f6b-978e-e9013a4af3fb@manitou-mail.org>

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

Dmitry Dolgov

Date:

02 December 2018, 12:22:37

> On Sat, Dec 1, 2018 at 9:08 PM Daniel Verite <daniel@manitou-mail.org> wrote:
>
>         Dmitry Dolgov wrote:
>
> > As a side note, I'm a bit confused, who is the original author of
> > the proposed patch? If it's Marina, why she isn't involved in the
> > discussion or even mentioned in the patch itself?
>
> The original patch [1] starts with these commit metadata:
>
>   From e1cb130f550952d9c9c2d9ad1c52e60699a2c968 Mon Sep 17 00:00:00 2001
>   From: Marina Polyakova <m.polyakova@postgrespro.ru>
>   Date: Fri, 9 Feb 2018 18:57:25 +0300
>   Subject: [PATCH] ICU as default collation provider
>   ... commit message...
>
> but as you note, Marina did not intervene in the discussion nor
> submitted it herself, so this patch misses someone to play the
> role of the author in the CF process.

Yes, I've missed that, thank you. But there was no references like that in the
last rebased version.

> There were reviews: Andrey Borodin raised issues with the patch in
> [2], I spent some time trying it and asked questions about the design
> in [3], but no one followed up on them within the next months.
>
> About the status of the patch, to me it should be RWF. It's been
> moved to the next CF several times with no progress besides rebases.

Let me disagree. Judging from the commentaries in this discussion it could be
significant and useful feature, and the author is trying to keep this patch
uptodate. The lack of reviews could be due other reasons than desirability of
the patch (as well as as for many other interesting proposals in hackers).

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

Andrey Borodin

Date:

02 December 2018, 14:40:04

Hi, Dmitry, Daniel!

> 2 дек. 2018 г., в 17:22, Dmitry Dolgov <9erthalion6@gmail.com> написал(а):
>
>> There were reviews: Andrey Borodin raised issues with the patch in
>> [2], I spent some time trying it and asked questions about the design
>> in [3], but no one followed up on them within the next months.
>>
>> About the status of the patch, to me it should be RWF. It's been
>> moved to the next CF several times with no progress besides rebases.
>
> Let me disagree. Judging from the commentaries in this discussion it could be
> significant and useful feature, and the author is trying to keep this patch
> uptodate. The lack of reviews could be due other reasons than desirability of
> the patch (as well as as for many other interesting proposals in hackers).

Regarding status of this patch: Marina is the original author of the patch, but I'm interested in pushing it.
Basically,I've asked Marina to provide some code from Postgres Pro to discuss the feature. 

Daniel have raised important interface question in his review. Using libc-style locale in lc_collate is not a perfect
choicefor many ICU-only collations. 
I'd work on patch if I knew how to improve the interface, but I need input from community: how this interface should
looklike. 

I have intention to provide some solution for this question before next CF, but found no enough time during this CF
(besiderebase). 

Best regards, Andrey Borodin.

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

Peter Geoghegan

Date:

06 December 2018, 22:16:46

On Sun, Dec 2, 2018 at 4:21 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > About the status of the patch, to me it should be RWF. It's been
> > moved to the next CF several times with no progress besides rebases.
>
> Let me disagree. Judging from the commentaries in this discussion it could be
> significant and useful feature, and the author is trying to keep this patch
> uptodate. The lack of reviews could be due other reasons than desirability of
> the patch (as well as as for many other interesting proposals in hackers).

+1. I, for one, care about this feature. We're all very busy, but I
don't want to see it die.

-- 
Peter Geoghegan

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

Peter Eisentraut

Date:

11 December 2018, 08:22:48

On 02/12/2018 15:40, Andrey Borodin wrote:
> Daniel have raised important interface question in his review. Using libc-style locale in lc_collate is not a perfect
choicefor many ICU-only collations.

> I'd work on patch if I knew how to improve the interface, but I need input from community: how this interface should
looklike.

Figuring out the interface is the hard part. Several options have been
discussed in this thread and earlier threads.

My current thinking is that we should add a datcollprovider column to
pg_database, and then store the ICU locale name in datcollate. So
mirror the columns of pg_collation in pg_database.

Another issue is that we'd need to carefully divide up the role of the
"default" collation and the "default" provider. The default collation
is the collation defined for the database, the default provider means to
use the libc non-locale_t enabled API functions. Right now these are
always the same, but if the database-global locale is ICU, then the
default collation would use the ICU provider.

My recently posted patch "Reorganize collation lookup time and place" is
meant to help reorganize the APIs to make this simpler. It doesn't have
all the answers yet, but I think it's a step in this direction.

If we have well-designed answers to these questions, I'd imagine that
the actual feature patch would be quite small. I was very surprised to
see how large this patch is and how much code is moves around without
much explanation. I don't think it's worth reviewing this patch any
further. It needs several steps back and some fundamental design and
refactoring work.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

"Daniel Verite"

Date:

12 December 2018, 14:57:50

    Peter Eisentraut wrote:

> Another issue is that we'd need to carefully divide up the role of the
> "default" collation and the "default" provider.  The default collation
> is the collation defined for the database, the default provider means to
> use the libc non-locale_t enabled API functions.  Right now these are
> always the same, but if the database-global locale is ICU, then the
> default collation would use the ICU provider.

I think one related issue that the patch works around by using a libc locale
as a proxy is knowing what to put into libc's LC_CTYPE and LC_COLLATE.
In fact I've been wondering if that's the main reason for the interface
implemented by the patch.

Otherwise, how should these env variables be initialized for ICU
databases?
For instance in the existing FTS code, lowerstr_with_len() in
tsearch/ts_locale.c calls tolower() or towlower() to fold a string to
lower case when normalizing lexemes. This requires LC_CTYPE to be set
to something compatible with the database encoding, at the very
least. Even if that code looks like it might need to be changed for
ICU anyway (or just to be collation-aware according to the TODO marks?),
what about comparable calls in extensions?

In the case that we don't touch libc's LC_COLLATE/LC_CTYPE in backends,
extension code would have them inherited from the postmaster? Does that
sound acceptable? If not, maybe ICU databases should have these as
settable options, in addition to their ICU locale?

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

Peter Eisentraut

Date:

12 December 2018, 17:53:44

On 12/12/2018 15:57, Daniel Verite wrote:
> I think one related issue that the patch works around by using a libc locale
> as a proxy is knowing what to put into libc's LC_CTYPE and LC_COLLATE.
> In fact I've been wondering if that's the main reason for the interface
> implemented by the patch.

So it seems, but then it should be called out more clearly.

> Otherwise, how should these env variables be initialized for ICU
> databases?

I think when using ICU by default, then it should not matter because we
shouldn't be calling any libc functions that use those settings.  Maybe
there need to be some exceptions, but again we should call those out
more clearly.

We could set them to "C" for consistency perhaps.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

Michael Paquier

Date:

04 February 2019, 06:21:46

On Tue, Dec 11, 2018 at 09:22:48AM +0100, Peter Eisentraut wrote:
> If we have well-designed answers to these questions, I'd imagine that
> the actual feature patch would be quite small.  I was very surprised to
> see how large this patch is and how much code is moves around without
> much explanation.  I don't think it's worth reviewing this patch any
> further.  It needs several steps back and some fundamental design and
> refactoring work.

Marked as returned with feedback.  This thread has stalled.
--
Michael

Attachment

signature.asc

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

Marius Timmer

Date:

27 February 2019, 16:32:00

Hello Andrey,

we would like to see ICU collations become the default for entire
databases as well. Therefore we would also review the patch.
Unfortunately your Patch from late October does not apply on the current
master.

Besides of that I noticed the patch applies on master of October but
results in errors when compiling without "--with-icu" and executing
"make check-world":
> libpq.so: Warning: undefined reference to »get_collprovider_name«
> libpq.so: Warning: undefined reference to
»is_valid_nondefault_collprovider«
> libpq.so: Warning: undefined reference to »get_collprovider«

May be caused by your last modification:
> I've added LDFLAGS_INTERNAL += $(ICU_LIBS) in libpq, but I'm not
> entirely sure this is correct way to deal with complaints on ICU
> functions from libpq linking.


Best regards,

Marius Timmer




--
Westfälische Wilhelms-Universität Münster (WWU)
Zentrum für Informationsverarbeitung (ZIV)
Röntgenstraße 7-13
48149 Münster
+49 251 83 31158
marius.timmer@uni-muenster.de
https://www.uni-muenster.de/ZIV

Attachment

signature.asc

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

Marius Timmer

Date:

21 March 2019, 17:46:32

Hello hackers,

as I mentioned three weeks ago the patch from October 2018 did not apply
on the master. In the meantime I rebased it. Additionally I fixed some
Makefiles because a few icu-libs were missing. Now this patch applies
and compiles successfully on my machine. After installing running "make
installcheck-world" results in some failures (for example "select"). I
will take a closer look at those failures and review the whole patch in
the next few days. I just wanted to avoid that you have to do the same
rebasing stuff. The new patch is attached to this mail.

Maybe this patch should be moved to the current commit fest to keep
track of it. What do you think?


Best regards,

Marius Timmer




--
Westfälische Wilhelms-Universität Münster (WWU)
Zentrum für Informationsverarbeitung (ZIV)
Röntgenstraße 7-13
Besucheradresse: Einsteinstraße 60 - Raum 101
48149 Münster
+49 251 83 31158
marius.timmer@uni-muenster.de
https://www.uni-muenster.de/ZIV

Attachment

Re: [HACKERS] Can ICU be used for a database's default sort order?

From

Peter Eisentraut

Date:

22 March 2019, 09:26:32

On 2019-03-21 18:46, Marius Timmer wrote:
> as I mentioned three weeks ago the patch from October 2018 did not apply
> on the master. In the meantime I rebased it. Additionally I fixed some
> Makefiles because a few icu-libs were missing. Now this patch applies
> and compiles successfully on my machine. After installing running "make
> installcheck-world" results in some failures (for example "select"). I
> will take a closer look at those failures and review the whole patch in
> the next few days. I just wanted to avoid that you have to do the same
> rebasing stuff. The new patch is attached to this mail.

As I said previously in this thread, this patch needs some fundamental
design work.  I don't think it's worth doing a code review on the patch
as it is.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services