Thread: [GENERAL] PG-10 + ICU and abbreviated keys

[GENERAL] PG-10 + ICU and abbreviated keys

From
Andreas Joseph Krogh
Date:
Hi.
 
In PG-10, with ICU enabled, is abbreviated keys now enabled?
 
If so, using locale=nb_NO.UTF-8, do I have to use a ICU-specific locale to take advantage of abbreviated keys?
 
Thanks.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963

Re: [GENERAL] PG-10 + ICU and abbreviated keys

From
Adrien Nayrat
Date:
On 11/13/2017 09:27 AM, Andreas Joseph Krogh wrote:
>  
> In PG-10, with ICU enabled, is abbreviated keys now enabled?
>  

Hello,


I think yes :

src/backend/utils/adt/varlena.c

1876     /*
1877      * Unfortunately, it seems that abbreviation for non-C collations is
1878      * broken on many common platforms; testing of multiple versions of glibc
1879      * reveals that, for many locales, strcoll() and strxfrm() do not return
1880      * consistent results, which is fatal to this optimization.  While no
1881      * other libc other than Cygwin has so far been shown to have a problem,
1882      * we take the conservative course of action for right now and disable
1883      * this categorically.  (Users who are certain this isn't a problem on
1884      * their system can define TRUST_STRXFRM.)
1885      *
1886      * Even apart from the risk of broken locales, it's possible that there
1887      * are platforms where the use of abbreviated keys should be disabled at
1888      * compile time.  Having only 4 byte datums could make worst-case
1889      * performance drastically more likely, for example.  Moreover, macOS's
1890      * strxfrm() implementation is known to not effectively concentrate a
1891      * significant amount of entropy from the original string in earlier
1892      * transformed blobs.  It's possible that other supported platforms are
1893      * similarly encumbered.  So, if we ever get past disabling this
1894      * categorically, we may still want or need to disable it for particular
1895      * platforms.
1896      */
1897 #ifndef TRUST_STRXFRM
1898     if (!collate_c && !(locale && locale->provider == COLLPROVIDER_ICU))
1899         abbreviate = false;
1900 #endif


But I did not do any test to compare performances.

Regards,

--
Adrien NAYRAT

http://dalibo.com - http://dalibo.org


Re: [GENERAL] PG-10 + ICU and abbreviated keys

From
Peter Geoghegan
Date:
On Mon, Nov 13, 2017 at 12:27 AM, Andreas Joseph Krogh
<andreas@visena.com> wrote:
> In PG-10, with ICU enabled, is abbreviated keys now enabled?

Yes. ICU will use abbreviated keys on every platform, including Windows.

> If so, using locale=nb_NO.UTF-8, do I have to use a ICU-specific locale to take advantage of abbreviated keys?

You need to use an ICU collation. It must be a per-column collation,
as you cannot currently use ICU for an entire database. (This
limitation should be removed in the next release or two.)

-- 
Peter Geoghegan


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] PG-10 + ICU and abbreviated keys

From
Andreas Joseph Krogh
Date:
På mandag 13. november 2017 kl. 19:07:53, skrev Peter Geoghegan <pg@bowt.ie>:
On Mon, Nov 13, 2017 at 12:27 AM, Andreas Joseph Krogh
<andreas@visena.com> wrote:
> In PG-10, with ICU enabled, is abbreviated keys now enabled?

Yes. ICU will use abbreviated keys on every platform, including Windows.

> If so, using locale=nb_NO.UTF-8, do I have to use a ICU-specific locale to take advantage of abbreviated keys?

You need to use an ICU collation. It must be a per-column collation,
as you cannot currently use ICU for an entire database. (This
limitation should be removed in the next release or two.)
 
Ok, so I have to explicitly specify like this:
 
create table test(id serial primary key, name varchar collate "nb_NO" not null);
 
Will ICU be used here as long as PG is compiled with ICU-suppoert, as the debian-packages are, or do I have to specify collation-provider?
 
Do I have to explicitly specify collation when using ORDER by on that column for index and abbreviated keys to be used?
 
Thanks.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 

Re: [GENERAL] PG-10 + ICU and abbreviated keys

From
Peter Geoghegan
Date:
On Mon, Nov 13, 2017 at 11:48 AM, Andreas Joseph Krogh
<andreas@visena.com> wrote:
> Ok, so I have to explicitly specify like this:
>
> create table test(id serial primary key, name varchar collate "nb_NO" not null);

That doesn't look like an ICU locale. You may mean "nb-NO-x-icu". But
otherwise, yes.

> Will ICU be used here as long as PG is compiled with ICU-suppoert, as the debian-packages are, or do I have to
specifycollation-provider?
 

If you did initdb with a version with ICU support, the ICU collations
should be there.

> Do I have to explicitly specify collation when using ORDER by on that column for index and abbreviated keys to be
used?

Only if you didn't define the column with a per-column collation initially.

-- 
Peter Geoghegan


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] PG-10 + ICU and abbreviated keys

From
Peter Geoghegan
Date:
On Mon, Nov 13, 2017 at 12:40 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> Do I have to explicitly specify collation when using ORDER by on that column for index and abbreviated keys to be
used?
>
> Only if you didn't define the column with a per-column collation initially.

BTW, if you specifically want to quickly verify whether or not
abbreviated keys were used, you can do that by setting "trace_sort =
on", and possibly setting "client_min_messages = LOG", too.

There should be quite a bit of debug output from that that
specifically mentions abbreviated keys.

-- 
Peter Geoghegan


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] PG-10 + ICU and abbreviated keys

From
Andreas Joseph Krogh
Date:
På mandag 13. november 2017 kl. 21:40:47, skrev Peter Geoghegan <pg@bowt.ie>:
On Mon, Nov 13, 2017 at 11:48 AM, Andreas Joseph Krogh
<andreas@visena.com> wrote:
> Ok, so I have to explicitly specify like this:
>
> create table test(id serial primary key, name varchar collate "nb_NO" not null);

That doesn't look like an ICU locale. You may mean "nb-NO-x-icu". But
otherwise, yes.
 
 
Ok, is there a way I can get a list of ICU-collations?
 
 
> Will ICU be used here as long as PG is compiled with ICU-suppoert, as the debian-packages are, or do I have to specify collation-provider?

If you did initdb with a version with ICU support, the ICU collations
should be there.

> Do I have to explicitly specify collation when using ORDER by on that column for index and abbreviated keys to be used?

Only if you didn't define the column with a per-column collation initially.
 
Ok, thanks.
 
Looking forward to this being a per-database setting so it's (hopefully) more transparent.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 

Re: [GENERAL] PG-10 + ICU and abbreviated keys

From
Andreas Joseph Krogh
Date:
På mandag 13. november 2017 kl. 21:46:08, skrev Peter Geoghegan <pg@bowt.ie>:
On Mon, Nov 13, 2017 at 12:40 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> Do I have to explicitly specify collation when using ORDER by on that column for index and abbreviated keys to be used?
>
> Only if you didn't define the column with a per-column collation initially.

BTW, if you specifically want to quickly verify whether or not
abbreviated keys were used, you can do that by setting "trace_sort =
on", and possibly setting "client_min_messages = LOG", too.

There should be quite a bit of debug output from that that
specifically mentions abbreviated keys.
 
Thanks.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 

Re: [GENERAL] PG-10 + ICU and abbreviated keys

From
Peter Geoghegan
Date:
On Mon, Nov 13, 2017 at 12:48 PM, Andreas Joseph Krogh
<andreas@visena.com> wrote:
> Thanks.

As the person that worked on abbreviated keys, I'd like to hear about
how you get with this. How much faster is it for you?

I don't usually get to hear about this, because most users don't
notice that anything in particular gets faster, because there are many
performance enhancements added to a release.

-- 
Peter Geoghegan


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] PG-10 + ICU and abbreviated keys

From
Andreas Joseph Krogh
Date:
På mandag 13. november 2017 kl. 22:28:40, skrev Peter Geoghegan <pg@bowt.ie>:
On Mon, Nov 13, 2017 at 12:48 PM, Andreas Joseph Krogh
<andreas@visena.com> wrote:
> Thanks.

As the person that worked on abbreviated keys, I'd like to hear about
how you get with this. How much faster is it for you?

I don't usually get to hear about this, because most users don't
notice that anything in particular gets faster, because there are many
performance enhancements added to a release.
 
We haven't migrated any of our databases to v10 yet so I really can't tell. I'm evaluating ICU-usage as the last step before we decide moving to v10. Being a per-column setting that means a pg_dump/reload won't cut it (AFAIU), so I'm not sure we'll take that route as it involves much manual tweaking which we're really not interessted in spending time on.
 
When sorting on text, we're usually doing so using an multi-column index, like for instance "CREATE INDEX xxx ON my_table (lower(name) ASC, created ASC)". Will abbreviated keys help here?
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 

Re: [GENERAL] PG-10 + ICU and abbreviated keys

From
Peter Geoghegan
Date:
On Mon, Nov 13, 2017 at 2:05 PM, Andreas Joseph Krogh
<andreas@visena.com> wrote:
> When sorting on text, we're usually doing so using an multi-column index, like for instance "CREATE INDEX xxx ON
my_table(lower(name) ASC, created ASC)". Will abbreviated keys help here?
 

Yes, they'll help with that, even though the leading column might be
low cardinality.

-- 
Peter Geoghegan


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] PG-10 + ICU and abbreviated keys

From
Andreas Joseph Krogh
Date:
På tirsdag 14. november 2017 kl. 00:44:11, skrev Peter Geoghegan <pg@bowt.ie>:
On Mon, Nov 13, 2017 at 2:05 PM, Andreas Joseph Krogh
<andreas@visena.com> wrote:
> When sorting on text, we're usually doing so using an multi-column index, like for instance "CREATE INDEX xxx ON my_table (lower(name) ASC, created ASC)". Will abbreviated keys help here?

Yes, they'll help with that, even though the leading column might be
low cardinality.
 
Nice to know, thanks.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963