Re: benchmark results comparing versions 15.2 and 16 - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: benchmark results comparing versions 15.2 and 16
Msg-id CAH2-WznuVEn1BNqGZerKtertgAM4y4HRaxjceDNk71Kbjh1McQ@mail.gmail.com
In response to Re: benchmark results comparing versions 15.2 and 16  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: benchmark results comparing versions 15.2 and 16
List pgsql-hackers
On Sun, May 28, 2023 at 2:42 PM David Rowley <dgrowleyml@gmail.com> wrote:
> c6e0fe1f2 might have helped improve some of that performance, but I
> suspect there must be something else as ~3x seems much more than I'd
> expect from reducing the memory overheads.  Testing versions before
> and after that commit might give a better indication.

I'm virtually certain that this is due to the change in default
collation provider, from libc to ICU. Mostly due to the fact that ICU
is capable of using abbreviated keys, and the system libc isn't
(unless you go out of your way to define TRUST_STRXFRM when building
Postgres).
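
To make the abbreviated-key idea concrete, here is a minimal sketch in Python. It is not how Postgres implements it. `sort_key_blob` is a hypothetical stand-in for a real collation transform (strxfrm() with libc, or a sort key from ICU), and the 8-byte prefix length and all function names are assumptions made for illustration. The point is only that most comparisons can be settled by comparing two small integers, with the full comparison used as a fallback on ties.

```python
from functools import cmp_to_key

ABBREV_LEN = 8  # assumed prefix size: bytes packed into one word-sized integer


def sort_key_blob(s: str) -> bytes:
    # Hypothetical stand-in for a collation sort key; a real system would
    # call strxfrm() or an ICU sort-key function here.
    return s.casefold().encode("utf-8")


def abbreviate(blob: bytes) -> int:
    # Pack the first ABBREV_LEN bytes (zero padded) big-endian, so that
    # integer order agrees with byte order on the prefix.
    return int.from_bytes(blob[:ABBREV_LEN].ljust(ABBREV_LEN, b"\0"), "big")


def abbreviated_sort(values):
    # Returns the sorted values plus counters showing how many comparisons
    # were settled by the abbreviated key alone.
    stats = {"abbrev_resolved": 0, "full_comparisons": 0}
    items = [(abbreviate(sort_key_blob(v)), v) for v in values]

    def compare(a, b):
        if a[0] != b[0]:
            # Cheap path: the integer prefixes already differ.
            stats["abbrev_resolved"] += 1
            return -1 if a[0] < b[0] else 1
        # Tie on the prefix: fall back to the full (expensive) comparison.
        stats["full_comparisons"] += 1
        ka, kb = sort_key_blob(a[1]), sort_key_blob(b[1])
        return (ka > kb) - (ka < kb)

    items.sort(key=cmp_to_key(compare))
    return [v for _, v in items], stats


if __name__ == "__main__":
    words = ["pear", "Apple", "banana", "apple pie is long", "apple pie is longer"]
    ordered, stats = abbreviated_sort(words)
    print(ordered)
    print(stats)
```

In this sketch, inputs whose first 8 key bytes mostly differ resolve almost every comparison on the cheap path, while inputs that share long prefixes fall back to the full comparison more often.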

Many individual test cases involving larger non-C collation text sorts
showed similar improvements back when I worked on this. Offhand, I
believe that 3x - 3.5x improvements in execution times were common
with high entropy abbreviated keys on high cardinality input columns
at that time (this was with glibc). Low cardinality inputs were more
like 2.5x.

I believe that ICU is faster than glibc in general -- even with
TRUST_STRXFRM enabled. But whether abbreviated keys can be used at all
(the TRUST_STRXFRM thing) is bound to be the most important factor
here, by far.

--
Peter Geoghegan


