Do you want me to try PG 16 without ICU, or PG 15 with ICU? I can do either, but it will be a few days before the server is available.
On Mon, May 29, 2023 at 9:55 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Sun, May 28, 2023 at 2:42 PM David Rowley <dgrowleyml@gmail.com> wrote:
> c6e0fe1f2 might have helped improve some of that performance, but I
> suspect there must be something else as ~3x seems much more than I'd
> expect from reducing the memory overheads. Testing versions before
> and after that commit might give a better indication.
I'm virtually certain that this is due to the change in default collation provider, from libc to ICU. That's mostly because ICU is capable of using abbreviated keys, and the system libc isn't (unless you go out of your way to define TRUST_STRXFRM when building Postgres).
Many individual test cases involving larger non-C collation text sorts showed similar improvements back when I worked on this. Offhand, I believe that 3x - 3.5x improvements in execution times were common with high entropy abbreviated keys on high cardinality input columns at that time (this was with glibc). Low cardinality inputs were more like 2.5x.
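For readers unfamiliar with abbreviated keys, here is a minimal Python sketch of the general idea: compare a short fixed-size prefix of each string's binary sort key first, and fall back to the full comparison only when the prefixes tie. It is an illustration only, not Postgres's actual implementation. The names `full_key`, `abbrev_key`, `sort_with_abbrev` and `ABBREV_LEN` are hypothetical, and the casefolded UTF-8 bytes are an assumed stand-in for a real strxfrm() or ICU sort key.

```python
from functools import cmp_to_key

# Assumption: 8 bytes, i.e. what would fit in a 64-bit pass-by-value slot.
ABBREV_LEN = 8


def full_key(s: str) -> bytes:
    """Hypothetical stand-in for a collation sort key (strxfrm()/ICU).

    Casefolded UTF-8 is NOT a real collation; it only gives us a binary
    key whose bytewise order defines the sort order in this sketch.
    """
    return s.casefold().encode("utf-8")


def abbrev_key(s: str) -> bytes:
    """Fixed-size prefix of the full key, zero-padded on the right."""
    return full_key(s)[:ABBREV_LEN].ljust(ABBREV_LEN, b"\0")


def sort_with_abbrev(strings):
    """Sort strings, counting how each comparison was resolved."""
    stats = {"abbrev": 0, "full": 0}
    # Compute each abbreviated key once, up front.
    items = [(abbrev_key(s), s) for s in strings]

    def compare(a, b):
        if a[0] != b[0]:
            # Cheap path: the fixed-size prefixes already differ.
            stats["abbrev"] += 1
            return -1 if a[0] < b[0] else 1
        # Tie on the prefix: fall back to the expensive full comparison.
        stats["full"] += 1
        ka, kb = full_key(a[1]), full_key(b[1])
        return (ka > kb) - (ka < kb)

    items.sort(key=cmp_to_key(compare))
    return [s for _, s in items], stats


if __name__ == "__main__":
    words = ["pear", "Apple", "banana", "apple pie crust", "apple pie chart"]
    result, stats = sort_with_abbrev(words)
    print(result)
    print(stats)
```

The `stats` counters show why entropy and cardinality matter: when most strings differ within their first few key bytes, nearly every comparison is resolved on the cheap path, while inputs with many duplicates or long shared prefixes fall back to the full comparison more often.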
I believe that ICU is faster than glibc in general -- even with TRUST_STRXFRM enabled. But the TRUST_STRXFRM thing is bound to be the most important factor here, by far.