Re: Slow performance of collate "en_US.utf8" - Mailing list pgsql-performance

From Joe Conway
Subject Re: Slow performance of collate "en_US.utf8"
Date
Msg-id 4ae34c31-b413-4b7e-91c3-63b9ae5da3c3@joeconway.com
In response to Re: Slow performance of collate "en_US.utf8"  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-performance
On 2/28/25 17:49, Thomas Munro wrote:
> On Sat, Mar 1, 2025 at 9:03 AM Joe Conway <mail@joeconway.com> wrote:
>> On 2/28/25 09:16, Laurenz Albe wrote:
>> > On Thu, 2025-02-27 at 16:54 +0300, Alexey Borschev wrote:
>> >> I see poor performance of text sorting of collate "en_US.utf8" in PG 17.4.
>> >
>> > I'd say that you would have to complain to the authors of the
>> > GNU C library, which provides this collation.
>>
>> Yep -- glibc starting with version 2.21 has a massive performance
>> regression for certain cases and the glibc folks have basically said
>> they will not fix it. If you try the same thing on RHEL 7.x with glibc
>> 2.17 it will perform about the same as ICU.
> 
> I've idly wondered if this is the culprit, do you know?
> 
> https://github.com/bminor/glibc/commit/0742aef6e52a935f9ccd69594831b56d807feef3

Yes, that was definitely the one that caused the regression. Note that
if you look closely, you will find that certain distros carry a revert
of that patch in their glibc, but RHEL and the RHEL-alikes do not.

Someone else pointed out this thread to me:
https://sourceware.org/bugzilla/show_bug.cgi?id=18441

Note the last message on that thread:
8<--------------
  Carlos O'Donell 2019-05-09 20:44:56 UTC

(In reply to vectoroc from comment #13)
 > Hello. Is there any chance that the issues will be fixed? Unfortunately
 > PostgreSQL Is unable to use ICU some base features (e.g in analyze
 > operation).

We haven't had anyone working on strcoll_l performance improvements. So 
it's unlikely that this will get merged or reviewed any time soon.
8<--------------
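
To check how a given system's C library behaves, a rough timing sketch along these lines may help. It is not from the thread and is only a proxy for what PostgreSQL does: it times a sort that compares strings through Python's locale.strcoll (which calls into the C library's collation routines) against a plain code point sort. The helper names, the candidate locale names, and the random ASCII test data are all assumptions made for illustration; absolute numbers will vary by machine and glibc build.

```python
import functools
import locale
import random
import string
import time


def pick_locale(candidates=("en_US.UTF-8", "en_US.utf8", "C")):
    """Activate the first LC_COLLATE locale from candidates that this system has.

    The candidate names are assumptions; "C" is the fallback and always exists.
    """
    for name in candidates:
        try:
            locale.setlocale(locale.LC_COLLATE, name)
            return name
        except locale.Error:
            continue
    raise RuntimeError("no usable locale found")


def random_words(count, length=32, seed=0):
    """Build reproducible random ASCII strings to use as sort input."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + " "
    return ["".join(rng.choice(alphabet) for _ in range(length))
            for _ in range(count)]


def time_sort(words, key=None):
    """Sort words with the given key and return (sorted_list, seconds)."""
    start = time.perf_counter()
    result = sorted(words, key=key)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    active = pick_locale()
    words = random_words(20000)

    # Plain code point comparison, no locale rules involved.
    _, plain = time_sort(words)

    # Every comparison goes through the C library's collation code.
    _, collated = time_sort(words, key=functools.cmp_to_key(locale.strcoll))

    print(f"locale in use:  {active}")
    print(f"code point sort: {plain:.3f}s")
    print(f"strcoll sort:    {collated:.3f}s")
```

If the en_US locale is not installed, the script falls back to "C" and the two timings mostly reflect Python call overhead rather than collation cost, so check the "locale in use" line before reading anything into the result.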


-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


