Re: How can sort performance be so different - Mailing list pgsql-performance

From Peter Geoghegan
Subject Re: How can sort performance be so different
Date
Msg-id CAH2-Wz=t-Seb=vPx4yTTe0mNsF4xknxeu63s5s-He71pKiNAxA@mail.gmail.com
In response to Re: How can sort performance be so different  (Bob Jolliffe <bobjolliffe@gmail.com>)
Responses Re: How can sort performance be so different  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-performance
On Wed, Feb 20, 2019 at 1:42 PM Bob Jolliffe <bobjolliffe@gmail.com> wrote:
> It seems not to be (completely) particular to the installation.
> Testing on different platforms we found variable speed difference
> between 100x and 1000x slower, but always a considerable order of
> magnitude.  The very slow performance comes from sorting Lao
> characters using en_US.UTF-8 collation.

I knew that some collations were slower, generally for reasons that
make some sense. For example, I was aware that ICU's use of Japanese
standard JIS X 4061 is particularly complicated and expensive, but
produces the most useful possible result from the point of view of a
Japanese speaker. Apparently glibc does not use that algorithm, and so
offers less useful sort order (though it may actually be faster in
that particular case).

I suspect that the reasons why Lao text sorts so much slower may
also have something to do with the intrinsic cost of supporting more
complicated rules. However, it's such a ridiculously large difference
that it also seems likely that somebody was disinclined to go to the
effort of optimizing it. The ICU people found that to be a tractable
goal, but they may have had to work at it. I also have a vague notion
that there are special cases that are more or less only useful for
sorting French. These complicate the implementation of UCA-style
algorithms.

I am only speculating, based on what I've heard about other cases --
perhaps this explanation is totally wrong. I know a lot more about
this stuff than most people on this mailing list, but I'm still far
from being an expert.
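
For anyone who wants to measure the difference rather than rely on
speculation, here is a minimal Python sketch. It times the C
library's strxfrm() directly, outside PostgreSQL, so treat the
numbers as a rough indicator only. The helper names (random_strings,
time_sort) and the two alphabets are made up for illustration, and it
assumes the collation being tested is installed on the machine.

```python
import locale
import random
import time

# Illustrative alphabets: ASCII lowercase and the basic Lao consonants.
ASCII = "abcdefghijklmnopqrstuvwxyz"
LAO = "".join(chr(c) for c in (
    0x0E81, 0x0E82, 0x0E84, 0x0E87, 0x0E88, 0x0E8A, 0x0E8D, 0x0E94, 0x0E95,
    0x0E96, 0x0E97, 0x0E99, 0x0E9A, 0x0E9B, 0x0E9C, 0x0E9D, 0x0E9E, 0x0E9F,
    0x0EA1, 0x0EA2, 0x0EA3, 0x0EA5, 0x0EA7, 0x0EAA, 0x0EAB, 0x0EAD, 0x0EAE,
))


def random_strings(alphabet, count, length=8, seed=0):
    """Return `count` pseudo-random strings of `length` chars from `alphabet`."""
    rng = random.Random(seed)  # fixed seed so repeated runs sort the same data
    return ["".join(rng.choice(alphabet) for _ in range(length))
            for _ in range(count)]


def time_sort(strings, collation):
    """Return seconds spent sorting under `collation`, or None if unavailable."""
    saved = locale.setlocale(locale.LC_COLLATE)  # remember the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, collation)
    except locale.Error:
        return None  # collation not installed on this machine
    try:
        start = time.perf_counter()
        sorted(strings, key=locale.strxfrm)  # collation-aware sort keys
        return time.perf_counter() - start
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


if __name__ == "__main__":
    for name, alphabet in (("ascii", ASCII), ("lao", LAO)):
        data = random_strings(alphabet, 2000)
        for collation in ("C", "en_US.UTF-8"):
            print(name, collation, time_sort(data, collation))
```

Comparing the "lao" and "ascii" rows under the same collation shows
whether the slowdown follows the script being sorted rather than the
machine. A result of None just means that collation is not installed.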

-- 
Peter Geoghegan

