Re: [PERFORM] 10x faster sort performance on Skylake CPU vs IvyBridge - Mailing list pgsql-performance

From Felix Geisendörfer
Subject Re: [PERFORM] 10x faster sort performance on Skylake CPU vs IvyBridge
Date
Msg-id 75CFB1A6-BC3E-4E3E-BC66-5BF7CB8E2C4F@felixge.de
Whole thread Raw
In response to Re: [PERFORM] 10x faster sort performance on Skylake CPU vs Ivy Bridge  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
> On Aug 25, 2017, at 17:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Felix Geisendörfer <felix@felixge.de> writes:
>> I recently came across a performance difference between two machines that surprised me:
>> ...
>> As you can see, Machine A spends 5889ms on the Sort Node vs 609ms on Machine B when looking at the "Exclusive" time with explain.depesz.com [3][4]. I.e. Machine B is ~10x faster at sorting than Machine A (for this particular query).
>
> I doubt this is a hardware issue, it's more likely that you're comparing
> apples and oranges.  The first theory that springs to mind is that the
> sort keys are strings and you're using C locale on the faster machine but
> some non-C locale on the slower.  strcoll() is pretty darn expensive
> compared to strcmp() :-(

You're right, that seems to be it.

Machine A was using strcoll() (lc_collate=en_US.UTF-8)
Machine B was using strcmp() (lc_collate=C)

After switching Machine A to use lc_collate=C, I get:

CTE Scan on zulu  (cost=40673.620..40742.300 rows=3434 width=56) (actual time=1368.610..1368.698 rows=58 loops=1)
  CTE zulu
      ->  HashAggregate  (cost=40639.280..40673.620 rows=3434 width=56) (actual time=1368.607..1368.659 rows=58 loops=1)
              Group Key: mike.two, ((mike.golf)::text)
            ->  Unique  (cost=37656.690..40038.310 rows=34341 width=104) (actual time=958.493..1168.128 rows=298104 loops=1)
                  ->  Sort  (cost=37656.690..38450.560 rows=317549 width=104) (actual time=958.491..1055.635 rows=316982 loops=1)
                          Sort Key: mike.two, ((mike.lima)::text) COLLATE "papa", mike.echo DESC, mike.quebec
                          Sort Method: quicksort  Memory: 56834kB
                        ->  Seq Scan on mike  (cost=0.000..8638.080 rows=317549 width=104) (actual time=0.043..172.496 rows=316982 loops=1)
                                Filter: (golf five NOT NULL)
                                Rows Removed by Filter: 26426

So Machine A now needs 883ms [1] for the sort vs 609ms [2] on Machine B. That makes Machine B only ~1.4x faster, which seems reasonable :).

Sorry for the delayed response, I didn't have access to machine B to confirm this right away.

>             regards, tom lane

This is my first post to a PostgreSQL mailing list, but I've been lurking
for a while. Thank you for taking the time to reply to e-mails such as
mine, and for all the work you've put into PostgreSQL over the years.
I'm deeply grateful.

> On Aug 25, 2017, at 17:43, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Fri, Aug 25, 2017 at 8:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I doubt this is a hardware issue, it's more likely that you're comparing
>> apples and oranges.  The first theory that springs to mind is that the
>> sort keys are strings and you're using C locale on the faster machine but
>> some non-C locale on the slower.  strcoll() is pretty darn expensive
>> compared to strcmp() :-(
>
> strcoll() is very noticeably slower on macOS, too.
>

Thanks. This immediately explains what I saw when testing this query on a Linux machine that was also using
lc_collate=en_US.UTF-8 but wasn't slowed down by it as much as the macOS machine.

[1] https://explain.depesz.com/s/LOqa
[2] https://explain.depesz.com/s/zVe
