Re: [HACKERS] [PATCH] Incremental sort - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: [HACKERS] [PATCH] Incremental sort
Date
Msg-id CAPpHfdtHggE3gK1xytjk4QV1skXoeO2DHtcw37tBA5H+ew7N5Q@mail.gmail.com
In response to Re: [HACKERS] [PATCH] Incremental sort  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Responses Re: [HACKERS] [PATCH] Incremental sort  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Sat, Mar 10, 2018 at 6:42 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Thu, Mar 8, 2018 at 2:49 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Thank you very much for testing and benchmarking.  I'll investigate
the regressions you found.
 
Now, there's a caveat in those tests - the data set is synthetic and
perfectly random, i.e. all groups equally likely, no correlations or
anything like that.

I wonder what is the "worst case" scenario, i.e. how to construct a data
set with particularly bad behavior of the incremental sort.

I think that depends on the reason for the bad behavior of incremental sort.
For example, our quicksort implementation behaves very well on
presorted data.  But incremental sort appears not to be as good in
this case, as Heikki showed upthread.  That prompted me to test
presorted datasets (which appeared to be the "worst case") more intensively.
But I suspect that the regressions you found have a different reason, and
correspondingly the "worst case" would also be different.
Once I investigate the reason for the regressions, I'll try to construct
a "worst case" as well.
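
For reference, a presorted "worst case" along the lines Heikki described can be sketched like this (the table and index names here are hypothetical, not from the benchmark above): when nearly every prefix group contains only one row, incremental sort pays its per-group startup cost on every row without saving any work.

```sql
-- Hypothetical worst-case setup: input already ordered on the presorted key,
-- with one row per group, so incremental sort restarts for every single row.
CREATE TABLE t_presorted AS
    SELECT i AS a, (random() * 100)::int AS b
    FROM generate_series(1, 10000000) AS i;
CREATE INDEX t_presorted_a_idx ON t_presorted (a);
ANALYZE t_presorted;

-- Each value of "a" is unique, so sorting by (a, b) via the index on (a)
-- maximizes per-group overhead:
EXPLAIN ANALYZE SELECT * FROM t_presorted ORDER BY a, b;
```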

After some investigation of the benchmark results, I found two sources of
regressions in incremental sort.

Case 1: Underlying node scan loss is bigger than the incremental sort win

===== 33 [Wed Mar  7 10:14:14 CET 2018] scale:10000000 groups:10 work_mem:64MB incremental:on max_workers:0 =====
SELECT * FROM s_1 ORDER BY a, b
                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1588080.84..1588080.84 rows=1 width=20) (actual time=5874.527..5874.527 rows=0 loops=1)
   ->  Incremental Sort  (cost=119371.51..1488081.45 rows=9999939 width=20) (actual time=202.842..5653.224 rows=10000000 loops=1)
         Sort Key: s_1.a, s_1.b
         Presorted Key: s_1.a
         Sort Method: external merge  Disk: 29408kB
         Sort Groups: 11
         ->  Index Scan using s_1_a_idx on s_1  (cost=0.43..323385.52 rows=9999939 width=20) (actual time=0.051..1494.105 rows=10000000 loops=1)
 Planning time: 0.269 ms
 Execution time: 5877.367 ms
(9 rows)

===== 37 [Wed Mar  7 10:15:51 CET 2018] scale:10000000 groups:10 work_mem:64MB incremental:off max_workers:0 =====
SELECT * FROM s_1 ORDER BY a, b
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1656439.93..1656439.93 rows=1 width=20) (actual time=4741.716..4741.716 rows=0 loops=1)
   ->  Sort  (cost=1531440.69..1556440.54 rows=9999939 width=20) (actual time=3522.156..4519.278 rows=10000000 loops=1)
         Sort Key: s_1.a, s_1.b
         Sort Method: external merge  Disk: 293648kB
         ->  Seq Scan on s_1  (cost=0.00..163694.39 rows=9999939 width=20) (actual time=0.021..650.322 rows=10000000 loops=1)
 Planning time: 0.249 ms
 Execution time: 4777.088 ms
(7 rows)

In this case the optimizer decided that "Index Scan + Incremental Sort" would be
cheaper than "Seq Scan + Sort".  But it appears that the amount of time we lose by
selecting Index Scan over Seq Scan is bigger than the amount of time we win by
selecting Incremental Sort over Sort.  I would note that the regular Sort consumes
about 10X more disk space.  I bet that all of this space fit into the OS cache of the
test machine, while the optimizer expected actual I/O to take place.  That made
the actual times inconsistent with the costing.
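
One way to probe this mismatch (a sketch, not part of the benchmark runs above) is to tell the planner that pages come from cache by lowering the page cost constants, and then re-check which plan it picks:

```sql
-- When the working set is fully cached by the OS, the default page costs
-- overstate I/O, penalizing the plan that writes/reads more spill data.
-- Lowering them should make the estimates track actual times more closely.
SET seq_page_cost = 0.1;      -- assumed: sequential pages served from OS cache
SET random_page_cost = 0.1;   -- assumed: random pages served from OS cache

EXPLAIN ANALYZE SELECT * FROM s_1 ORDER BY a, b;
```

If the plan then flips back to "Seq Scan + Sort", that would support the OS-cache explanation.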

Case 2: Underlying node is not parallelized

===== 178 [Wed Mar  7 11:18:53 CET 2018] scale:10000000 groups:100 work_mem:8MB incremental:on max_workers:2 =====
SELECT * FROM s_2 ORDER BY a, b, c
                                                                     QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1179047.88..1179047.88 rows=1 width=20) (actual time=4819.999..4819.999 rows=0 loops=1)
   ->  Incremental Sort  (cost=89.04..1079047.34 rows=10000054 width=20) (actual time=0.203..4603.197 rows=10000000 loops=1)
         Sort Key: s_2.a, s_2.b, s_2.c
         Presorted Key: s_2.a, s_2.b
         Sort Method: quicksort  Memory: 135kB
         Sort Groups: 10201
         ->  Index Scan using s_2_a_b_idx on s_2  (cost=0.43..406985.62 rows=10000054 width=20) (actual time=0.052..1461.177 rows=10000000 loops=1)
 Planning time: 0.313 ms
 Execution time: 4820.037 ms
(9 rows)

===== 182 [Wed Mar  7 11:20:11 CET 2018] scale:10000000 groups:100 work_mem:8MB incremental:off max_workers:2 =====
SELECT * FROM s_2 ORDER BY a, b, c
                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1705580.76..1705580.76 rows=1 width=20) (actual time=3985.818..3985.818 rows=0 loops=1)
   ->  Gather Merge  (cost=649951.66..1622246.98 rows=8333378 width=20) (actual time=1782.354..3750.868 rows=10000000 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=648951.64..659368.36 rows=4166689 width=20) (actual time=1778.362..2091.253 rows=3333333 loops=3)
               Sort Key: s_2.a, s_2.b, s_2.c
               Sort Method: external merge  Disk: 99136kB
               Worker 0:  Sort Method: external merge  Disk: 96984kB
               Worker 1:  Sort Method: external merge  Disk: 97496kB
               ->  Parallel Seq Scan on s_2  (cost=0.00..105361.89 rows=4166689 width=20) (actual time=0.022..233.640 rows=3333333 loops=3)
 Planning time: 0.265 ms
 Execution time: 4007.591 ms
(12 rows)

The situation is similar to case #1, except that in the "Seq Scan + Sort" pair the Sort
also gets parallelized.  As in the previous case, disk writes/reads during the
external sort are overestimated, because they actually hit the OS cache.
I would also say that this is not necessarily a wrong decision by the optimizer,
because doing this work in a single backend may consume fewer resources despite
being slower overall.
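
To separate the parallelism effect from the incremental-sort effect, one can force both plans into a single backend and compare them on equal footing (a sketch, assuming the s_2 table from the benchmark):

```sql
-- Disable parallel query so "Seq Scan + Sort" also runs in one backend;
-- any remaining difference is then attributable to incremental sort itself.
SET max_parallel_workers_per_gather = 0;
EXPLAIN ANALYZE SELECT * FROM s_2 ORDER BY a, b, c;

-- Restore the default for subsequent comparison runs.
RESET max_parallel_workers_per_gather;
```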

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
