Home > mailing lists

Re: [HACKERS] [PATCH] Incremental sort - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: [HACKERS] [PATCH] Incremental sort
Date	March 20, 2017 20:19:42
Msg-id	2c59b009-61d3-9350-04ee-4b701eb93101@iki.fi Whole thread Raw
In response to	Re: [HACKERS] [PATCH] Incremental sort (was: PoC: Partial sort) (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Responses	Re: [PATCH] Incremental sort (David Steele <david@pgmasters.net>) Re: [PATCH] Incremental sort (Alexander Korotkov <a.korotkov@postgrespro.ru>)
List	pgsql-hackers

Tree view

On 03/20/2017 11:33 AM, Alexander Korotkov wrote:
> Please, find rebased patch in the attachment.

I had a quick look at this.

* I'd love to have an explanation of what an Incremental Sort is, in the 
file header comment for nodeIncrementalSort.c.

* I didn't understand the maxMem stuff in tuplesort.c. The comments 
there use the phrase "on-disk memory", which seems like an oxymoron. 
Also, "maximum status" seems weird, as it assumes that there's a natural 
order to the states.

* In the below example, the incremental sort is significantly slower 
than the Seq Scan + Sort you get otherwise:

create table foo (a int4, b int4, c int4);
insert into sorttest select g, g, g from generate_series(1, 1000000) g;
vacuum foo;
create index i_sorttest on sorttest (a, b, c);
set work_mem='100MB';


postgres=# explain select count(*) from (select * from sorttest order by 
a, c) as t;                                              QUERY PLAN 

------------------------------------------------------------------------------------------------------- Aggregate
(cost=138655.68..138655.69rows=1 width=8)   ->  Incremental Sort  (cost=610.99..124870.38 rows=1102824 width=12)
Sort Key: sorttest.a, sorttest.c         Presorted Key: sorttest.a         ->  Index Only Scan using i_sorttest on
sorttest
 
(cost=0.43..53578.79 rows=1102824 width=12)
(5 rows)

Time: 0.409 ms
postgres=# select count(*) from (select * from sorttest order by a, c) as t;  count
--------- 1000000
(1 row)

Time: 387.091 ms


postgres=# explain select count(*) from (select * from sorttest order by 
a, c) as t;                                  QUERY PLAN 

------------------------------------------------------------------------------- Aggregate  (cost=130063.84..130063.85
rows=1width=8)   ->  Sort  (cost=115063.84..117563.84 rows=1000000 width=12)         Sort Key: sorttest.a, sorttest.c
     ->  Seq Scan on sorttest  (cost=0.00..15406.00 rows=1000000 
 
width=12)
(4 rows)

Time: 0.345 ms
postgres=# select count(*) from (select * from sorttest order by a, c) as t;  count
--------- 1000000
(1 row)

Time: 231.668 ms

According to 'perf', 85% of the CPU time is spent in ExecCopySlot(). To 
alleviate that, it might be worthwhile to add a special case for when 
the group contains exactly one group, and not put the tuple to the 
tuplesort in that case. Or if we cannot ensure that the Incremental Sort 
is actually faster, the cost model should probably be smarter, to avoid 
picking an incremental sort when it's not a win.

- Heikki

pgsql-hackers by date:

From: Alvaro Herrera
Date: 20 March 2017, 20:18:07
Subject: Re: [HACKERS] Inadequate traces in TAP tests

From: Robert Haas
Date: 20 March 2017, 20:24:22
Subject: Re: [HACKERS] [COMMITTERS] pgsql: Improve pg_dump regression tests and code coverage

Re: [HACKERS] [PATCH] Incremental sort - Mailing list pgsql-hackers

Previous

Next