Re: Why don't we consider explicit Incremental Sort? - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Why don't we consider explicit Incremental Sort?
Date
Msg-id CAMbWs48Y_eTF8w+ypGJ=vyqkGjoqwNmgH63hyZx+mdHsheN0vQ@mail.gmail.com
Whole thread Raw
In response to Re: Why don't we consider explicit Incremental Sort?  (Tomas Vondra <tomas@vondra.me>)
Responses Re: Why don't we consider explicit Incremental Sort?
List pgsql-hackers
On Mon, Sep 9, 2024 at 6:22 PM Tomas Vondra <tomas@vondra.me> wrote:
> I have not thought about this particular case too much, but how likely
> is it that mergejoin will win for this plan in practice? If we consider
> a query that only needs a fraction of rows (say, thanks to LIMIT),
> aren't we likely to pick a nested loop (which can do incremental sort
> for the outer plan)? For joins that need to run to completion it might
> be a win, but there's also the higher risk of a poor costing.

I think one situation where mergejoin tends to outperform hashjoin and
nestloop is when ORDER BY clauses are present.  For example, for the
query below, mergejoin runs much faster than hashjoin and nestloop,
and mergejoin with incremental sort is even faster than mergejoin with
full sort.

create table t (a int, b int);
insert into t select random(1,100), random(1,100) from
generate_series(1,100000);

analyze t;

-- on patched
explain (analyze, costs off, timing off)
select * from (select * from t order by a) t1 join t t2 on t1.a = t2.a
and t1.b = t2.b order by t1.a, t1.b;
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Merge Join (actual rows=1100372 loops=1)
   Merge Cond: ((t.a = t2.a) AND (t.b = t2.b))
   ->  Incremental Sort (actual rows=100000 loops=1)
         Sort Key: t.a, t.b
         Presorted Key: t.a
         Full-sort Groups: 100  Sort Method: quicksort  Average
Memory: 26kB  Peak Memory: 26kB
         Pre-sorted Groups: 100  Sort Method: quicksort  Average
Memory: 74kB  Peak Memory: 74kB
         ->  Sort (actual rows=100000 loops=1)
               Sort Key: t.a
               Sort Method: external merge  Disk: 1768kB
               ->  Seq Scan on t (actual rows=100000 loops=1)
   ->  Sort (actual rows=1100367 loops=1)
         Sort Key: t2.a, t2.b
         Sort Method: external sort  Disk: 2160kB
         ->  Seq Scan on t t2 (actual rows=100000 loops=1)
 Planning Time: 0.912 ms
 Execution Time: 854.502 ms
(17 rows)

-- disable incremental sort
set enable_incremental_sort to off;

explain (analyze, costs off, timing off)
select * from (select * from t order by a) t1 join t t2 on t1.a = t2.a
and t1.b = t2.b order by t1.a, t1.b;
                          QUERY PLAN
--------------------------------------------------------------
 Merge Join (actual rows=1100372 loops=1)
   Merge Cond: ((t.a = t2.a) AND (t.b = t2.b))
   ->  Sort (actual rows=100000 loops=1)
         Sort Key: t.a, t.b
         Sort Method: external merge  Disk: 1768kB
         ->  Sort (actual rows=100000 loops=1)
               Sort Key: t.a
               Sort Method: external merge  Disk: 1768kB
               ->  Seq Scan on t (actual rows=100000 loops=1)
   ->  Sort (actual rows=1100367 loops=1)
         Sort Key: t2.a, t2.b
         Sort Method: external sort  Disk: 2160kB
         ->  Seq Scan on t t2 (actual rows=100000 loops=1)
 Planning Time: 1.451 ms
 Execution Time: 958.607 ms
(15 rows)


-- with hashjoin
set enable_mergejoin to off;

explain (analyze, costs off, timing off)
select * from (select * from t order by a) t1 join t t2 on t1.a = t2.a
and t1.b = t2.b order by t1.a, t1.b;
                             QUERY PLAN
--------------------------------------------------------------------
 Sort (actual rows=1100372 loops=1)
   Sort Key: t.a, t.b
   Sort Method: external merge  Disk: 28000kB
   ->  Hash Join (actual rows=1100372 loops=1)
         Hash Cond: ((t2.a = t.a) AND (t2.b = t.b))
         ->  Seq Scan on t t2 (actual rows=100000 loops=1)
         ->  Hash (actual rows=100000 loops=1)
               Buckets: 131072  Batches: 1  Memory Usage: 4931kB
               ->  Sort (actual rows=100000 loops=1)
                     Sort Key: t.a
                     Sort Method: external merge  Disk: 1768kB
                     ->  Seq Scan on t (actual rows=100000 loops=1)
 Planning Time: 1.998 ms
 Execution Time: 2469.954 ms
(14 rows)

-- with nestloop, my small machine seems runs forever.

Thanks
Richard



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Using per-transaction memory contexts for storing decoded tuples
Next
From: Jim Jones
Date:
Subject: Re: Psql meta-command conninfo+