Re: Why don't we consider explicit Incremental Sort? - Mailing list pgsql-hackers
From | Richard Guo |
---|---|
Subject | Re: Why don't we consider explicit Incremental Sort? |
Date | |
Msg-id | CAMbWs48Y_eTF8w+ypGJ=vyqkGjoqwNmgH63hyZx+mdHsheN0vQ@mail.gmail.com Whole thread Raw |
In response to | Re: Why don't we consider explicit Incremental Sort? (Tomas Vondra <tomas@vondra.me>) |
Responses |
Re: Why don't we consider explicit Incremental Sort?
|
List | pgsql-hackers |
On Mon, Sep 9, 2024 at 6:22 PM Tomas Vondra <tomas@vondra.me> wrote: > I have not thought about this particular case too much, but how likely > is it that mergejoin will win for this plan in practice? If we consider > a query that only needs a fraction of rows (say, thanks to LIMIT), > aren't we likely to pick a nested loop (which can do incremental sort > for the outer plan)? For joins that need to run to completion it might > be a win, but there's also the higher risk of a poor costing. I think one situation where mergejoin tends to outperform hashjoin and nestloop is when ORDER BY clauses are present. For example, for the query below, mergejoin runs much faster than hashjoin and nestloop, and mergejoin with incremental sort is even faster than mergejoin with full sort. create table t (a int, b int); insert into t select random(1,100), random(1,100) from generate_series(1,100000); analyze t; -- on patched explain (analyze, costs off, timing off) select * from (select * from t order by a) t1 join t t2 on t1.a = t2.a and t1.b = t2.b order by t1.a, t1.b; QUERY PLAN ------------------------------------------------------------------------------------------------- Merge Join (actual rows=1100372 loops=1) Merge Cond: ((t.a = t2.a) AND (t.b = t2.b)) -> Incremental Sort (actual rows=100000 loops=1) Sort Key: t.a, t.b Presorted Key: t.a Full-sort Groups: 100 Sort Method: quicksort Average Memory: 26kB Peak Memory: 26kB Pre-sorted Groups: 100 Sort Method: quicksort Average Memory: 74kB Peak Memory: 74kB -> Sort (actual rows=100000 loops=1) Sort Key: t.a Sort Method: external merge Disk: 1768kB -> Seq Scan on t (actual rows=100000 loops=1) -> Sort (actual rows=1100367 loops=1) Sort Key: t2.a, t2.b Sort Method: external sort Disk: 2160kB -> Seq Scan on t t2 (actual rows=100000 loops=1) Planning Time: 0.912 ms Execution Time: 854.502 ms (17 rows) -- disable incremental sort set enable_incremental_sort to off; explain (analyze, costs off, timing off) select * from (select * from t order by a) t1 join t t2 on t1.a = t2.a and t1.b = t2.b order by t1.a, t1.b; QUERY PLAN -------------------------------------------------------------- Merge Join (actual rows=1100372 loops=1) Merge Cond: ((t.a = t2.a) AND (t.b = t2.b)) -> Sort (actual rows=100000 loops=1) Sort Key: t.a, t.b Sort Method: external merge Disk: 1768kB -> Sort (actual rows=100000 loops=1) Sort Key: t.a Sort Method: external merge Disk: 1768kB -> Seq Scan on t (actual rows=100000 loops=1) -> Sort (actual rows=1100367 loops=1) Sort Key: t2.a, t2.b Sort Method: external sort Disk: 2160kB -> Seq Scan on t t2 (actual rows=100000 loops=1) Planning Time: 1.451 ms Execution Time: 958.607 ms (15 rows) -- with hashjoin set enable_mergejoin to off; explain (analyze, costs off, timing off) select * from (select * from t order by a) t1 join t t2 on t1.a = t2.a and t1.b = t2.b order by t1.a, t1.b; QUERY PLAN -------------------------------------------------------------------- Sort (actual rows=1100372 loops=1) Sort Key: t.a, t.b Sort Method: external merge Disk: 28000kB -> Hash Join (actual rows=1100372 loops=1) Hash Cond: ((t2.a = t.a) AND (t2.b = t.b)) -> Seq Scan on t t2 (actual rows=100000 loops=1) -> Hash (actual rows=100000 loops=1) Buckets: 131072 Batches: 1 Memory Usage: 4931kB -> Sort (actual rows=100000 loops=1) Sort Key: t.a Sort Method: external merge Disk: 1768kB -> Seq Scan on t (actual rows=100000 loops=1) Planning Time: 1.998 ms Execution Time: 2469.954 ms (14 rows) -- with nestloop, my small machine seems runs forever. Thanks Richard
pgsql-hackers by date: