Home > mailing lists

Re: Sort operation displays more tuples than it contains its subnode - Mailing list pgsql-hackers

From	David Rowley
Subject	Re: Sort operation displays more tuples than it contains its subnode
Date	May 22, 2024 21:08:35
Msg-id	CAApHDvq72iFMdMbZd+5ac_2bP1=80+g5m0xkAp+zGn6iCEZt-w@mail.gmail.com Whole thread
In response to	Sort operation displays more tuples than it contains its subnode ("a.rybakina" <a.rybakina@postgrespro.ru>)
List	pgsql-hackers

Tree view

On Thu, 23 May 2024 at 08:48, a.rybakina <a.rybakina@postgrespro.ru> wrote:
>                ->  Sort  (cost=613.48..632.86 rows=7754 width=8) (actual time=7.612..21.214 rows=77531 loops=1)
>                      Sort Key: broke_down_course.sno, broke_down_course.cno
>                      Sort Method: quicksort  Memory: 374kB
>                      ->  Seq Scan on broke_down_course  (cost=0.00..112.54 rows=7754 width=8) (actual
time=0.016..1.366rows=7754 loops=1)
 


> Maybe, my reproduction looks questionable, sorry for that, but I seriously don't understand why we have so many
tupleshere in Sort node.
 

This is because of the "mark and restore" that occurs because of the
Merge Join. This must happen for joins because every tuple matching
the join condition must join to every other tuple that matches the
join condition. That means, if you have 3 tuples with the same key on
either side, you get 9 rows, not 3 rows.

Here's a simple example of the behaviour you describe shrunk down so
that it's more easily understandable:

create table t(a int);
insert into t values(1),(1),(1);
set enable_hashjoin=0;
set enable_nestloop=0;
explain (analyze, costs off) select * from t inner join t t1 on t.a=t1.a;
                               QUERY PLAN
------------------------------------------------------------------------
 Merge Join (actual time=0.036..0.038 rows=9 loops=1)
   Merge Cond: (t.a = t1.a)
   ->  Sort (actual time=0.025..0.025 rows=3 loops=1)
         Sort Key: t.a
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on t (actual time=0.017..0.018 rows=3 loops=1)
   ->  Sort (actual time=0.007..0.007 rows=7 loops=1)
         Sort Key: t1.a
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on t t1 (actual time=0.003..0.003 rows=3 loops=1)

Note the sort has rows=7 and the Seq Scan on t1 rows=3 and an output of 9 rows.

If you look at the code in [1], you can see the restoreMark() calls to
achieve this.

David

[1] https://en.wikipedia.org/wiki/Sort-merge_join

pgsql-hackers by date:

From: "a.rybakina"
Date: 22 May 2024, 20:31:21
Subject: Sort operation displays more tuples than it contains its subnode

From: Tom Lane
Date: 22 May 2024, 21:17:48
Subject: Re: Sort operation displays more tuples than it contains its subnode

Re: Sort operation displays more tuples than it contains its subnode - Mailing list pgsql-hackers

Previous

Next