Re: Strange sort node/explain result - Mailing list pgsql-bugs

From David Rowley
Subject Re: Strange sort node/explain result
Date
Msg-id CAApHDvpg-kS3QJCgZ63O7PrHE=x4nyb_zT5j-tay=L4mg5Lg7g@mail.gmail.com
Whole thread Raw
In response to Strange sort node/explain result  ("Gunnar \"Nick\" Bluth" <gunnar.bluth@pro-open.de>)
Responses Re: Strange sort node/explain result
List pgsql-bugs
On Tue, 1 Nov 2022 at 03:20, Gunnar "Nick" Bluth
<gunnar.bluth@pro-open.de> wrote:
> What puzzles us is the part where the CTE "oneyear" somehow explodes
> into a sort node of almost 10 mio (but not the same amount as the index
> scan emits!) rows, taking ~ 0.4 seconds but only using 4x-5x kB of memory:
>
>    ->  Sort  (cost=69.83..72.33 rows=1000 width=4) (actual
> time=0.418..448.397 rows=9855001 loops=1)

I think you're interpreting the EXPLAIN output wrongly.  It's not that
the Sort node emits 9855001 rows, it's that 9855001 are read from the
Sort node.

The reason more rows are read from it than are produced is because
Merge Join must perform mark and restore to "rewind" the inner side of
the scan back for the subsequent outer tuple which has the same value.

e.g if you're joining:

outer:
1
1

inner:
1
1

We'll get 4 rows (total).  After the first outer row has found all its
join partners, the same must be done with the 2nd outer row, however,
we're already read beyond the final 1 in the inner side, so we must
rewind back to the position of the first inner 1 and then perform the
join to the 2nd outer row. Thus producing the 3rd and 4th outer rows.
The inner side will have been read 4 times despite there only being 2
rows in it.

There are only 366 rows for the Sort node to sort. 42kb seems like
reasonable memory use for that.

David



pgsql-bugs by date:

Previous
From: "Gunnar \"Nick\" Bluth"
Date:
Subject: Strange sort node/explain result
Next
From: David Rowley
Date:
Subject: Re: Memory leak on subquery as scalar operand