Home > mailing lists

Re: MergeAppend could consider sorting cheapest child path - Mailing list pgsql-hackers

From	Andrei Lepikhov
Subject	Re: MergeAppend could consider sorting cheapest child path
Date	October 15, 2025 12:26:45
Msg-id	cbaa0342-8834-4333-a40d-bcf60f336a21@gmail.com Whole thread Raw
In response to	Re: MergeAppend could consider sorting cheapest child path (David Rowley <dgrowleyml@gmail.com>)
Responses	Re: MergeAppend could consider sorting cheapest child path
List	pgsql-hackers

Tree view

On 15/10/2025 09:59, David Rowley wrote:
> On Wed, 15 Oct 2025 at 19:45, Richard Guo <guofenglinux@gmail.com> wrote:
>> I also noticed the hack you added to avoid using MergeAppend+Sort when
>> none of the chosen subpaths are ordered.  It seems to me that this
>> contradicts the idea of this patch.  If MergeAppend+Sort is indeed a
>> better plan, why wouldn't it apply in cases where no chosen subpaths
>> are ordered?
> 
> FWIW, I've not really followed this closely, but from the parts I have
> read it seems the patch could cause a Sort -> unsorted path to be used
> over a path that's already correctly sorted. This reminds me of a
> patch I proposed in [1] and then subsequently decided it was a bad
> idea in [2] because of concerns of having too many Sorts in a single
> plan. Sort only calls tuplesort_end() at executor shutdown, so that
> means possibly using up to work_mem per sort node. If you have 1000x
> Sort nodes, then that's up to 1000x work_mem. Since the planner
> doesn't have any abilities to consider the overall memory consumption,
> I thought it was a bad idea due to increased OOM risk. If I'm not
> mistaken it looks like this could suffer from the same problem.
Thanks for your feedback!
This patch originated from the practice of how table partitioning can 
severely impact query execution. It is a rare case in our experience 
when all partitions have symmetrical indexes, especially those located 
remotely. People adopt a set of indexes according to the current load 
profile on hot partitions. In fact, it is a typical case when a 
timestamp orders partitions, and most of them are rarely updated.

So, the goal is to use MergeAppend when only a few partitions lack a 
proper index.
The concern about memory consumption makes sense, of course. However, we 
choose to sort based on cost estimations that usually work when the 
optimiser decides between fetching many tuples during an Index Scan, 
compared to only a few tuples to fetch with subsequent sorting. 
Additionally, scan estimation typically yields good predictions 
(compared to JOIN), and I personally estimate the OOM risk to be low.

Additionally, this patch revealed an issue with the cost model: there is 
no significant difference between a single massive Sort and multiple 
sorts followed by MergeAppend. Our experiments show that it is incorrect 
(one Sort operator demonstrates more efficacy) and may be corrected.

-- 
regards, Andrei Lepikhov,
pgEdge

pgsql-hackers by date:

From: Daniel Gustafsson
Date: 15 October 2025, 12:17:14
Subject: Re: [PATCH] Handle out-of-range timestamps in timestamptz_to_str()

From: shveta malik
Date: 15 October 2025, 12:29:24
Subject: Re: Improve pg_sync_replication_slots() to wait for primary to advance

Re: MergeAppend could consider sorting cheapest child path - Mailing list pgsql-hackers

Previous

Next