Re: MergeAppend could consider sorting cheapest child path - Mailing list pgsql-hackers

From Andrei Lepikhov
Subject Re: MergeAppend could consider sorting cheapest child path
Date
Msg-id 776047ed-b6fc-4257-8c1d-f848bffb00b7@gmail.com
Whole thread Raw
In response to Re: MergeAppend could consider sorting cheapest child path  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-hackers
On 4/6/2025 00:41, Alexander Korotkov wrote:
> On Tue, Jun 3, 2025 at 5:35 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>> For me, it seems like a continuation of the 7d8ac98 discussion. We may
>> charge a small fee for MergeAppend to adjust the balance, of course.
>> However, I think this small change requires a series of benchmarks to
>> determine how it affects the overall cost balance. Without examples it
>> is hard to say how important this issue is and its worthiness to
>> commence such work.
> 
> Yes, I think it's fair to charge the MergeAppend node.  We currently
> cost it similarly to Sort merge stage, but it's clearly more
> expensive.  It dealing on the executor level dealing with Slot's etc,
> while Sort node have a set of lower level optimizations.
After conducting additional research, I concluded that you are correct, 
and the current cost model doesn't allow the optimiser to detect the 
best option. A simple test with a full scan and sort of a partitioned 
table (see attachment) shows that the query plan prefers small sortings 
merged by the MergeAppend node. I have got the following results for 
different numbers of tuples to be sorted (in the range from 10 tuples to 
1E8):

EXPLAIN SELECT * FROM test ORDER BY y;

1E1: Sort  (cost=9.53..9.57 rows=17 width=8)
1E2: Sort  (cost=20.82..21.07 rows=100)
1E3: Merge Append  (cost=56.19..83.69 rows=1000)
1E4: Merge Append  (cost=612.74..887.74 rows=10000)
1E5: Merge Append  (cost=7754.19..10504.19 rows=100000)
1E6: Merge Append  (cost=94092.25..121592.25 rows=1000000)
1E7: Merge Append  (cost=1106931.22..1381931.22 rows=10000000)
1E8: Merge Append  (cost=14097413.40..16847413.40 rows=100000000)

That happens because both total costs lie within the fuzzy factor gap, 
and the optimiser chooses the path based on the startup cost, which is 
obviously better for the MergeAppend case.

At the same time, execution, involving a single Sort node, dominates the 
MergeAppend case:

1E3: MergeAppend: 1.927 ms, Sort: 0.720 ms
1E4: MergeAppend: 10.090 ms, Sort: 7.583 ms
1E5: MergeAppend: 118.885 ms, Sort: 88.492 ms
1E6: MergeAppend: 1372.717 ms, Sort: 1106.184 ms
1E7: MergeAppend: 15103.893 ms, Sort: 13415.806 ms
1E8: MergeAppend: 176553.133 ms, Sort: 149458.787 ms

Looking inside the code, I found the only difference we can employ to 
rationalise the cost model change: re-structuring of a heap. The 
siftDown routine employs two tuple comparisons to find the proper 
position for a tuple. So, we have objections to changing the constant in 
the cost model of the merge operation:

@@ -2448,7 +2448,7 @@ cost_merge_append(Path *path, PlannerInfo *root,
         logN = LOG2(N);

         /* Assumed cost per tuple comparison */
-       comparison_cost = 2.0 * cpu_operator_cost;
+       comparison_cost = 4.0 * cpu_operator_cost;

         /* Heap creation cost */
         startup_cost += comparison_cost * N * logN;

The exact change also needs to be made in the cost_gather_merge 
function, of course.
At this moment, I'm not sure that it should be multiplied by 2 - it is a 
subject for further discussion. However, it fixes the current issue and 
adds a little additional cost to the merge operation, which, as I see 
it, is quite limited.
Please see the new version of the patch attached.

-- 
regards, Andrei Lepikhov
Attachment

pgsql-hackers by date:

Previous
From: Dagfinn Ilmari Mannsåker
Date:
Subject: Re: Support getrandom() for pg_strong_random() source
Next
From: Dagfinn Ilmari Mannsåker
Date:
Subject: Re: Fix tab completion in v18 for ALTER DATABASE/USER/ROLE ... RESET