On 4/6/2025 00:41, Alexander Korotkov wrote:
> On Tue, Jun 3, 2025 at 5:35 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>> For me, it seems like a continuation of the 7d8ac98 discussion. We may
>> charge a small fee for MergeAppend to adjust the balance, of course.
>> However, I think this small change requires a series of benchmarks to
>> determine how it affects the overall cost balance. Without examples it
>> is hard to say how important this issue is and its worthiness to
>> commence such work.
>
> Yes, I think it's fair to charge the MergeAppend node. We currently
> cost it similarly to Sort merge stage, but it's clearly more
> expensive. It dealing on the executor level dealing with Slot's etc,
> while Sort node have a set of lower level optimizations.
After conducting additional research, I concluded that you are correct,
and the current cost model doesn't allow the optimiser to detect the
best option. A simple test with a full scan and sort of a partitioned
table (see attachment) shows that the query plan prefers small sortings
merged by the MergeAppend node. I have got the following results for
different numbers of tuples to be sorted (in the range from 10 tuples to
1E8):
EXPLAIN SELECT * FROM test ORDER BY y;
1E1: Sort (cost=9.53..9.57 rows=17 width=8)
1E2: Sort (cost=20.82..21.07 rows=100)
1E3: Merge Append (cost=56.19..83.69 rows=1000)
1E4: Merge Append (cost=612.74..887.74 rows=10000)
1E5: Merge Append (cost=7754.19..10504.19 rows=100000)
1E6: Merge Append (cost=94092.25..121592.25 rows=1000000)
1E7: Merge Append (cost=1106931.22..1381931.22 rows=10000000)
1E8: Merge Append (cost=14097413.40..16847413.40 rows=100000000)
That happens because both total costs lie within the fuzzy factor gap,
and the optimiser chooses the path based on the startup cost, which is
obviously better for the MergeAppend case.
At the same time, execution, involving a single Sort node, dominates the
MergeAppend case:
1E3: MergeAppend: 1.927 ms, Sort: 0.720 ms
1E4: MergeAppend: 10.090 ms, Sort: 7.583 ms
1E5: MergeAppend: 118.885 ms, Sort: 88.492 ms
1E6: MergeAppend: 1372.717 ms, Sort: 1106.184 ms
1E7: MergeAppend: 15103.893 ms, Sort: 13415.806 ms
1E8: MergeAppend: 176553.133 ms, Sort: 149458.787 ms
Looking inside the code, I found the only difference we can employ to
rationalise the cost model change: re-structuring of a heap. The
siftDown routine employs two tuple comparisons to find the proper
position for a tuple. So, we have objections to changing the constant in
the cost model of the merge operation:
@@ -2448,7 +2448,7 @@ cost_merge_append(Path *path, PlannerInfo *root,
logN = LOG2(N);
/* Assumed cost per tuple comparison */
- comparison_cost = 2.0 * cpu_operator_cost;
+ comparison_cost = 4.0 * cpu_operator_cost;
/* Heap creation cost */
startup_cost += comparison_cost * N * logN;
The exact change also needs to be made in the cost_gather_merge
function, of course.
At this moment, I'm not sure that it should be multiplied by 2 - it is a
subject for further discussion. However, it fixes the current issue and
adds a little additional cost to the merge operation, which, as I see
it, is quite limited.
Please see the new version of the patch attached.
--
regards, Andrei Lepikhov