Re: MergeAppend could consider sorting cheapest child path - Mailing list pgsql-hackers

From Andrei Lepikhov
Subject Re: MergeAppend could consider sorting cheapest child path
Date
Msg-id 4ed36dd0-73f5-4c62-a4b8-3c655d256766@gmail.com
Whole thread Raw
In response to Re: MergeAppend could consider sorting cheapest child path  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-hackers
On 27/7/2025 00:51, Alexander Korotkov wrote:
> On Tue, Jul 22, 2025 at 2:13 PM Andrei Lepikhov <lepihov@gmail.com
> I've another idea.  cost_tuplesort() puts 2.0 under logarithm to prefer
> tuplesort over heapsort.  I think we can adjust cost_gather_merge() and
> cost_merge_append() to do the same.  0001 patch implements that.  I
> think the plan changes of 0001 might be reasonable since most cases deal
> with small rowsets.  One thing concerns me: 0002 still affects one of
> the postgres_fdw checks.  Could you, please, take a look?
Thanks for the idea!
I analysed your approach a little bit.
Initially, I ran the test script I had created previously [1] and
discovered that on a large scale (1e6 - 1e7 tuples), the plan still
defaults to MergeAppend, which deviates from the execution time (7190 ms
for Sort+Append and 8450 ms for MergeAppend+Sort).

Attempting to find out the reason, I combined all the costs into a
single formula for each strategy:

MergeAppend+Sort:
total_cost =CO*ntuples*(1+2*log(ntuples)) + Ccput * 0.5 * ntuples+
2*CO*N*log(N) + A
Sort+Append:
total_cost = CO*ntuples*(1+2*log(ntuples))+ Ccput * 0.5 * ntuples + A

Terms:
- A - sum of total costs of underlying subtrees
- CO - cpu_operator_cost
- Ccput - cpu_tuple_cost
- N - number of subpaths (streams)

Given the significant gap in total execution time between these
strategies, I believe it would be reasonable to introduce a coefficient
to the equation's 'ntuples' variable component that will keep the gap
between big quicksort and MergeAppend's heapsort out of the fuzzy factor
gap.

Discovering papers on the value of constant in quicksort [2] and
heapsort [3], I realised that there is a difference. The constant's
value varies in a wide range: 1.3-1.5 for quicksort and 2-3 for
heapsort. Considering that we should change the current cost model as
little as possible, not to break the balance, we may just increase the
constant value for the heap sort to maintain a bare minimum gap between
strategies out of the fuzzy factor. In this case, the merge append
constant should be around 3.8 - 4.0.

With this minor change, we see a shift in the regression tests. Most of
these changes were introduced by the new append strategy. Although I
haven't analysed these changes in depth yet, I believe they are all
related to the small data sets and should fade out on a larger scale.

See this minor correction in the attachment. postgres_fdw tests are
stable now.

[1]
https://github.com/danolivo/conf/blob/main/Scripts/sort-vs-mergeappend-3.sql
[2] https://en.wikipedia.org/wiki/Quicksort?utm_source=chatgpt.com
[2] https://arxiv.org/abs/1504.01459

--
regards, Andrei Lepikhov
Attachment

pgsql-hackers by date:

Previous
From: "Euler Taveira"
Date:
Subject: Re: log_min_messages per backend type
Next
From: Dagfinn Ilmari Mannsåker
Date:
Subject: Re: Fix tab completion in v18 for ALTER DATABASE/USER/ROLE ... RESET