Re: Support loser tree for k-way merge - Mailing list pgsql-hackers

From John Naylor
Subject Re: Support loser tree for k-way merge
Msg-id CANWCAZYKgc4xFbBXtePw_XY+a+Ao_-CxZ+Z8YWSrK2B1HqtdWg@mail.gmail.com
In response to Re: Support loser tree for k-way merge  (Sami Imseih <samimseih@gmail.com>)
Responses Re: Support loser tree for k-way merge
List pgsql-hackers
On Thu, Dec 4, 2025 at 1:14 AM Sami Imseih <samimseih@gmail.com> wrote:
> Can we drive the decision for what to do based on optimizer
> stats, i.e. n_distinct and row counts? Not sure what the calculation would
> be specifically, but something else to consider.

It's happened multiple times before that someone proposes a change
that makes sorting faster on some inputs, but it turns out to regress on
low-cardinality inputs (I've done it myself). It seems to be pretty hard
not to regress that case. Occasionally the author has proposed taking
optimizer stats into account, and that has been rejected because
cardinality stats are often wildly wrong.

Further, underestimation is far more common than overestimation, in
which case IIUC the planner would just continue to choose the existing
heap method.

> We can still provide the GUC to  override the optimizer decisions,
> but at least the optimizer, given up-to-date stats, may get it right most
> of the time.

I don't have much faith that people will properly set a GUC whose
effects depend on the input characteristics and memory settings.

The new method might be a better overall trade-off, but we'd need some
more comprehensive measurements to know what we're dealing with.

--
John Naylor
Amazon Web Services


