Re: First draft of the PG 15 release notes (sorting) - Mailing list pgsql-hackers

From David Rowley
Subject Re: First draft of the PG 15 release notes (sorting)
Date
Msg-id CAApHDvpG3jd6_-A58z4s=xBReehUSxDvbRHhiWBWYx8GX9gY8Q@mail.gmail.com
In response to Re: First draft of the PG 15 release notes (sorting)  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On Wed, 11 May 2022 at 14:38, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Wed, May 11, 2022 at 12:39:41PM +1200, David Rowley wrote:
> > I think the sort improvements done in v15 are worth a mention under
> > General Performance.  The commits for this were 91e9e89dc, 40af10b57
> > and 697492434.  I've been running a few benchmarks between v14 and v15
> > over the past few days and a fairly average case speedup is about 25%,
> > but there are cases where I've seen up to 400%.  I think the increase
> > is to an extent that we maybe should have considered making tweaks in
> > cost_tuplesort(). I saw some plans that ran in about 60% of the time
> > by disabling Hash Agg and allowing Sort / Group Agg to do the work.
>
> Is there any reason not to consider it now?  Either for v15 or v15+1.

If the changes done had resulted in a change to the number of expected
operations as far as big-O notation goes, then I think we might be
able to do something.

However, nothing changed in the number of operations. We only sped up
the constant factors.  If it were possible to adjust those constant
factors based on benchmark results spat out by some single machine
somewhere, then maybe we could do some tweaks.  The problem is that to
know that we're actually making meaningful improvements to the costs,
we'd want results from more than one machine and likely more than one
CPU architecture.  That feels like something that would be much better
to do during a release cycle rather than at this very late hour.  The
majority of my benchmarks were on AMD Zen 2 hardware. That's unlikely
to be representative of the average hardware that runs PostgreSQL.
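
To illustrate the constant-factor point, here is a rough sketch and not
the planner's actual code.  It assumes a simplified in-memory sort cost
of factor * cpu_operator_cost * N * log2(N).  The function name and the
1.6 factor are made up for illustration; 0.0025 is the default
cpu_operator_cost.

```python
import math

CPU_OPERATOR_COST = 0.0025  # PostgreSQL's default cpu_operator_cost


def sort_cpu_cost(ntuples, comparison_factor=2.0):
    """Simplified in-memory sort CPU cost (illustrative, not costsize.c).

    cost = comparison_factor * cpu_operator_cost * N * log2(N)
    """
    n = max(float(ntuples), 2.0)  # avoid log2 of values below 2
    return comparison_factor * CPU_OPERATOR_COST * n * math.log2(n)


def cost_ratio(ntuples, new_factor, old_factor=2.0):
    """Ratio of the cost with a tweaked factor to the cost with the old one."""
    return sort_cpu_cost(ntuples, new_factor) / sort_cpu_cost(ntuples, old_factor)


if __name__ == "__main__":
    # 1.6 is a hypothetical tuned factor, not a measured value.
    for n in (1_000, 1_000_000, 1_000_000_000):
        print(n, cost_ratio(n, new_factor=1.6))
```

The ratio is the same at every N, so a tweak like this only rescales
sort costs relative to other plan nodes.  Picking the right scale is
exactly the part that needs benchmarks from more than one machine.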

Also, I've no idea at this stage what we'd even do to
cost_tuplesort().  The nruns calculation is a bit fuzzy and never
really took into account the power-of-2 wastage that 40af10b57
reduces.  Maybe there's some argument for adjusting the 2.0 constant
in compute_cpu_sort_cost() based on what's done in 697492434. But
there are plenty of data types that don't use the new sort
specialization functions. Would we really want to add extra code to
the planner to get it to try and figure that out?
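
As a rough illustration of the power-of-2 wastage, the sketch below
assumes an allocator that rounds each tuple allocation up to the next
power of 2, and compares that with one that doesn't round.  It ignores
per-chunk header overhead, and the function names and the 72-byte tuple
size are made up for the example.

```python
def next_power_of_2(n):
    """Smallest power of 2 that is >= n (for n >= 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def tuples_fitting(work_mem_bytes, tuple_bytes, round_up):
    """How many tuples fit in memory, with or without power-of-2 rounding.

    Simplification: ignores allocator chunk headers and other bookkeeping.
    """
    per_tuple = next_power_of_2(tuple_bytes) if round_up else tuple_bytes
    return work_mem_bytes // per_tuple


if __name__ == "__main__":
    work_mem = 4 * 1024 * 1024  # 4MB, the default work_mem
    tuple_size = 72             # hypothetical tuple size in bytes
    print("rounded:  ", tuples_fitting(work_mem, tuple_size, round_up=True))
    print("unrounded:", tuples_fitting(work_mem, tuple_size, round_up=False))
```

The number of tuples that fit in memory feeds directly into how many
runs a sort needs, which is why a cost model that ignores the rounding
can be some way off on nruns.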

David


