Re: [PATCH] Use optimized single-datum tuplesort in ExecSort - Mailing list pgsql-hackers

From Ranier Vilela
Subject Re: [PATCH] Use optimized single-datum tuplesort in ExecSort
Date
Msg-id CAEudQAp7rvXHU=-OATqVe9U+tMXNhdvBzV50EMcKgMFW9agvCQ@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Use optimized single-datum tuplesort in ExecSort  (James Coleman <jtc331@gmail.com>)
List pgsql-hackers
Em ter., 6 de jul. de 2021 às 10:19, James Coleman <jtc331@gmail.com> escreveu:
Adding David since this patch is likely a precondition for [1].

On Tue, Jul 6, 2021 at 2:15 AM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
>
> Hello,
>
> While testing the patch "Add proper planner support for ORDER BY / DISTINCT
> aggregates" [0] I discovered the performance penalty from adding a sort node
> essentially came from not using the single-datum tuplesort optimization in
> ExecSort (contrary to the sorting done in ExecAgg).
>
> I originally proposed this patch as a companion in the same thread [1], but
> following James suggestion I'm making a separate thread just for this as the
> optimization is worthwhile independently of David's patch: it looks like we
> can expect a 2x speedup on a "select a single ordered column" case.
>
> The patch aimed to be as simple as possible: we only turn this optimization on
> when the tuple being sorted has only one attribute, it is "byval" (so as not
> to incur copies which would be hard to track in the execution tree) and
> unbound (again, not having to deal with copying borrowed datum anywhere).

Thanks again for finding this and working up a patch.

I've taken a look, and while I haven't dug into testing it yet, I have
a few comments.

First, the changes are lacking any explanatory comments. Probably we
should follow how nodeAgg does this and add both comments to the
ExecSort function header as well as specific comments above the "if"
around the new tuplesort_begin_datum explaining the specific
conditions that are required for the optimization to be useful and
safe.

That leads to a question I had: I don't follow why bounded mode (when
using byval) needs to be excluded. Comments should be added if there's
a good reason (as noted above), but maybe it's a case we can handle
safely?

A second question: at first glance it's intuitively the case we might
not be able to handle byref values. But nodeAgg doesn't seem to have
that restriction. What's the difference here?

A few small code observations:
- In my view the addition of unlikely() in ExecSort is unlikely to be
of benefit because it's a single call for the entire node's execution
(not in the tuple loop).
No objection. And I agree that testing is complex and needs to remain as it is.

- It seems clearer to change the "if (!node->is_single_val)" to flip
the true/false cases so we don't need the negation.
I think yes, it can be better.

regards,
Ranier Vilela

pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: .ready and .done files considered harmful
Next
From: Dilip Kumar
Date:
Subject: Re: [PATCH] Use optimized single-datum tuplesort in ExecSort