Re: [PATCH] Use optimized single-datum tuplesort in ExecSort - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: [PATCH] Use optimized single-datum tuplesort in ExecSort
Date
Msg-id CAFiTN-v=sBvLz6r_RWUC942vTMj4758hzUuR+WKGrvpmWniU8g@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Use optimized single-datum tuplesort in ExecSort  (James Coleman <jtc331@gmail.com>)
Responses Re: [PATCH] Use optimized single-datum tuplesort in ExecSort
List pgsql-hackers
On Tue, Jul 6, 2021 at 6:49 PM James Coleman <jtc331@gmail.com> wrote:
>
> Adding David since this patch is likely a precondition for [1].
>
> On Tue, Jul 6, 2021 at 2:15 AM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
> >
> > Hello,
> >
> > While testing the patch "Add proper planner support for ORDER BY / DISTINCT
> > aggregates" [0] I discovered the performance penalty from adding a sort node
> > essentially came from not using the single-datum tuplesort optimization in
> > ExecSort (contrary to the sorting done in ExecAgg).
> >
> > I originally proposed this patch as a companion in the same thread [1], but
> > following James suggestion I'm making a separate thread just for this as the
> > optimization is worthwhile independently of David's patch: it looks like we
> > can expect a 2x speedup on a "select a single ordered column" case.
> >
> > The patch aimed to be as simple as possible: we only turn this optimization on
> > when the tuple being sorted has only one attribute, it is "byval" (so as not
> > to incur copies which would be hard to track in the execution tree) and
> > unbound (again, not having to deal with copying borrowed datum anywhere).
>
> Thanks again for finding this and working up a patch.
>
> I've taken a look, and while I haven't dug into testing it yet, I have
> a few comments.
>
> First, the changes are lacking any explanatory comments. Probably we
> should follow how nodeAgg does this and add both comments to the
> ExecSort function header as well as specific comments above the "if"
> around the new tuplesort_begin_datum explaining the specific
> conditions that are required for the optimization to be useful and
> safe.
>
> That leads to a question I had: I don't follow why bounded mode (when
> using byval) needs to be excluded. Comments should be added if there's
> a good reason (as noted above), but maybe it's a case we can handle
> safely?
>
> A second question: at first glance it's intuitively the case we might
> not be able to handle byref values. But nodeAgg doesn't seem to have
> that restriction. What's the difference here?
>

I think tuplesort_begin_datum, doesn't have any such limitation, it
can handle any type of Datum so I think we don't need to consider the
only attbyval, we can consider any type of attribute for this
optimization.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Ranier Vilela
Date:
Subject: Re: [PATCH] Use optimized single-datum tuplesort in ExecSort
Next
From: Daniel Gustafsson
Date:
Subject: Re: proposal - log_full_scan