Re: [PATCH] Use optimized single-datum tuplesort in ExecSort - Mailing list pgsql-hackers

From James Coleman
Subject Re: [PATCH] Use optimized single-datum tuplesort in ExecSort
Date
Msg-id CAAaqYe9qVpWqJp3j9rOtbupJQM_m_PO1RqfrC-W5=+Kz7cqXqg@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Use optimized single-datum tuplesort in ExecSort  (Ronan Dunklau <ronan.dunklau@aiven.io>)
Responses Re: [PATCH] Use optimized single-datum tuplesort in ExecSort  (Ronan Dunklau <ronan.dunklau@aiven.io>)
List pgsql-hackers
On Tue, Jul 6, 2021 at 11:03 AM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
>
> Thank you for the review, I will address those shortly, but will answer some
> questions in the meantime.
>
> > > First, the changes are lacking any explanatory comments. Probably we
> > > should follow how nodeAgg does this and add both comments to the
> > > ExecSort function header as well as specific comments above the "if"
> > > around the new tuplesort_begin_datum explaining the specific
> > > conditions that are required for the optimization to be useful and
> > > safe.
>
> Done, since I lifted the restrictions following your questions, there isn't
> much left to comment. (see below)
>
> > >
> > > That leads to a question I had: I don't follow why bounded mode (when
> > > using byval) needs to be excluded. Comments should be added if there's
> > > a good reason (as noted above), but maybe it's a case we can handle
> > > safely?
>
> I had test failures when trying to move the Datum around when performing a
> bounded sort, but did not look into it at first.
>
> Now I've looked into it, and the switch to a heapsort when using bounded mode
> just unconditionnaly tried to free a tuple that was never there to begin with.
> So if the SortTuple does not contain an actual tuple, but only a single datum,
> do not do that.
>
> I've updated the patch to fix this and enable the optimization in the case of
> bounded sort.

Awesome.

> > > A second question: at first glance it's intuitively the case we might
> > > not be able to handle byref values. But nodeAgg doesn't seem to have
> > > that restriction. What's the difference here?
> >
> > I think tuplesort_begin_datum, doesn't have any such limitation, it
> > can handle any type of Datum so I think we don't need to consider the
> > only attbyval, we can consider any type of attribute for this
> > optimization.
>
> I've restricted the optimization to byval types because of the following
> comment in nodeAgg.c:
>
>         /*
>          * Note: if input type is pass-by-ref, the datums returned by the
> sort are
>          * freshly palloc'd in the per-query context, so we must be careful
> to
>          * pfree them when they are no longer needed.
>          */
>
> As I was not sure how to handle that, I prefered the safety of not enabling
> it. Since you both told me it should be safe, I've lifted that restriction
> too.

To be clear, I don't know for certain it's safe [without extra work],
but even if it involves some extra manual pfree'ing (a la nodeAgg)
it's probably worth it. Maybe someone else will weigh in on whether or
not anything special is required here to ensure we don't leak memory
(I haven't looked in detail yet).

> > A few small code observations:
> > - In my view the addition of unlikely() in ExecSort is unlikely to be
> > of benefit because it's a single call for the entire node's execution
> > (not in the tuple loop).
>
> Done.
>
> > - It seems clearer to change the "if (!node->is_single_val)" to flip
> > the true/false cases so we don't need the negation.
>
> Agreed, done.

Thanks

> > - I assume there are tests that likely already cover this case, but
> > it'd be worth verifying that.
>
> Yes many test cases cover that, but maybe it would be better to explictly
> check for it on some cases: do you think we could add a debug message that can
> be checked for ?

Mostly I think we should verify code coverage and _maybe_ add a
specific test or two that we know execises this path. I don't know
that the debug message needs to be matched in the test (probably more
pain than it's worth), but the debug message ("enabling datum sort
optimizaton" or similar) might be good anyway.

I wonder if we need to change costing of sorts for this case. I don't
like having to do so, but it's a significant change in speed, so
probably should impact what plan gets chosen. Hopefully others will
weigh on this also.

> > Finally, I believe the same optimization likely ought to be added to
> > nodeIncrementalSort. It's less likely the tests there are sufficient
> > for both this and the original case, but we'll see.
>
> I will look into it, but isn't incrementalsort used to sort tuples on several
> keys, when they are already sorted on the first ? In that case, I doubt we
> would ever have a single-valued tuple here, except if there is a projection to
> strip the tuple from extraneous attributes.

Yes and no. When incremental sort has to do a full sort there will
always be at least 2 attributes. But in prefix sort mode (see
prefixsort_state) only non-presorted columns are sorted (i.e., if
given a,b already sorted by a, then only b is sorted). So the
prefixsort_state could use this optimization.

James



pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Re: Enhanced error message to include hint messages for redundant options error
Next
From: Alvaro Herrera
Date:
Subject: Re: Refactor "mutually exclusive options" error reporting code in parse_subscription_options