Re: enable_incremental_sort changes query behavior - Mailing list pgsql-hackers

From Tom Lane
Subject Re: enable_incremental_sort changes query behavior
Date
Msg-id 295524.1606246314@sss.pgh.pa.us
Whole thread Raw
In response to Re: enable_incremental_sort changes query behavior  (James Coleman <jtc331@gmail.com>)
Responses Re: enable_incremental_sort changes query behavior  (James Coleman <jtc331@gmail.com>)
List pgsql-hackers
James Coleman <jtc331@gmail.com> writes:
> On Mon, Nov 23, 2020 at 2:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 1. James was wondering, far upthread, why we would do projections
>> pre-sort or post-sort.  I think the answers can be found by studying
>> planner.c's make_sort_input_target(), which separates out what we want
>> to do pre-sort and post-sort.

> Does it imply we need to intentionally avoid SRFs also?

It's sort of a wart that we allow SRFs in ORDER BY at all, but my
expectation is that make_sort_input_target would prevent lower levels
of the planner from needing to think about that.  We don't allow SRFs
in index expressions, nor in WHERE clauses (so they'll never come up
as mergejoin sort keys).  So really there's no way that scan/join
processing should need to consider such cases.  Such a sort would
only ever be implemented via a final Sort atop ProjectSet.

>> 2. Robert is correct that under normal circumstances, the targetlists of
>> both baserels and join rels contain only Vars.

> Is that primarily a project policy? Or a limitation of our current
> planner (just can't push things down/carry the results back up as much
> as we'd like)? Or something else? In particular, it seems plausible
> there are cases where pushing down the sort on a non-Var expression to
> the base rel could improve plans, so I'm wondering if there's a reason
> to intentionally avoid that in the long or short run (or both).

I think you've just rediscovered Joe Hellerstein's thesis topic [1].
We ripped out the remnants of that code ages ago (search very early
git states for "JMH" if you're interested), because the complexity
vs. benefit ratio seemed pretty bad.  Maybe there'll be a case for
putting it back some day, but I'm dubious.  Note that we have the
ability to push down sorts-on-expressions anyway; that's not constrained
by what is in the relation targetlists.

> Interesting: so merge joins are an example of us pushing down sorts,
> which I assume means (part of) the answer to my question on (2) is
> that there's nothing inherently wrong/broken with evaluating
> expressions lower down the tree as long as we're careful about what is
> safe/unsafe with respect to volatility and parallelism?

Right, I don't see any fundamental problem with that, we just have
to be careful about these constraints.

> I have wondered if we should strictly require the expression to be in
> the target list even if nonvolatile, but prepare_sort_from_pathkeys
> doesn't seem to think that's necessary, so I'm comfortable with that
> unless there's something you know we haven't considered.

No, prepare_sort_from_pathkeys is happy to build a sort expression if it
can't find it already computed in the input.  The secret here is that
we should never get to that code with a "dangerous" sort expression,
because we should never have made a Path that would request such a thing
in the first place.  It's fairly obvious to me how we deal with the
consideration for volatile sortkeys.  We cannot restrict parallel-unsafe
sortkeys quite the same way, because the restriction only applies to
parallel not non-parallel plans.  Maybe it's sufficient to mark Paths
as parallel unsafe if they sort by parallel-unsafe sortkeys?  Is there
anyplace that is Just Assuming that paths it builds will be
parallel-safe?

            regards, tom lane

[1] https://www.postgresql.org/message-id/28216.1023340706%40sss.pgh.pa.us



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: [HACKERS] Custom compression methods
Next
From: Andrew Gierth
Date:
Subject: Re: mark/restore failures on unsorted merge joins