Re: enable_incremental_sort changes query behavior - Mailing list pgsql-hackers

From Robert Haas
Subject Re: enable_incremental_sort changes query behavior
Date
Msg-id CA+Tgmoa3oNvLBio62jSGXCRhOotsofiqn9guus3+NTQvz6qPWA@mail.gmail.com
Whole thread Raw
In response to Re: enable_incremental_sort changes query behavior  (James Coleman <jtc331@gmail.com>)
Responses Re: enable_incremental_sort changes query behavior
Re: enable_incremental_sort changes query behavior
List pgsql-hackers
On Thu, Oct 1, 2020 at 9:03 AM James Coleman <jtc331@gmail.com> wrote:
> The plan (obtained by replacing the volatile function with a stable one):
>
>  Unique
>    ->  Nested Loop
>          ->  Gather Merge
>                Workers Planned: 2
>                ->  Sort
>                      Sort Key: ref_0.i, (md5(ref_0.t))
>                      ->  Parallel Index Scan using ref_0_i_idx on ref_0
>          ->  Function Scan on generate_series ref_1
>
> Changing `md5(ref_0.t)` to `random()::text || ref_0.t` causes the error.

As far as I remember this stuff, target lists for any nodes below the
top of the join tree were previously always just Var nodes. Any
expressions that needed to be computed were computed at the level of
the topmost join, before adding upper planner steps like Agg,
WindowAgg, Limit, etc. But, here you've got a complex expression --
md5(ref_0.t) -- begin computed somewhere in the middle of the plan
tree so that the sort can be performed at that point. However, that
seems likely to break a bunch of assumptions all over the planner,
because, unless a lot of work and surgery has been done since I looked
at this last, that md5(ref_0.t) value is not going to "bubble up" to
higher levels of the plan through the tlists of the individual nodes.
That poses a risk that it will be recomputed multiple times, which is
particularly bad for volatile functions because you might not always
get the same answer, but it seems like it might be bad even for
non-volatile functions, because it could be slow if the value is used
at multiple places in the query. It feels like the sort of thing that
might have broken some other assumptions, too, although I can't say
exactly what.

I wonder if whatever part of the incremental sort patch allowed
computation of tlist expressions below the top level of the join tree
ought to just be reverted outright, but that might be overkill. I
can't say that I really understand exactly what's going on here, but
it seems like a pretty significant behavior change to make as part of
another project, and I wonder if we're going to keep finding other
problems with it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Partitionwise join fails under GEQO
Next
From: James Coleman
Date:
Subject: Re: enable_incremental_sort changes query behavior