Re: enable_incremental_sort changes query behavior - Mailing list pgsql-hackers
From | James Coleman |
---|---|
Subject | Re: enable_incremental_sort changes query behavior |
Date | |
Msg-id | CAAaqYe_Oe=tRBhJKBbxz3bPygQ-5RpgcF8XO-UQzV1_Fu_w+9g@mail.gmail.com Whole thread Raw |
In response to | Re: enable_incremental_sort changes query behavior (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: enable_incremental_sort changes query behavior
|
List | pgsql-hackers |
On Wed, Oct 7, 2020 at 3:52 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Oct 1, 2020 at 9:03 AM James Coleman <jtc331@gmail.com> wrote: > > The plan (obtained by replacing the volatile function with a stable one): > > > > Unique > > -> Nested Loop > > -> Gather Merge > > Workers Planned: 2 > > -> Sort > > Sort Key: ref_0.i, (md5(ref_0.t)) > > -> Parallel Index Scan using ref_0_i_idx on ref_0 > > -> Function Scan on generate_series ref_1 > > > > Changing `md5(ref_0.t)` to `random()::text || ref_0.t` causes the error. > > As far as I remember this stuff, target lists for any nodes below the > top of the join tree were previously always just Var nodes. Any > expressions that needed to be computed were computed at the level of > the topmost join, before adding upper planner steps like Agg, > WindowAgg, Limit, etc. But, here you've got a complex expression -- > md5(ref_0.t) -- begin computed somewhere in the middle of the plan > tree so that the sort can be performed at that point. However, that > seems likely to break a bunch of assumptions all over the planner, > because, unless a lot of work and surgery has been done since I looked > at this last, that md5(ref_0.t) value is not going to "bubble up" to > higher levels of the plan through the tlists of the individual nodes. > That poses a risk that it will be recomputed multiple times, which is > particularly bad for volatile functions because you might not always > get the same answer, but it seems like it might be bad even for > non-volatile functions, because it could be slow if the value is used > at multiple places in the query. It feels like the sort of thing that > might have broken some other assumptions, too, although I can't say > exactly what. The distinction between volatile and stable expressions here has been discussed at length in this thread; I think it'd be worth reading through the rest of the thread before offering sledgehammer ideas below like reverting the entire piece. In particular, most of the discussion in the thread has been about what is actually safe to reference at lower levels in the plan. It would be extremely easy tomake this only allow sorting by Vars at lower levels in the plan, though the current revision of the patch follows what prepare_sort_from_pathkeys allows and disallows. Obviously it'd be nice to be able to sort by non-Var expressions at lower levels too. But again, it'd be easy to change that if it's not safe, so it'd be helpful if you could point to areas that don't bubble non-Vars up so that we can actually comment about that in the source and discuss specifics. > I wonder if whatever part of the incremental sort patch allowed > computation of tlist expressions below the top level of the join tree > ought to just be reverted outright, but that might be overkill. I > can't say that I really understand exactly what's going on here, but > it seems like a pretty significant behavior change to make as part of > another project, and I wonder if we're going to keep finding other > problems with it. James
pgsql-hackers by date: