Re: enable_incremental_sort changes query behavior - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: enable_incremental_sort changes query behavior |
Date | |
Msg-id | 20201003214440.fe4674dce5q7z4bg@development Whole thread Raw |
In response to | Re: enable_incremental_sort changes query behavior (James Coleman <jtc331@gmail.com>) |
Responses |
Re: enable_incremental_sort changes query behavior
|
List | pgsql-hackers |
On Sat, Oct 03, 2020 at 10:50:06AM -0400, James Coleman wrote: >On Fri, Oct 2, 2020 at 11:16 PM James Coleman <jtc331@gmail.com> wrote: >> >> On Fri, Oct 2, 2020 at 7:07 PM James Coleman <jtc331@gmail.com> wrote: >> > >> > On Fri, Oct 2, 2020 at 6:28 PM Tomas Vondra >> > <tomas.vondra@2ndquadrant.com> wrote: >> > > >> > > On Fri, Oct 02, 2020 at 05:45:52PM -0400, James Coleman wrote: >> > > >On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra >> > > ><tomas.vondra@2ndquadrant.com> wrote: >> > > >> >> > > >> ... >> > > >> >> > > >> More importanly, it does not actually fix the issue - it does fix that >> > > >> particular query, but just replacing the DISTINCT with either ORDER BY >> > > >> or GROUP BY make it fail again :-( >> > > >> >> > > >> Attached is a simple script I used, demonstrating these issues for the >> > > >> three cases that expect to have ressortgroupref != 0 (per the comment >> > > >> before TargetEntry in plannodes.h). >> > > > >> > > >So does checking for volatile expressions (if you happened to test >> > > >that) solve all the cases? If you haven't tested that yet, I can try >> > > >to do that this evening. >> > > > >> > > >> > > Yes, it does fix all the three queries in the SQL script. >> > > >> > > The question however is whether this is the root issue, and whether it's >> > > the right way to fix it. For example - volatility is not the only reason >> > > that may block qual pushdown. If you look at qual_is_pushdown_safe, it >> > > also blocks pushdown of leaky functions in security_barrier views. So I >> > > wonder if that could cause failures too, somehow. But I haven't managed >> > > to create such example. >> > >> > I was about to say that the issue here is slightly different from qual >> > etc. pushdown, since we're not concerned about quals here, and so I >> > wonder where we determine what target list entries to put in a given >> > scan path, but then I realized that implies (maybe!) a simpler >> > solution. Instead of duplicating checks on target list entries would >> > be safe, why not check directly in get_useful_pathkeys_for_relation() >> > whether or not the pathkey has a target list entry? >> > >> > I haven't been able to try that out yet, and so maybe I'm missing >> > something, but I need to step out for a bit, so I'll have to look at >> > it later. >> >> I've started poking at this, but haven't yet found a workable >> solution. See the attached patch which "solves" the problem by >> breaking putting the sort under the gather merge, but it at least >> demonstrates conceptually what I think we're interested in doing. >> >> The issue currently is that the comparison of expressions fails -- in >> our query, the first column selected shows up as a Var (with the same >> contents) but is a different pointer in the em_expr and the reltarget >> exprs list. I don't yet see a good way to get the equivalence class >> for a PathTarget entry. > >Hmm, I think I was looking to do is the attached patch. I didn't >realize until I did a lot of reading through source that we have an >equal() function that can compare exprs. That (plus the realization in >[1] the originally reported backtrace wasn't where the error actually >came from) convinced me that what we need is to confirm not only that >the all of the vars in the ec member come from the relids in the rel >but also that the expr is actually represented in the target of the >rel. > >With the gucs changed as I mentioned earlier both of the plans (with >and without a volatile call in the 2nd select entry) now look like: > > Unique > -> Gather Merge > Workers Planned: 2 > -> Sort > Sort Key: ref_0.i, (md5(ref_0.t)) > -> Nested Loop > -> Parallel Index Scan using ref_0_i_idx on ref_0 > -> Function Scan on generate_series ref_1 > >Without the gucs changed the minimal repro case now doesn't error, but >results in this plan: > > HashAggregate > Group Key: ref_0.i, CASE WHEN pg_rotate_logfile_old() THEN ref_0.t >ELSE ref_0.t END > -> Nested Loop > -> Seq Scan on ref_0 > -> Function Scan on generate_series ref_1 > >Similarly in your six queries I now only see parallel query showing up >in the last one. > OK, that seems reasonable I think. >I created an entirely new function because adding the target expr >lookup to the existing find_em_expr_for_rel() function broke a bunch >of postgres_fdw tests. That maybe raises questions about whether that >code also could have problems in theory/in the future, but I didn't >investigate further. In any case we already know it excludes >volatile...so maybe it's fine because in practice that's actually a >broader exclusion than what we're doing here. > I don't think postgres_fdw needs these new checks, because FDW scan's are not allowed in parallel part of the plan - if that changes in the future, I'm sure that'll require fixes in plenty other places. OTOH it's interesting that it triggers those failures - I wonder if we could learn something from them? AFAICS those paths can't be built by generate_useful_gather_paths (because of the postgres_fdw vs. parallel query restrictions), so how do these plans look? >This seems to fix the issue, but I'd like feedback on whether it's too >strict. We could of course just check em_has_volatile, but I'm >wondering if that's simultaneously too strict (by not allowing the >volatile expression to be computed in the gather merge supposing >there's no join) and too loose (maybe there are other cases we should >care about?). It also just strikes me as re-encoding rules that should >have already been applied (and thus we should be able to look up in >the data we have if it's safe to use the expr/pathkey). Those are >mostly intuitions though. > I don't know :-( As I mentioned before, I suspect checking just the volatility may not be enough in some cases (leakyness, etc.) and I think you're right it may be too strict in other cases. Not sure I understand which rules you think we're re-encoding, but I have a nagging feeling there's a piece of code somewhere earlier in the query planner meant to prevent such cases (sort on path without all the pathkeys), and that fixing it here is just a band aid :-( I'm sure we're building plans with Sort on top of Index Scan paths, so how come those don't fail? How come the non-parallel version of these queries don't fail? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: