Re: Removing INNER JOINs - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Removing INNER JOINs |
Date | |
Msg-id | 32139.1427667410@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Removing INNER JOINs (David Rowley <dgrowleyml@gmail.com>) |
List | pgsql-hackers |
David Rowley <dgrowleyml@gmail.com> writes: > I'm not worried about the cost of the decision at plan init time. I was > just confused about what Tom was recommending. I couldn't quite decide from > his email if he meant to keep the alternative plan node around so that the > executor must transition through it for each row, or to just choose the > proper plan at executor init and return the actual root of the selected > plan instead of returning the initialised AlternativePlan node (see > nodeAlternativePlan.c) TBH, I don't like anything about this patch and would prefer to reject it altogether. I think it's a disaster from a system structural standpoint, will probably induce nasty bugs, and simply doesn't apply to enough real-world queries to be worth either the pain of making it work or the planning overhead it will add. The structural damage could be reduced if you got rid of the far-too-cute idea that you could cut the AlternativePlan node out of the plan tree at executor startup time. Just have it remember which decision it made and pass down all the calls to the appropriate child node. What you're doing here violates the rule that planstate trees have a one-to-one relationship to plan trees. EXPLAIN used to iterate over those trees in lockstep, and there probably still is code that does similar things (in third-party modules if not core), so I don't think we should abandon that principle. Also, the patch seems rather schizophrenic about whether it's relying on run-time decisions or not; in particular I see no use of the PlannedStmt.suitableFor field, and no reason to have one if there's to be an AlternativePlan node making the decision. I'm just about certain that you can't generate the two alternative plans simply by calling make_one_rel() twice. The planner has far too much tendency to scribble on its input data structures for that to work reliably. It's possible you could dodge bugs of that sort, and save some cycles, if you did base relation planning only once and created the alternatives only while working at the join-relation level. Even then I think you'd need to do pushups like what GEQO does in order to repeat join-level planning. (While I'm looking at that: what used to be a reasonably clear API between query_planner and grouping_planner now seems like a complete mess, and I'm quite certain you've created multiple bugs in grouping_planner, because it would need to track more than one e.g. current_pathkeys value for the subplans created for the different final_rels. You can't just arbitrarily do some of those steps in one loop and the others in a totally different loop.) As far as the fundamental-bugs issue goes, I have no faith that "there are no pending AFTER trigger events" is a sufficient condition for "all known foreign-key constraints hold against whatever-random-snapshot-I'm-using"; and it's certainly not a *necessary* condition. (BTW the patch should be doing its best to localize knowledge about that, rather than spreading the assumption far and wide through comments and choices of flag/variable names.) I think the best you can hope to say in that line is that if there were no pending events at the time the snapshot was taken, it might be all right. Maybe. But it's not hard to imagine the trigger queue getting emptied while snapshots still exist that can see inconsistent states that prevailed before the triggers fired. (Remember that a trigger might restore consistency by applying additional changes, not just by throwing an error.) STABLE functions are the pain point here since they could execute queries with snapshots from quite some time back. As for the cost issue, that's an awful lot of code you added to remove_useless_joins(). I'm concerned about how much time it's going to spend while failing to prove anything for most queries. For starters, it looks to be O(N^2) in the number of relations, which the existing join_is_removable code is not; and in general it looks like it will do work in many more common cases than the existing code has to. I'm also pretty unhappy about having get_relation_info() physically visiting the pg_constraint catalog for every relation that's ever scanned by any query. (You could probably teach the relcache to remember data about foreign-key constraints, instead of doing it this way.) regards, tom lane
pgsql-hackers by date: