Re: Removing INNER JOINs - Mailing list pgsql-hackers

From David Rowley
Subject Re: Removing INNER JOINs
Date
Msg-id CAApHDvpk8M6_PMH_rzJm6i2AiPZQz5emMZ-c8vfhrCxOH334Xw@mail.gmail.com
Whole thread Raw
In response to Re: Removing INNER JOINs  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Removing INNER JOINs  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On 4 December 2014 at 06:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Far better would be what I mentioned upthread: an explicit switch node
in the plan tree, analogous to the existing AlternativeSubPlan structure.

        ChooseJoinSubPlan
          -> plan tree requiring all tables to be joined
          -> plan tree not requiring all tables to be joined

This allows sensible display by EXPLAIN and avoids the need for the core
executor code to be dirtied with implementation of the precise switch
rule: all that logic goes into the ChooseJoinSubPlan plan node code.


I'm not sure if I 100% understand what you mean by sensible EXPLAIN output?

Would you have both complete plans under a ChooseJoinSubPlan? If so I don't yet understand why that is needed. Plain old EXPLAIN (without analyze) performs an executor init, which I guess must have obtained a snapshot somewhere along the line. Couldn't the plan just be selected at init time? and just have the ChooseJoinSubPlan, or whatever select the correct plan?

I've attached a version which does this, and I think the EXPLAIN is quite sensible... It shows the join removed plan when joins are removed, and shows the "All Purpose" plan when there's FK triggers pending.

The patch has a few fixme's in it, but I think it's enough to demonstrate how I think it could work.

The bulk of my changes are in allpaths.c, planmain.c and planner.c. The critical change is query_planner() now returns a List instead of a RelOptInfo. I wasn't quite sure how else to handle this. Please also notice the change to make_one_rel(). This function is now called twice if remove_useless_joins() found 1 or more INNER JOINs to be possibly removable. in remove_useless_joins() the rels are not marked as RELOPT_DEADREL like they are with LEFT JOINs, they remain as RELOPT_BASEREL, only they have the skipFlags to mark that they can be removed when there's no FK triggers pending. A flag on PlannerGlobal is also set which will later force make_one_rel() to be called twice. Once for the join removal plan, and once for the "All Purpose" plan. query_planner() then returns a list of the RelOptInfos of those 2 final rels created by make_one_rel(). All the processing that previously got done on that final rel now gets done on the list of final rels. If there's more than 1 in that list then I'm making the root node of the plan an "AlternativePlan" node. On init of this node during execution time there is some logic which chooses which plan to execute. The code here just calls ExecInitNode() on the root node of the selected plan and returns that, thus skipping over the AlternativePlan node, so that it can't be seen by EXPLAIN or EXPLAIN ANALYZE.

The patch has grown quite a bit, but most of the extra size was generated from having to indent a whole bunch code in grouping_planner() by 1 more tab.

All regression tests pass and we're back to getting the most efficient plans again. i.e no extra redundant sort node on merge joins! :-)

I'm keen to know if the attached patch is a viable option.

Oh, and if anyone is wondering why I made skipFlags a bit mask and not just a bool. I think it would also be possible to expand LEFT JOIN removal at some later date to allow deferrable unique constraints to remove LEFT JOINs. This is currently not possible due to the planner not knowing if there will be unique violations when the plan is executed. Quite possibly this could be handled in a similar way to how the INNER JOINs work in the attached.

Also, apologies for the late response on this. I've been busy the most evenings this week so far. I was quite happy when I woke up in the morning to see all the responses about this. So thank you everyone for generating some great ideas. Hopefully I'll get this one nailed down soon.

Regards

David Rowley
 
I would envision the planner starting out generating the first subplan
(without the optimization), but as it goes along, noting whether there
are any opportunities for join removal.  At the end, if it found that
there were such opportunities, re-plan assuming that removal is possible.
Then stick a switch node on top.

This would give optimal plans for both cases, and it would avoid the need
for lots of extra planner cycles when the optimization can't be applied
... except for one small detail, which is that the planner has a bad habit
of scribbling on its own input.  I'm not sure how much cleanup work would
be needed before that "re-plan" operation could happen as easily as is
suggested above.  But in principle this could be made to work.

                        regards, tom lane

Attachment

pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Small TRUNCATE glitch
Next
From: Stephen Frost
Date:
Subject: Re: GSSAPI, SSPI - include_realm default