Re: Removing INNER JOINs - Mailing list pgsql-hackers

From Mart Kelder
Subject Re: Removing INNER JOINs
Date
Msg-id m5eqvh$j92$1@ger.gmane.org
In response to Removing INNER JOINs  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Removing INNER JOINs  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
Hi David (and others),

David Rowley wrote:
> Hi,
> 
> Starting a new thread which continues on from
> http://www.postgresql.org/message-id/CAApHDvoeC8YGWoahVSri-84eN2k0TnH6GPXp1K59y9juC1WWBg@mail.gmail.com
> 
> To give a brief summary for any new readers:
> 
> The attached patch allows for INNER JOINed relations to be removed from
> the plan, providing none of the columns are used for anything, and a
> foreign key exists which proves that a record must exist in the table
> being removed which matches the join condition:
> 
> I'm looking for a bit of feedback around the method I'm using to prune the
> redundant plan nodes out of the plan tree at executor startup.
> Particularly around not stripping the Sort nodes out from below a merge
> join, even if the sort order is no longer required due to the merge join
> node being removed. This potentially could leave the plan suboptimal when
> compared to a plan that the planner could generate when the removed
> relation was never asked for in the first place.

I read this patch (and the previous patch about removing SEMI joins) with
great interest. I don't know the code well enough to say much about the
patch itself, but I hope to have some useful ideas about the overall
process.

I think performance can be greatly improved if the planner is able to use
information based on the current data. I think these patches are just two
examples of where assumptions made during planning are useful. I think there
are more possibilities for this kind of assumption (for example unique
constraints, empty tables).

> There are some more details around the reasons behind doing this weird
> executor startup plan pruning around here:
> 
> http://www.postgresql.org/message-id/20141006145957.GA20577@awork2.anarazel.de

The problem here is that assumptions made during planning might not hold
during execution. That is why you placed the final decision about removing a
join in the executor.

When a plan is made, you know which assumptions the final plan relies on.
In this case, the assumption is that a foreign key is still valid. In
general, there are many more assumptions, such as the continued existence of
an index or of columns. There are also soft assumptions, such as the
statistics used still being reasonable.

My suggestion is to check the assumptions at executor startup. If they
still hold, you can just execute the plan as it is.

If one or more assumptions don't hold, there are a couple of things you
might do:
* Make a new plan. The plan is certain to match all conditions because at 
that time, a snapshot is already taken.
* Check the assumption. This can be a costly operation with no guarantee of 
success.
* Change the existing plan to not rely on the failed assumption.
* Use an already stored alternate plan (generated during the initial planning).
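
To make the idea a bit more concrete, here is a minimal sketch of checking
assumptions at executor startup and going back to the planner when one
fails. It is Python purely for illustration, not PostgreSQL code; the names
Assumption, Plan, failed_assumptions and plan_for_execution are all
hypothetical and only show the control flow I have in mind.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Assumption:
    """One fact the planner relied on (hypothetical structure)."""
    name: str
    still_holds: Callable[[], bool]  # check run at executor startup


@dataclass
class Plan:
    """A plan together with the assumptions it was built on."""
    description: str
    assumptions: List[Assumption] = field(default_factory=list)


def failed_assumptions(plan: Plan) -> List[str]:
    """Return the names of all assumptions that no longer hold."""
    return [a.name for a in plan.assumptions if not a.still_holds()]


def plan_for_execution(plan: Plan, replan: Callable[[List[str]], Plan]) -> Plan:
    """Use the stored plan if every assumption holds, else ask the planner again."""
    failed = failed_assumptions(plan)
    if not failed:
        return plan
    # Hand the failed assumptions back to the planner, which may patch the
    # plan, pick a stored alternative, or replan from scratch.
    return replan(failed)
```

The replan callback stands in for whichever of the options above the planner
picks; the executor side only needs to know that it gets back a plan whose
assumptions hold.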

You currently change the plan in executor code. I suggest going back to the
planner if the assumption doesn't hold. The planner can then decide to change
the plan. The planner can also decide to replan fully if there are reasons
for it.

If the planner knows that it needs to replan when the assumption does not
hold during execution, the cost of replanning multiplied by the chance of the
assumption not holding during execution should be part of the decision to
deliver a plan with an assumption in the first place.
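
As a rough sketch of that decision (again Python for illustration only, with
hypothetical names and made-up cost units): I assume here that when the
assumption fails, you pay the replanning cost plus the cost of a fallback
plan that does not rely on the assumption. That cost model is my own
simplification, not something taken from the patch.

```python
def expected_cost_with_assumption(optimistic_cost: float,
                                  fallback_cost: float,
                                  replan_cost: float,
                                  p_fail: float) -> float:
    """Expected cost of delivering a plan that relies on an assumption.

    With probability (1 - p_fail) the optimistic plan runs as is; with
    probability p_fail we replan and run the fallback plan instead.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be a probability between 0 and 1")
    return (1.0 - p_fail) * optimistic_cost + p_fail * (replan_cost + fallback_cost)


def should_use_assumption(optimistic_cost: float,
                          fallback_cost: float,
                          replan_cost: float,
                          p_fail: float) -> bool:
    """Rely on the assumption only if it is cheaper in expectation."""
    expected = expected_cost_with_assumption(
        optimistic_cost, fallback_cost, replan_cost, p_fail)
    return expected < fallback_cost
```

With such a rule, a cheap optimistic plan is still rejected when replanning
is expensive or the assumption is likely to fail.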

> There are also other cases such as MergeJoins performing btree index scans
> in order to obtain ordered results for a MergeJoin that would be better
> executed as a SeqScan when the MergeJoin can be removed.
> 
> Perhaps some costs could be adjusted at planning time when there's a
> possibility that joins could be removed at execution time, although I'm
> not quite sure about this as it risks generating a poor plan in the case
> when the joins cannot be removed.

Maybe this is a case where you are better off replanning if the assumption
doesn't hold instead of changing the generated execution plan. In that case
you can remove the join before the path is made.

> Comments are most welcome
> 
> Regards
> 
> David Rowley

Regards,

Mart



