Re: Removing unneeded self joins - Mailing list pgsql-hackers

From Alexander Kuzmenkov
Subject Re: Removing unneeded self joins
Date
Msg-id 31c2ee33-5db1-a43c-b964-8e2624020ae9@postgrespro.ru
Whole thread Raw
In response to Re: Removing unneeded self joins  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Removing unneeded self joins
List pgsql-hackers
On 3/21/19 01:54, David Rowley wrote:
> I really just don't think checking the unique indexes match is going
> to cut it. You should be looking for a unique index where the join
> clauses match on either side of the join, not looking independently
> and then checking the indexes are the same ones.


The bug you mention later is an implementation bug that can be fixed (I 
will expand on that below). Besides this, do you think current self-join 
detection algorithm has fundamental correctness problems? I am not aware 
of such problems and this algorithm reflects my understanding of what 
constitutes a removable self-join, so it would be helpful if you could 
explain what exactly is wrong with it.

Your alternative idea sounds plausible, but I won't know it's correct 
for sure until I implement it, which is a nontrivial amount of work. I 
am also concerned that we will have to redo calculations similar to 
innerrel_is_unique(), because having near-zero overhead is a hard 
prerequisite for this patch, as was discussed upthread.

In short, I am reluctant to implement a new approach to detection until 
I understand why the current one is fundamentally broken.


> Here's an example of what can go wrong with your current code:


This is a bug indeed. Unique index search is not exhaustive, so if many 
indexes match the join quals, we might not find the same index for both 
sides. I think this can be overcome if we switch to exhaustive search in 
relation_has_unique_index_for, and then try to find matching indexes in 
is_unique_self_join. Please see the new patch for the fix.


> I also think this way would give you the subquery GROUP BY / DISTINCT
> self join removal for just about free.


Could you explain how exactly we can generalize join removal to the 
DISTINCT/GROUP BY case? I understand the removable self-joins as having 
the same row on both sides of join, as identified by ctid, but I'm not 
sure how to apply this to subqueries. Your earlier DISTINCT example 
looked like it needed a different kind of join removal, with different 
conditions for when it applies (please see my previous letter for details).


Another thing I'd like to mention is that this patch splits in two 
independent parts, the detection of self-joins and their removal. While 
we are looking for agreement on the detection part, could you also 
review the removal part? I'm sure it has its own share of problems.

-- 
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Ordered Partitioned Table Scans
Next
From: Tom Lane
Date:
Subject: Re: Ordered Partitioned Table Scans