Thread: Postgres doesn't remove useless join when using partial unique index

Postgres doesn't remove useless join when using partial unique index

From
Kim Rose Carlsen
Date:
Hi

remove_useless_join does not prove uniqueness if the unique index is partial, and therefore wont remove the join if no columns are referenced (see example in bottom).

I have been trying to look around the source code and from what I have identified the problem seems to be that "check_index_predicates(..)" happens after "remove_useless_join(..)", and therefore cannot see that the unique index is actually covered by the join condition.

From analyzejoins.c:612, rel_supports_distinctness(..)
  if (ind->unique && ind->immediate &&
   (ind->indpred == NIL || ind->predOK))
   return true;

But the problem is ind->predOK is calculated in check_index_predicates(..) but this happens later so ind->predOK is always false when checked here.

I have tried to add check_index_predicates(..) to rel_supports_distinctness(..) and this produces the expected plan, but I have no idea of the implication of doing check_index_predicates(..) earlier.

This is my first time looking at the postgres source code, so I know attached "patch" is not the solution, but any pointers on where to go from here would be appreciated.


Example:
CREATE TABLE a (
  id         INTEGER PRIMARY KEY,
  sub_id     INTEGER NOT NULL,
  deleted_at TIMESTAMP
);
CREATE UNIQUE INDEX ON a (sub_id) WHERE (deleted_at IS NULL);

ANALYZE a;

EXPLAIN SELECT 1 FROM a AS a LEFT JOIN a AS b ON a.id = b.sub_id AND b.deleted_at IS NULL;

Expected plan:
                     QUERY PLAN                      
-----------------------------------------------------
 Seq Scan on a  (cost=0.00..28.50 rows=1850 width=4)

Actual plan:
                                      QUERY PLAN                                      
---------------------------------------------------------------------------------------
 Hash Left Join  (cost=14.76..48.13 rows=1850 width=4)
   Hash Cond: (a.id = b.sub_id)
   ->  Seq Scan on a  (cost=0.00..28.50 rows=1850 width=4)
   ->  Hash  (cost=14.65..14.65 rows=9 width=4)
         ->  Bitmap Heap Scan on a b  (cost=4.13..14.65 rows=9 width=4)
               Recheck Cond: (deleted_at IS NULL)
               ->  Bitmap Index Scan on a_sub_id_idx  (cost=0.00..4.13 rows=9 width=0)
(7 rows)


Attachment

Re: Postgres doesn't remove useless join when using partial unique index

From
David Rowley
Date:
On Mon, 21 Jan 2019 at 09:51, Kim Rose Carlsen <krc@hiper.dk> wrote:
> remove_useless_join does not prove uniqueness if the unique index is partial, and therefore wont remove the join if
nocolumns are referenced (see example in bottom). 
>
> I have been trying to look around the source code and from what I have identified the problem seems to be that
"check_index_predicates(..)"happens after "remove_useless_join(..)", and therefore cannot see that the unique index is
actuallycovered by the join condition. 

The main reason that join removal happens so early on in planning is
that we want to forego doing as much work as possible on a relation
that that might get removed.

> From analyzejoins.c:612, rel_supports_distinctness(..)
>   if (ind->unique && ind->immediate &&
>    (ind->indpred == NIL || ind->predOK))
>    return true;

This is really just a precheck to see if there are any unique indexes
which may serve as proof that the join does not duplicate any rows
from the other side of the join. If this fails then the code only
knows not to bother looking any further. If it passes then more work
needs to be done to see if the relation supports distinctness.

> I have tried to add check_index_predicates(..) to rel_supports_distinctness(..) and this produces the expected plan,
butI have no idea of the implication of doing check_index_predicates(..) earlier. 

Looking at check_index_predicates() it makes use of
root->all_baserels, which only gets set in make_one_rel() which is
well after the join removal is performed. So it does look like there's
a bit of a chicken and egg problem there around which relations to use
in generate_join_implied_equalities(). Moving the
check_index_predicates() call earlier would cause it to miss using
these additional quals completely due to all_baserels being empty. I'm
unsure if join removing a relation that we've found a qual during
generate_join_implied_equalities() in would be very safe. I'm not that
sure if it would even be possible to remove such a relation.  That
would require a bit of research.

Maybe it might be worth thinking about making predOK have 3 possible
values, with the additional one being "Unknown". We could then
consider calling predicate_implied_by() in
relation_has_unique_index_for() for indexes that are unique and
immediate but predOK is still unknown.  That might reduce the
additional work to a level that might be acceptable.  The extra check
could either be done before or after the column matching code in
relation_has_unique_index_for(). I guess either one is equally as
likely to fail, but one may be cheaper than the other to perform.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services