Thread: [HACKERS] convert EXSITS to inner join gotcha and bug

[HACKERS] convert EXSITS to inner join gotcha and bug

From
Teodor Sigaev
Date:
Hi!

Seems, there two issues:

1) Sometime conditions which for a first glance could be pushed down to scan are 
leaved as join quals. And it could be a ~30 times performance loss.

2) Number of query result depend on enabe_seqscan variable.

The query
explain analyze
SELECT
         *
FROM
         t1
         INNER JOIN t2 ON (
                 EXISTS (
                         SELECT
                                 true
                         FROM
                                 t3
                         WHERE
                                 t3.id1 = t1.id  AND
                                 t3.id2 = t2.id
                 )
         )
WHERE
         t1.name = '5c5fec6a41b8809972870abc154b3ecd'
;

produces following plan:
  Nested Loop  (cost=6.42..1928.71 rows=1 width=99) (actual time=71.415..148.922 
rows=162 loops=1)
    Join Filter: (t3.id1 = t1.id)
    Rows Removed by Join Filter: 70368
    ->  Index Only Scan using t1i2 on t1  (cost=0.28..8.30 rows=1 width=66) 
(actual time=0.100..0.103 rows=1 loops=1)
          Index Cond: (name = '5c5fec6a41b8809972870abc154b3ecd'::text)
          Heap Fetches: 1
    ->  Hash Join  (cost=6.14..1918.37 rows=163 width=66) (actual 
time=0.370..120.971 rows=70530 loops=1)
(1)      Hash Cond: (t3.id2 = t2.id)
(2)      ->  Seq Scan on t3  (cost=0.00..1576.30 rows=70530 width=66) (actual 
time=0.017..27.424 rows=70530 loops=1)
          ->  Hash  (cost=3.84..3.84 rows=184 width=33) (actual 
time=0.273..0.273 rows=184 loops=1)
                Buckets: 1024  Batches: 1  Memory Usage: 20kB
                ->  Seq Scan on t2  (cost=0.00..3.84 rows=184 width=33) (actual 
time=0.017..0.105 rows=184 loops=1)
  Planning time: 7.326 ms
  Execution time: 149.115 ms


Condition (1) is not pushed to scan (2) which seemsly could be safely moved. 
With seqscan = off condition is not pushed too but query returns only one row 
instead of 162. Scan on t3 returns ~70000 rows but only ~150 rows are really 
needed. I didn't found a combination of GUCs enable_* to push down that and it 
seems to me there is reason for that which I don't see or support is somehow missed.

If pair of (t3.id1, t3.id2) is unique (see dump, there is a unique index on 
them) the query could be directly rewrited to inner join and its plan is:
  Nested Loop  (cost=9.70..299.96 rows=25 width=66) (actual time=0.376..5.232 
rows=162 loops=1)
    ->  Nested Loop  (cost=9.43..292.77 rows=25 width=99) (actual 
time=0.316..0.645 rows=162 loops=1)
          ->  Index Only Scan using t1i2 on t1  (cost=0.28..8.30 rows=1 
width=66) (actual time=0.047..0.050 rows=1 loops=1)
                Index Cond: (name = '5c5fec6a41b8809972870abc154b3ecd'::text)
                Heap Fetches: 1
          ->  Bitmap Heap Scan on t3  (cost=9.15..283.53 rows=94 width=66) 
(actual time=0.257..0.426 rows=162 loops=1)
                Recheck Cond: (id1 = t1.id)
                Heap Blocks: exact=3
                ->  Bitmap Index Scan on t3i1  (cost=0.00..9.12 rows=94 width=0) 
(actual time=0.186..0.186 rows=162 loops=1)
                      Index Cond: (id1 = t1.id)
    ->  Index Only Scan using t2i1 on t2  (cost=0.27..0.29 rows=1 width=33) 
(actual time=0.024..0.024 rows=1 loops=162)
          Index Cond: (id = t3.id2)
          Heap Fetches: 162
  Planning time: 5.532 ms
  Execution time: 5.457 ms

Second plan is ~30 times faster. But with turned  off sequentual scan the first 
query is not work correctly, which points to some bug in planner, I suppose. 
Both 9.6 and 10devel are affected to addiction  of query result on seqscan variable.


Dump to reproduce (subset of real data but obfucated), queries are in attachment
http://sigaev.ru/misc/exists_to_nested.sql.gz
-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] convert EXSITS to inner join gotcha and bug

From
Teodor Sigaev
Date:
> Both 9.6 and 10devel are affected to addiction  of query result on seqscan
> variable.
Oops, I was too nervious, 9.6 is not affected to enable_seqscan setting. But it 
doesn't push down condition too.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Re: [HACKERS] convert EXSITS to inner join gotcha and bug

From
Alexander Korotkov
Date:
On Fri, Apr 28, 2017 at 12:48 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
Both 9.6 and 10devel are affected to addiction  of query result on seqscan
variable.
Oops, I was too nervious, 9.6 is not affected to enable_seqscan setting. But it doesn't push down condition too.

I've reproduced this bug on d981074c.
On default config, after loading example.sql.bz2 and VACUUM ANALYZE, query result is OK.

# explain analyze SELECT
*
FROM
t1
INNER JOIN t2 ON (
EXISTS (
SELECT
true
FROM
t3
WHERE
t3.id1 = t1.id AND
t3.id2 = t2.id
)
)
WHERE
t1.name = '5c5fec6a41b8809972870abc154b3ecd';
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=6.42..1924.71 rows=1 width=99) (actual time=14.044..34.957 rows=162 loops=1)
   Join Filter: (t3.id1 = t1.id)
   Rows Removed by Join Filter: 70368
   ->  Index Only Scan using t1i2 on t1  (cost=0.28..4.30 rows=1 width=66) (actual time=0.026..0.028 rows=1 loops=1)
         Index Cond: (name = '5c5fec6a41b8809972870abc154b3ecd'::text)
         Heap Fetches: 0
   ->  Hash Join  (cost=6.14..1918.37 rows=163 width=66) (actual time=0.077..28.310 rows=70530 loops=1)
         Hash Cond: (t3.id2 = t2.id)
         ->  Seq Scan on t3  (cost=0.00..1576.30 rows=70530 width=66) (actual time=0.005..6.433 rows=70530 loops=1)
         ->  Hash  (cost=3.84..3.84 rows=184 width=33) (actual time=0.065..0.065 rows=184 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 20kB
               ->  Seq Scan on t2  (cost=0.00..3.84 rows=184 width=33) (actual time=0.003..0.025 rows=184 loops=1)
 Planning time: 2.542 ms
 Execution time: 35.008 ms
(14 rows)

But with seqscan and hashjoin disabled, query returns 0 rows.

# set enable_seqscan = off;
# set enable_hashjoin = off;
# explain analyze SELECT
*
FROM
t1
INNER JOIN t2 ON (
EXISTS (
SELECT
true
FROM
t3
WHERE
t3.id1 = t1.id AND
t3.id2 = t2.id
)
)
WHERE
t1.name = '5c5fec6a41b8809972870abc154b3ecd';
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.97..5265.82 rows=1 width=99) (actual time=18.718..18.718 rows=0 loops=1)
   Join Filter: (t3.id1 = t1.id)
   Rows Removed by Join Filter: 163
   ->  Index Only Scan using t1i2 on t1  (cost=0.28..4.30 rows=1 width=66) (actual time=0.024..0.024 rows=1 loops=1)
         Index Cond: (name = '5c5fec6a41b8809972870abc154b3ecd'::text)
         Heap Fetches: 0
   ->  Merge Join  (cost=0.69..5259.48 rows=163 width=66) (actual time=0.033..18.670 rows=163 loops=1)
         Merge Cond: (t2.id = t3.id2)
         ->  Index Only Scan using t2i1 on t2  (cost=0.27..19.03 rows=184 width=33) (actual time=0.015..0.038 rows=184 loops=1)
               Heap Fetches: 0
         ->  Index Only Scan using t3i2 on t3  (cost=0.42..4358.37 rows=70530 width=66) (actual time=0.015..10.484 rows=70094 loops=1)
               Heap Fetches: 0
 Planning time: 2.571 ms
 Execution time: 18.778 ms
(14 rows)

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] convert EXSITS to inner join gotcha and bug

From
Tom Lane
Date:
Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> I've reproduced this bug on d981074c.
> On default config, after loading example.sql.bz2 and VACUUM ANALYZE, query
> result is OK.
> But with seqscan and hashjoin disabled, query returns 0 rows.

Ah, thanks for the clue about enable_hashjoin, because it wasn't
reproducing for me as stated.

It looks like in the case that's giving wrong answers, the mergejoin
is wrongly getting marked as "Inner Unique".  Something's a bit too
cheesy about that planner logic --- not sure what, yet.
        regards, tom lane



Re: [HACKERS] convert EXSITS to inner join gotcha and bug

From
Teodor Sigaev
Date:
> Ah, thanks for the clue about enable_hashjoin, because it wasn't
> reproducing for me as stated.
I missed tweaked config, sorry

-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru                                      WWW:
http://www.sigaev.ru/



Re: [HACKERS] convert EXSITS to inner join gotcha and bug

From
David Rowley
Date:
On 29 April 2017 at 00:45, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
> On default config, after loading example.sql.bz2 and VACUUM ANALYZE, query
> result is OK.

Hi,

Did you mean to attach this?

-- David Rowley                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] convert EXSITS to inner join gotcha and bug

From
Tom Lane
Date:
David Rowley <david.rowley@2ndquadrant.com> writes:
> Did you mean to attach this?

See the link in Teodor's original message (it's actually a .bz2 file
not a .gz)
        regards, tom lane



Re: [HACKERS] convert EXSITS to inner join gotcha and bug

From
David Rowley
Date:
(On 29 April 2017 at 02:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
>> I've reproduced this bug on d981074c.
>> On default config, after loading example.sql.bz2 and VACUUM ANALYZE, query
>> result is OK.
>> But with seqscan and hashjoin disabled, query returns 0 rows.
>
> Ah, thanks for the clue about enable_hashjoin, because it wasn't
> reproducing for me as stated.
>
> It looks like in the case that's giving wrong answers, the mergejoin
> is wrongly getting marked as "Inner Unique".  Something's a bit too ()
> cheesy about that planner logic --- not sure what, yet.

Seems related to the unconditional setting of extra.inner_unique to
true for JOIN_UNIQUE_INNER jointypes in add_paths_to_joinrel()

Setting this based on the return value of innerrel_is_unique() as done
with the other join types seems to fix the issue.

I don't know yet if that's the correct fix. It's pretty late 'round
this side to be thinking too hard about it.

-- David Rowley                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] convert EXSITS to inner join gotcha and bug

From
Alexander Korotkov
Date:
On Fri, Apr 28, 2017 at 6:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <david.rowley@2ndquadrant.com> writes:
> Did you mean to attach this?

See the link in Teodor's original message (it's actually a .bz2 file
not a .gz)

Yes, I didn't mean Teodor has renamed it.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: [HACKERS] convert EXSITS to inner join gotcha and bug

From
Tom Lane
Date:
David Rowley <david.rowley@2ndquadrant.com> writes:
> (On 29 April 2017 at 02:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It looks like in the case that's giving wrong answers, the mergejoin
>> is wrongly getting marked as "Inner Unique".  Something's a bit too
>> cheesy about that planner logic --- not sure what, yet.

> Seems related to the unconditional setting of extra.inner_unique to
> true for JOIN_UNIQUE_INNER jointypes in add_paths_to_joinrel()
> Setting this based on the return value of innerrel_is_unique() as done
> with the other join types seems to fix the issue.
> I don't know yet if that's the correct fix. It's pretty late 'round
> this side to be thinking too hard about it.

Yes, I think that's correct.  I'd jumped to the conclusion that we could
skip making the test in this case, but this example shows that that's
wrong.  The problem is that, in an example like this, create_unique_path
will create a path that's unique-ified for all the join keys of the
semijoin --- but we're considering joining against just a subset of the
semijoin's outer rels, so the inner path is NOT unique for that subset.

We could possibly skip making the test if the outerrel contains
sjinfo->min_lefthand, but I'm not sufficiently excited about shaving
cycles here to take any new risks.  Let's just call innerrel_is_unique()
and be done.

Will fix in a bit, once I've managed to create a smaller test case for
the regression tests.
        regards, tom lane



Re: [HACKERS] convert EXSITS to inner join gotcha and bug

From
Tom Lane
Date:
I wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> (On 29 April 2017 at 02:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Seems related to the unconditional setting of extra.inner_unique to
>> true for JOIN_UNIQUE_INNER jointypes in add_paths_to_joinrel()
>> Setting this based on the return value of innerrel_is_unique() as done
>> with the other join types seems to fix the issue.

> Yes, I think that's correct.

Well, "make check-world" disabused me of that notion: there are several
test cases in postgres_fdw that lost perfectly-valid inner_unique
markings.  The reason is that create_unique_path will create uniqueness
even when you couldn't prove it from the underlying rel itself.  So my
previous thought about comparing the outerrel to sjinfo->min_lefthand is
really necessary to avoid regressions from what we had before.

However, while that seems to be enough to generate correct plans, it
doesn't address Teodor's performance complaint: he's wishing the planner
would notice that the semijoin inner rel is effectively unique, even when
the best plan involves initially joining the semijoin inner rel to just
a subset of the semijoin outer --- against which that inner rel is *not*
unique.  Applying innerrel_is_unique() helps for some simpler cases, but
not this one.

Really, the way to fix Teodor's complaint is to recognize that the
semijoin inner rel is effectively unique against the whole outer rel,
and then strength-reduce the semijoin to a plain join.  The infrastructure
we built for unique joins is capable of proving that, we just weren't
applying it in the right way.

Attached are two alternative patches.  The first just does the minimum
necessary to fix the bug; the second adds some code to perform
strength-reduction of semijoins.  The second patch is capable of finding
the plan Teodor wanted for his test case --- in fact, left to its own
devices, it finds a *better* plan, both by cost and actual runtime.

I'm kind of strongly tempted to apply the second patch; but it would
be fair to complain that reduce_unique_semijoins() is new development
and should wait for v11.  Opinions?

            regards, tom lane

diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 5aedcd1..39e2ddd 100644
*** a/src/backend/optimizer/path/joinpath.c
--- b/src/backend/optimizer/path/joinpath.c
*************** add_paths_to_joinrel(PlannerInfo *root,
*** 126,133 ****
       *
       * We have some special cases: for JOIN_SEMI and JOIN_ANTI, it doesn't
       * matter since the executor can make the equivalent optimization anyway;
!      * we need not expend planner cycles on proofs.  For JOIN_UNIQUE_INNER, we
!      * know we're going to force uniqueness of the innerrel below.  For
       * JOIN_UNIQUE_OUTER, pass JOIN_INNER to avoid letting that value escape
       * this module.
       */
--- 126,136 ----
       *
       * We have some special cases: for JOIN_SEMI and JOIN_ANTI, it doesn't
       * matter since the executor can make the equivalent optimization anyway;
!      * we need not expend planner cycles on proofs.  For JOIN_UNIQUE_INNER, if
!      * the LHS covers all of the associated semijoin's min_lefthand, then it's
!      * appropriate to set inner_unique because the path produced by
!      * create_unique_path will be unique relative to the LHS.  (If we have an
!      * LHS that's only part of the min_lefthand, that is *not* true.)  For
       * JOIN_UNIQUE_OUTER, pass JOIN_INNER to avoid letting that value escape
       * this module.
       */
*************** add_paths_to_joinrel(PlannerInfo *root,
*** 138,144 ****
              extra.inner_unique = false; /* well, unproven */
              break;
          case JOIN_UNIQUE_INNER:
!             extra.inner_unique = true;
              break;
          case JOIN_UNIQUE_OUTER:
              extra.inner_unique = innerrel_is_unique(root, outerrel, innerrel,
--- 141,148 ----
              extra.inner_unique = false; /* well, unproven */
              break;
          case JOIN_UNIQUE_INNER:
!             extra.inner_unique = bms_is_subset(sjinfo->min_lefthand,
!                                                outerrel->relids);
              break;
          case JOIN_UNIQUE_OUTER:
              extra.inner_unique = innerrel_is_unique(root, outerrel, innerrel,
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 69ce7aa..87ff365 100644
*** a/src/test/regress/expected/join.out
--- b/src/test/regress/expected/join.out
*************** reset enable_sort;
*** 5634,5636 ****
--- 5634,5665 ----
  drop table j1;
  drop table j2;
  drop table j3;
+ -- check that semijoin inner is not seen as unique for a portion of the outerrel
+ explain (verbose, costs off)
+ select t1.unique1, t2.hundred
+ from onek t1, tenk1 t2
+ where exists (select 1 from tenk1 t3
+               where t3.thousand = t1.unique1 and t3.tenthous = t2.hundred)
+       and t1.unique1 < 1;
+                                    QUERY PLAN
+ ---------------------------------------------------------------------------------
+  Nested Loop
+    Output: t1.unique1, t2.hundred
+    ->  Hash Join
+          Output: t1.unique1, t3.tenthous
+          Hash Cond: (t3.thousand = t1.unique1)
+          ->  HashAggregate
+                Output: t3.thousand, t3.tenthous
+                Group Key: t3.thousand, t3.tenthous
+                ->  Index Only Scan using tenk1_thous_tenthous on public.tenk1 t3
+                      Output: t3.thousand, t3.tenthous
+          ->  Hash
+                Output: t1.unique1
+                ->  Index Only Scan using onek_unique1 on public.onek t1
+                      Output: t1.unique1
+                      Index Cond: (t1.unique1 < 1)
+    ->  Index Only Scan using tenk1_hundred on public.tenk1 t2
+          Output: t2.hundred
+          Index Cond: (t2.hundred = t3.tenthous)
+ (18 rows)
+
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 4fc8fd5..a36e29f 100644
*** a/src/test/regress/sql/join.sql
--- b/src/test/regress/sql/join.sql
*************** reset enable_sort;
*** 1856,1858 ****
--- 1856,1866 ----
  drop table j1;
  drop table j2;
  drop table j3;
+
+ -- check that semijoin inner is not seen as unique for a portion of the outerrel
+ explain (verbose, costs off)
+ select t1.unique1, t2.hundred
+ from onek t1, tenk1 t2
+ where exists (select 1 from tenk1 t3
+               where t3.thousand = t1.unique1 and t3.tenthous = t2.hundred)
+       and t1.unique1 < 1;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 5aedcd1..93d9a3e 100644
*** a/src/backend/optimizer/path/joinpath.c
--- b/src/backend/optimizer/path/joinpath.c
*************** add_paths_to_joinrel(PlannerInfo *root,
*** 126,135 ****
       *
       * We have some special cases: for JOIN_SEMI and JOIN_ANTI, it doesn't
       * matter since the executor can make the equivalent optimization anyway;
!      * we need not expend planner cycles on proofs.  For JOIN_UNIQUE_INNER, we
!      * know we're going to force uniqueness of the innerrel below.  For
!      * JOIN_UNIQUE_OUTER, pass JOIN_INNER to avoid letting that value escape
!      * this module.
       */
      switch (jointype)
      {
--- 126,141 ----
       *
       * We have some special cases: for JOIN_SEMI and JOIN_ANTI, it doesn't
       * matter since the executor can make the equivalent optimization anyway;
!      * we need not expend planner cycles on proofs.  Furthermore, for either
!      * JOIN_SEMI or JOIN_UNIQUE_INNER, we must be considering a semijoin whose
!      * inner side is not provably unique --- else reduce_unique_semijoins
!      * would've simplified it --- so there's no point in calling
!      * innerrel_is_unique.  However, for JOIN_UNIQUE_INNER, if the LHS covers
!      * all of the associated semijoin's min_lefthand, then it's appropriate to
!      * set inner_unique because the path produced by create_unique_path will
!      * be unique relative to the LHS.  (If we have an LHS that's only part of
!      * the min_lefthand, that is *not* true.)  For JOIN_UNIQUE_OUTER, pass
!      * JOIN_INNER to avoid letting that value escape this module.
       */
      switch (jointype)
      {
*************** add_paths_to_joinrel(PlannerInfo *root,
*** 138,152 ****
              extra.inner_unique = false; /* well, unproven */
              break;
          case JOIN_UNIQUE_INNER:
!             extra.inner_unique = true;
              break;
          case JOIN_UNIQUE_OUTER:
!             extra.inner_unique = innerrel_is_unique(root, outerrel, innerrel,
!                                                     JOIN_INNER, restrictlist);
              break;
          default:
!             extra.inner_unique = innerrel_is_unique(root, outerrel, innerrel,
!                                                     jointype, restrictlist);
              break;
      }

--- 144,165 ----
              extra.inner_unique = false; /* well, unproven */
              break;
          case JOIN_UNIQUE_INNER:
!             extra.inner_unique = bms_is_subset(sjinfo->min_lefthand,
!                                                outerrel->relids);
              break;
          case JOIN_UNIQUE_OUTER:
!             extra.inner_unique = innerrel_is_unique(root,
!                                                     outerrel->relids,
!                                                     innerrel,
!                                                     JOIN_INNER,
!                                                     restrictlist);
              break;
          default:
!             extra.inner_unique = innerrel_is_unique(root,
!                                                     outerrel->relids,
!                                                     innerrel,
!                                                     jointype,
!                                                     restrictlist);
              break;
      }

diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 69b9be4..dfcc930 100644
*** a/src/backend/optimizer/plan/analyzejoins.c
--- b/src/backend/optimizer/plan/analyzejoins.c
*************** static bool rel_is_distinct_for(PlannerI
*** 42,48 ****
                      List *clause_list);
  static Oid    distinct_col_search(int colno, List *colnos, List *opids);
  static bool is_innerrel_unique_for(PlannerInfo *root,
!                        RelOptInfo *outerrel,
                         RelOptInfo *innerrel,
                         JoinType jointype,
                         List *restrictlist);
--- 42,48 ----
                      List *clause_list);
  static Oid    distinct_col_search(int colno, List *colnos, List *opids);
  static bool is_innerrel_unique_for(PlannerInfo *root,
!                        Relids outerrelids,
                         RelOptInfo *innerrel,
                         JoinType jointype,
                         List *restrictlist);
*************** remove_rel_from_joinlist(List *joinlist,
*** 496,501 ****
--- 496,583 ----


  /*
+  * reduce_unique_semijoins
+  *        Check for semijoins that can be simplified to plain inner joins
+  *        because the inner relation is provably unique for the join clauses.
+  *
+  * Ideally this would happen during reduce_outer_joins, but we don't have
+  * enough information at that point.
+  *
+  * To perform the strength reduction when applicable, we need only delete
+  * the semijoin's SpecialJoinInfo from root->join_info_list.  (We don't
+  * bother fixing the join type attributed to it in the query jointree,
+  * since that won't be consulted again.)
+  */
+ void
+ reduce_unique_semijoins(PlannerInfo *root)
+ {
+     ListCell   *lc;
+     ListCell   *next;
+
+     /*
+      * Scan the join_info_list to find semijoins.  We can't use foreach
+      * because we may delete the current cell.
+      */
+     for (lc = list_head(root->join_info_list); lc != NULL; lc = next)
+     {
+         SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc);
+         int            innerrelid;
+         RelOptInfo *innerrel;
+         Relids        joinrelids;
+         List       *restrictlist;
+
+         next = lnext(lc);
+
+         /*
+          * Must be a non-delaying semijoin to a single baserel, else we aren't
+          * going to be able to do anything with it.  (It's probably not
+          * possible for delay_upper_joins to be set on a semijoin, but we
+          * might as well check.)
+          */
+         if (sjinfo->jointype != JOIN_SEMI ||
+             sjinfo->delay_upper_joins)
+             continue;
+
+         if (!bms_get_singleton_member(sjinfo->min_righthand, &innerrelid))
+             continue;
+
+         innerrel = find_base_rel(root, innerrelid);
+
+         /*
+          * Before we trouble to run generate_join_implied_equalities, make a
+          * quick check to eliminate cases in which we will surely be unable to
+          * prove uniqueness of the innerrel.
+          */
+         if (!rel_supports_distinctness(root, innerrel))
+             continue;
+
+         /* Compute the relid set for the join we are considering */
+         joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+
+         /*
+          * Since we're only considering a single-rel RHS, any join clauses it
+          * has must be clauses linking it to the semijoin's min_lefthand.  We
+          * can also consider EC-derived join clauses.
+          */
+         restrictlist =
+             list_concat(generate_join_implied_equalities(root,
+                                                          joinrelids,
+                                                          sjinfo->min_lefthand,
+                                                          innerrel),
+                         innerrel->joininfo);
+
+         /* Test whether the innerrel is unique for those clauses. */
+         if (!innerrel_is_unique(root, sjinfo->min_lefthand, innerrel,
+                                 JOIN_SEMI, restrictlist))
+             continue;
+
+         /* OK, remove the SpecialJoinInfo from the list. */
+         root->join_info_list = list_delete_ptr(root->join_info_list, sjinfo);
+     }
+ }
+
+
+ /*
   * rel_supports_distinctness
   *        Could the relation possibly be proven distinct on some set of columns?
   *
*************** distinct_col_search(int colno, List *col
*** 857,862 ****
--- 939,948 ----
   *      Check if the innerrel provably contains at most one tuple matching any
   *      tuple from the outerrel, based on join clauses in the 'restrictlist'.
   *
+  * We need an actual RelOptInfo for the innerrel, but it's sufficient to
+  * identify the outerrel by its Relids.  This asymmetry supports use of this
+  * function before joinrels have been built.
+  *
   * The proof must be made based only on clauses that will be "joinquals"
   * rather than "otherquals" at execution.  For an inner join there's no
   * difference; but if the join is outer, we must ignore pushed-down quals,
*************** distinct_col_search(int colno, List *col
*** 870,876 ****
   */
  bool
  innerrel_is_unique(PlannerInfo *root,
!                    RelOptInfo *outerrel,
                     RelOptInfo *innerrel,
                     JoinType jointype,
                     List *restrictlist)
--- 956,962 ----
   */
  bool
  innerrel_is_unique(PlannerInfo *root,
!                    Relids outerrelids,
                     RelOptInfo *innerrel,
                     JoinType jointype,
                     List *restrictlist)
*************** innerrel_is_unique(PlannerInfo *root,
*** 900,906 ****
      {
          Relids        unique_for_rels = (Relids) lfirst(lc);

!         if (bms_is_subset(unique_for_rels, outerrel->relids))
              return true;        /* Success! */
      }

--- 986,992 ----
      {
          Relids        unique_for_rels = (Relids) lfirst(lc);

!         if (bms_is_subset(unique_for_rels, outerrelids))
              return true;        /* Success! */
      }

*************** innerrel_is_unique(PlannerInfo *root,
*** 912,923 ****
      {
          Relids        unique_for_rels = (Relids) lfirst(lc);

!         if (bms_is_subset(outerrel->relids, unique_for_rels))
              return false;
      }

      /* No cached information, so try to make the proof. */
!     if (is_innerrel_unique_for(root, outerrel, innerrel,
                                 jointype, restrictlist))
      {
          /*
--- 998,1009 ----
      {
          Relids        unique_for_rels = (Relids) lfirst(lc);

!         if (bms_is_subset(outerrelids, unique_for_rels))
              return false;
      }

      /* No cached information, so try to make the proof. */
!     if (is_innerrel_unique_for(root, outerrelids, innerrel,
                                 jointype, restrictlist))
      {
          /*
*************** innerrel_is_unique(PlannerInfo *root,
*** 932,938 ****
           */
          old_context = MemoryContextSwitchTo(root->planner_cxt);
          innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
!                                             bms_copy(outerrel->relids));
          MemoryContextSwitchTo(old_context);

          return true;            /* Success! */
--- 1018,1024 ----
           */
          old_context = MemoryContextSwitchTo(root->planner_cxt);
          innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
!                                             bms_copy(outerrelids));
          MemoryContextSwitchTo(old_context);

          return true;            /* Success! */
*************** innerrel_is_unique(PlannerInfo *root,
*** 957,963 ****
              old_context = MemoryContextSwitchTo(root->planner_cxt);
              innerrel->non_unique_for_rels =
                  lappend(innerrel->non_unique_for_rels,
!                         bms_copy(outerrel->relids));
              MemoryContextSwitchTo(old_context);
          }

--- 1043,1049 ----
              old_context = MemoryContextSwitchTo(root->planner_cxt);
              innerrel->non_unique_for_rels =
                  lappend(innerrel->non_unique_for_rels,
!                         bms_copy(outerrelids));
              MemoryContextSwitchTo(old_context);
          }

*************** innerrel_is_unique(PlannerInfo *root,
*** 972,978 ****
   */
  static bool
  is_innerrel_unique_for(PlannerInfo *root,
!                        RelOptInfo *outerrel,
                         RelOptInfo *innerrel,
                         JoinType jointype,
                         List *restrictlist)
--- 1058,1064 ----
   */
  static bool
  is_innerrel_unique_for(PlannerInfo *root,
!                        Relids outerrelids,
                         RelOptInfo *innerrel,
                         JoinType jointype,
                         List *restrictlist)
*************** is_innerrel_unique_for(PlannerInfo *root
*** 1007,1013 ****
           * Check if clause has the form "outer op inner" or "inner op outer",
           * and if so mark which side is inner.
           */
!         if (!clause_sides_match_join(restrictinfo, outerrel->relids,
                                       innerrel->relids))
              continue;            /* no good for these input relations */

--- 1093,1099 ----
           * Check if clause has the form "outer op inner" or "inner op outer",
           * and if so mark which side is inner.
           */
!         if (!clause_sides_match_join(restrictinfo, outerrelids,
                                       innerrel->relids))
              continue;            /* no good for these input relations */

diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index ef0de3f..74de3b8 100644
*** a/src/backend/optimizer/plan/planmain.c
--- b/src/backend/optimizer/plan/planmain.c
*************** query_planner(PlannerInfo *root, List *t
*** 193,198 ****
--- 193,204 ----
      joinlist = remove_useless_joins(root, joinlist);

      /*
+      * Also, reduce any semijoins with unique inner rels to plain inner joins.
+      * Likewise, this can't be done until now for lack of needed info.
+      */
+     reduce_unique_semijoins(root);
+
+     /*
       * Now distribute "placeholders" to base rels as needed.  This has to be
       * done after join removal because removal could change whether a
       * placeholder is evaluable at a base rel.
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 5df68a2..88ac97e 100644
*** a/src/include/optimizer/planmain.h
--- b/src/include/optimizer/planmain.h
*************** extern void match_foreign_keys_to_quals(
*** 103,112 ****
   * prototypes for plan/analyzejoins.c
   */
  extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
  extern bool query_supports_distinctness(Query *query);
  extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
  extern bool innerrel_is_unique(PlannerInfo *root,
!                    RelOptInfo *outerrel, RelOptInfo *innerrel,
                     JoinType jointype, List *restrictlist);

  /*
--- 103,113 ----
   * prototypes for plan/analyzejoins.c
   */
  extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
+ extern void reduce_unique_semijoins(PlannerInfo *root);
  extern bool query_supports_distinctness(Query *query);
  extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
  extern bool innerrel_is_unique(PlannerInfo *root,
!                    Relids outerrelids, RelOptInfo *innerrel,
                     JoinType jointype, List *restrictlist);

  /*
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 69ce7aa..d08b1e1 100644
*** a/src/test/regress/expected/join.out
--- b/src/test/regress/expected/join.out
*************** reset enable_sort;
*** 5634,5636 ****
--- 5634,5693 ----
  drop table j1;
  drop table j2;
  drop table j3;
+ -- check that semijoin inner is not seen as unique for a portion of the outerrel
+ explain (verbose, costs off)
+ select t1.unique1, t2.hundred
+ from onek t1, tenk1 t2
+ where exists (select 1 from tenk1 t3
+               where t3.thousand = t1.unique1 and t3.tenthous = t2.hundred)
+       and t1.unique1 < 1;
+                                    QUERY PLAN
+ ---------------------------------------------------------------------------------
+  Nested Loop
+    Output: t1.unique1, t2.hundred
+    ->  Hash Join
+          Output: t1.unique1, t3.tenthous
+          Hash Cond: (t3.thousand = t1.unique1)
+          ->  HashAggregate
+                Output: t3.thousand, t3.tenthous
+                Group Key: t3.thousand, t3.tenthous
+                ->  Index Only Scan using tenk1_thous_tenthous on public.tenk1 t3
+                      Output: t3.thousand, t3.tenthous
+          ->  Hash
+                Output: t1.unique1
+                ->  Index Only Scan using onek_unique1 on public.onek t1
+                      Output: t1.unique1
+                      Index Cond: (t1.unique1 < 1)
+    ->  Index Only Scan using tenk1_hundred on public.tenk1 t2
+          Output: t2.hundred
+          Index Cond: (t2.hundred = t3.tenthous)
+ (18 rows)
+
+ -- ... unless it actually is unique
+ create table j3 as select unique1, tenthous from onek;
+ vacuum analyze j3;
+ create unique index on j3(unique1, tenthous);
+ explain (verbose, costs off)
+ select t1.unique1, t2.hundred
+ from onek t1, tenk1 t2
+ where exists (select 1 from j3
+               where j3.unique1 = t1.unique1 and j3.tenthous = t2.hundred)
+       and t1.unique1 < 1;
+                                QUERY PLAN
+ ------------------------------------------------------------------------
+  Nested Loop
+    Output: t1.unique1, t2.hundred
+    ->  Nested Loop
+          Output: t1.unique1, j3.tenthous
+          ->  Index Only Scan using onek_unique1 on public.onek t1
+                Output: t1.unique1
+                Index Cond: (t1.unique1 < 1)
+          ->  Index Only Scan using j3_unique1_tenthous_idx on public.j3
+                Output: j3.unique1, j3.tenthous
+                Index Cond: (j3.unique1 = t1.unique1)
+    ->  Index Only Scan using tenk1_hundred on public.tenk1 t2
+          Output: t2.hundred
+          Index Cond: (t2.hundred = j3.tenthous)
+ (13 rows)
+
+ drop table j3;
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index aa06d1d..f6b51a5 100644
*** a/src/test/regress/expected/updatable_views.out
--- b/src/test/regress/expected/updatable_views.out
*************** EXPLAIN (costs off) UPDATE rw_view1 SET
*** 1673,1679 ****
                             QUERY PLAN
  -----------------------------------------------------------------
   Update on base_tbl b
!    ->  Hash Semi Join
           Hash Cond: (b.a = r.a)
           ->  Seq Scan on base_tbl b
           ->  Hash
--- 1673,1679 ----
                             QUERY PLAN
  -----------------------------------------------------------------
   Update on base_tbl b
!    ->  Hash Join
           Hash Cond: (b.a = r.a)
           ->  Seq Scan on base_tbl b
           ->  Hash
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 4fc8fd5..c3994ea 100644
*** a/src/test/regress/sql/join.sql
--- b/src/test/regress/sql/join.sql
*************** reset enable_sort;
*** 1856,1858 ****
--- 1856,1880 ----
  drop table j1;
  drop table j2;
  drop table j3;
+
+ -- check that semijoin inner is not seen as unique for a portion of the outerrel
+ explain (verbose, costs off)
+ select t1.unique1, t2.hundred
+ from onek t1, tenk1 t2
+ where exists (select 1 from tenk1 t3
+               where t3.thousand = t1.unique1 and t3.tenthous = t2.hundred)
+       and t1.unique1 < 1;
+
+ -- ... unless it actually is unique
+ create table j3 as select unique1, tenthous from onek;
+ vacuum analyze j3;
+ create unique index on j3(unique1, tenthous);
+
+ explain (verbose, costs off)
+ select t1.unique1, t2.hundred
+ from onek t1, tenk1 t2
+ where exists (select 1 from j3
+               where j3.unique1 = t1.unique1 and j3.tenthous = t2.hundred)
+       and t1.unique1 < 1;
+
+ drop table j3;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] convert EXSITS to inner join gotcha and bug

From
Teodor Sigaev
Date:
> Really, the way to fix Teodor's complaint is to recognize that the
> semijoin inner rel is effectively unique against the whole outer rel,
> and then strength-reduce the semijoin to a plain join.  The infrastructure
> we built for unique joins is capable of proving that, we just weren't
> applying it in the right way.
Perfect, it works. Thank you! Second patch reduces time of full query (my 
example was just a small part) from 20 minutes to 20 seconds.

> I'm kind of strongly tempted to apply the second patch; but it would
> be fair to complain that reduce_unique_semijoins() is new development
> and should wait for v11.  Opinions?

Obviously, I'm voting for second patch applied to version 10.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Re: [HACKERS] convert EXSITS to inner join gotcha and bug

From
David Rowley
Date:
On 29 April 2017 at 15:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm kind of strongly tempted to apply the second patch; but it would
> be fair to complain that reduce_unique_semijoins() is new development
> and should wait for v11.  Opinions?

My vote is for the non-minimal patch. Of course, I'd be voting for
minimal patch if this was for a minor version release fix, but we're
not even in beta yet for v10. The original patch was intended to fix
cases like this, although I'd failed to realise this particular case.


-- David Rowley                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] convert EXSITS to inner join gotcha and bug

From
Tom Lane
Date:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 29 April 2017 at 15:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm kind of strongly tempted to apply the second patch; but it would
>> be fair to complain that reduce_unique_semijoins() is new development
>> and should wait for v11.  Opinions?

> My vote is for the non-minimal patch. Of course, I'd be voting for
> minimal patch if this was for a minor version release fix, but we're
> not even in beta yet for v10. The original patch was intended to fix
> cases like this, although I'd failed to realise this particular case.

Yeah, I thought we'd discussed doing something more or less like this
way back in that thread.

After studying the patch some more, I noticed that reduce_unique_semijoins
falsifies the assumption in innerrel_is_unique that we only probe inner
uniqueness for steadily larger relid sets.  If the semijoin LHS is more
than one relation, then it'll test inner uniqueness using that LHS, and if
the proof fails, that's knowledge that can save individual proof attempts
for the individual LHS rels later on during the join search.  So in the
attached, I've modified reduce_unique_semijoins's API a bit more to allow
the caller to override the don't-cache heuristic.

Also, this form of the patch is an incremental patch over the minimal
fix I posted yesterday.  It seems like a good idea to push it as a
separate commit, if only for future bisection purposes.

If I don't hear objections, I'll push this tomorrow sometime.

            regards, tom lane

diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 39e2ddd..c130d2f 100644
*** a/src/backend/optimizer/path/joinpath.c
--- b/src/backend/optimizer/path/joinpath.c
*************** add_paths_to_joinrel(PlannerInfo *root,
*** 126,138 ****
       *
       * We have some special cases: for JOIN_SEMI and JOIN_ANTI, it doesn't
       * matter since the executor can make the equivalent optimization anyway;
!      * we need not expend planner cycles on proofs.  For JOIN_UNIQUE_INNER, if
!      * the LHS covers all of the associated semijoin's min_lefthand, then it's
!      * appropriate to set inner_unique because the path produced by
!      * create_unique_path will be unique relative to the LHS.  (If we have an
!      * LHS that's only part of the min_lefthand, that is *not* true.)  For
!      * JOIN_UNIQUE_OUTER, pass JOIN_INNER to avoid letting that value escape
!      * this module.
       */
      switch (jointype)
      {
--- 126,140 ----
       *
       * We have some special cases: for JOIN_SEMI and JOIN_ANTI, it doesn't
       * matter since the executor can make the equivalent optimization anyway;
!      * we need not expend planner cycles on proofs.  For JOIN_UNIQUE_INNER, we
!      * must be considering a semijoin whose inner side is not provably unique
!      * (else reduce_unique_semijoins would've simplified it), so there's no
!      * point in calling innerrel_is_unique.  However, if the LHS covers all of
!      * the semijoin's min_lefthand, then it's appropriate to set inner_unique
!      * because the path produced by create_unique_path will be unique relative
!      * to the LHS.  (If we have an LHS that's only part of the min_lefthand,
!      * that is *not* true.)  For JOIN_UNIQUE_OUTER, pass JOIN_INNER to avoid
!      * letting that value escape this module.
       */
      switch (jointype)
      {
*************** add_paths_to_joinrel(PlannerInfo *root,
*** 145,156 ****
                                                 outerrel->relids);
              break;
          case JOIN_UNIQUE_OUTER:
!             extra.inner_unique = innerrel_is_unique(root, outerrel, innerrel,
!                                                     JOIN_INNER, restrictlist);
              break;
          default:
!             extra.inner_unique = innerrel_is_unique(root, outerrel, innerrel,
!                                                     jointype, restrictlist);
              break;
      }

--- 147,166 ----
                                                 outerrel->relids);
              break;
          case JOIN_UNIQUE_OUTER:
!             extra.inner_unique = innerrel_is_unique(root,
!                                                     outerrel->relids,
!                                                     innerrel,
!                                                     JOIN_INNER,
!                                                     restrictlist,
!                                                     false);
              break;
          default:
!             extra.inner_unique = innerrel_is_unique(root,
!                                                     outerrel->relids,
!                                                     innerrel,
!                                                     jointype,
!                                                     restrictlist,
!                                                     false);
              break;
      }

diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 69b9be4..34317fe 100644
*** a/src/backend/optimizer/plan/analyzejoins.c
--- b/src/backend/optimizer/plan/analyzejoins.c
*************** static bool rel_is_distinct_for(PlannerI
*** 42,48 ****
                      List *clause_list);
  static Oid    distinct_col_search(int colno, List *colnos, List *opids);
  static bool is_innerrel_unique_for(PlannerInfo *root,
!                        RelOptInfo *outerrel,
                         RelOptInfo *innerrel,
                         JoinType jointype,
                         List *restrictlist);
--- 42,48 ----
                      List *clause_list);
  static Oid    distinct_col_search(int colno, List *colnos, List *opids);
  static bool is_innerrel_unique_for(PlannerInfo *root,
!                        Relids outerrelids,
                         RelOptInfo *innerrel,
                         JoinType jointype,
                         List *restrictlist);
*************** remove_rel_from_joinlist(List *joinlist,
*** 496,501 ****
--- 496,583 ----


  /*
+  * reduce_unique_semijoins
+  *        Check for semijoins that can be simplified to plain inner joins
+  *        because the inner relation is provably unique for the join clauses.
+  *
+  * Ideally this would happen during reduce_outer_joins, but we don't have
+  * enough information at that point.
+  *
+  * To perform the strength reduction when applicable, we need only delete
+  * the semijoin's SpecialJoinInfo from root->join_info_list.  (We don't
+  * bother fixing the join type attributed to it in the query jointree,
+  * since that won't be consulted again.)
+  */
+ void
+ reduce_unique_semijoins(PlannerInfo *root)
+ {
+     ListCell   *lc;
+     ListCell   *next;
+
+     /*
+      * Scan the join_info_list to find semijoins.  We can't use foreach
+      * because we may delete the current cell.
+      */
+     for (lc = list_head(root->join_info_list); lc != NULL; lc = next)
+     {
+         SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc);
+         int            innerrelid;
+         RelOptInfo *innerrel;
+         Relids        joinrelids;
+         List       *restrictlist;
+
+         next = lnext(lc);
+
+         /*
+          * Must be a non-delaying semijoin to a single baserel, else we aren't
+          * going to be able to do anything with it.  (It's probably not
+          * possible for delay_upper_joins to be set on a semijoin, but we
+          * might as well check.)
+          */
+         if (sjinfo->jointype != JOIN_SEMI ||
+             sjinfo->delay_upper_joins)
+             continue;
+
+         if (!bms_get_singleton_member(sjinfo->min_righthand, &innerrelid))
+             continue;
+
+         innerrel = find_base_rel(root, innerrelid);
+
+         /*
+          * Before we trouble to run generate_join_implied_equalities, make a
+          * quick check to eliminate cases in which we will surely be unable to
+          * prove uniqueness of the innerrel.
+          */
+         if (!rel_supports_distinctness(root, innerrel))
+             continue;
+
+         /* Compute the relid set for the join we are considering */
+         joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+
+         /*
+          * Since we're only considering a single-rel RHS, any join clauses it
+          * has must be clauses linking it to the semijoin's min_lefthand.  We
+          * can also consider EC-derived join clauses.
+          */
+         restrictlist =
+             list_concat(generate_join_implied_equalities(root,
+                                                          joinrelids,
+                                                          sjinfo->min_lefthand,
+                                                          innerrel),
+                         innerrel->joininfo);
+
+         /* Test whether the innerrel is unique for those clauses. */
+         if (!innerrel_is_unique(root, sjinfo->min_lefthand, innerrel,
+                                 JOIN_SEMI, restrictlist, true))
+             continue;
+
+         /* OK, remove the SpecialJoinInfo from the list. */
+         root->join_info_list = list_delete_ptr(root->join_info_list, sjinfo);
+     }
+ }
+
+
+ /*
   * rel_supports_distinctness
   *        Could the relation possibly be proven distinct on some set of columns?
   *
*************** distinct_col_search(int colno, List *col
*** 857,862 ****
--- 939,948 ----
   *      Check if the innerrel provably contains at most one tuple matching any
   *      tuple from the outerrel, based on join clauses in the 'restrictlist'.
   *
+  * We need an actual RelOptInfo for the innerrel, but it's sufficient to
+  * identify the outerrel by its Relids.  This asymmetry supports use of this
+  * function before joinrels have been built.
+  *
   * The proof must be made based only on clauses that will be "joinquals"
   * rather than "otherquals" at execution.  For an inner join there's no
   * difference; but if the join is outer, we must ignore pushed-down quals,
*************** distinct_col_search(int colno, List *col
*** 867,879 ****
   *
   * The actual proof is undertaken by is_innerrel_unique_for(); this function
   * is a frontend that is mainly concerned with caching the answers.
   */
  bool
  innerrel_is_unique(PlannerInfo *root,
!                    RelOptInfo *outerrel,
                     RelOptInfo *innerrel,
                     JoinType jointype,
!                    List *restrictlist)
  {
      MemoryContext old_context;
      ListCell   *lc;
--- 953,970 ----
   *
   * The actual proof is undertaken by is_innerrel_unique_for(); this function
   * is a frontend that is mainly concerned with caching the answers.
+  * In particular, the force_cache argument allows overriding the internal
+  * heuristic about whether to cache negative answers; it should be "true"
+  * if making an inquiry that is not part of the normal bottom-up join search
+  * sequence.
   */
  bool
  innerrel_is_unique(PlannerInfo *root,
!                    Relids outerrelids,
                     RelOptInfo *innerrel,
                     JoinType jointype,
!                    List *restrictlist,
!                    bool force_cache)
  {
      MemoryContext old_context;
      ListCell   *lc;
*************** innerrel_is_unique(PlannerInfo *root,
*** 900,906 ****
      {
          Relids        unique_for_rels = (Relids) lfirst(lc);

!         if (bms_is_subset(unique_for_rels, outerrel->relids))
              return true;        /* Success! */
      }

--- 991,997 ----
      {
          Relids        unique_for_rels = (Relids) lfirst(lc);

!         if (bms_is_subset(unique_for_rels, outerrelids))
              return true;        /* Success! */
      }

*************** innerrel_is_unique(PlannerInfo *root,
*** 912,923 ****
      {
          Relids        unique_for_rels = (Relids) lfirst(lc);

!         if (bms_is_subset(outerrel->relids, unique_for_rels))
              return false;
      }

      /* No cached information, so try to make the proof. */
!     if (is_innerrel_unique_for(root, outerrel, innerrel,
                                 jointype, restrictlist))
      {
          /*
--- 1003,1014 ----
      {
          Relids        unique_for_rels = (Relids) lfirst(lc);

!         if (bms_is_subset(outerrelids, unique_for_rels))
              return false;
      }

      /* No cached information, so try to make the proof. */
!     if (is_innerrel_unique_for(root, outerrelids, innerrel,
                                 jointype, restrictlist))
      {
          /*
*************** innerrel_is_unique(PlannerInfo *root,
*** 932,938 ****
           */
          old_context = MemoryContextSwitchTo(root->planner_cxt);
          innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
!                                             bms_copy(outerrel->relids));
          MemoryContextSwitchTo(old_context);

          return true;            /* Success! */
--- 1023,1029 ----
           */
          old_context = MemoryContextSwitchTo(root->planner_cxt);
          innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
!                                             bms_copy(outerrelids));
          MemoryContextSwitchTo(old_context);

          return true;            /* Success! */
*************** innerrel_is_unique(PlannerInfo *root,
*** 949,963 ****
           * from smaller to larger.  It is useful in GEQO mode, where the
           * knowledge can be carried across successive planning attempts; and
           * it's likely to be useful when using join-search plugins, too. Hence
!          * cache only when join_search_private is non-NULL.  (Yeah, that's a
!          * hack, but it seems reasonable.)
           */
!         if (root->join_search_private)
          {
              old_context = MemoryContextSwitchTo(root->planner_cxt);
              innerrel->non_unique_for_rels =
                  lappend(innerrel->non_unique_for_rels,
!                         bms_copy(outerrel->relids));
              MemoryContextSwitchTo(old_context);
          }

--- 1040,1058 ----
           * from smaller to larger.  It is useful in GEQO mode, where the
           * knowledge can be carried across successive planning attempts; and
           * it's likely to be useful when using join-search plugins, too. Hence
!          * cache when join_search_private is non-NULL.  (Yeah, that's a hack,
!          * but it seems reasonable.)
!          *
!          * Also, allow callers to override that heuristic and force caching;
!          * that's useful for reduce_unique_semijoins, which calls here before
!          * the normal join search starts.
           */
!         if (force_cache || root->join_search_private)
          {
              old_context = MemoryContextSwitchTo(root->planner_cxt);
              innerrel->non_unique_for_rels =
                  lappend(innerrel->non_unique_for_rels,
!                         bms_copy(outerrelids));
              MemoryContextSwitchTo(old_context);
          }

*************** innerrel_is_unique(PlannerInfo *root,
*** 972,978 ****
   */
  static bool
  is_innerrel_unique_for(PlannerInfo *root,
!                        RelOptInfo *outerrel,
                         RelOptInfo *innerrel,
                         JoinType jointype,
                         List *restrictlist)
--- 1067,1073 ----
   */
  static bool
  is_innerrel_unique_for(PlannerInfo *root,
!                        Relids outerrelids,
                         RelOptInfo *innerrel,
                         JoinType jointype,
                         List *restrictlist)
*************** is_innerrel_unique_for(PlannerInfo *root
*** 1007,1013 ****
           * Check if clause has the form "outer op inner" or "inner op outer",
           * and if so mark which side is inner.
           */
!         if (!clause_sides_match_join(restrictinfo, outerrel->relids,
                                       innerrel->relids))
              continue;            /* no good for these input relations */

--- 1102,1108 ----
           * Check if clause has the form "outer op inner" or "inner op outer",
           * and if so mark which side is inner.
           */
!         if (!clause_sides_match_join(restrictinfo, outerrelids,
                                       innerrel->relids))
              continue;            /* no good for these input relations */

diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index ef0de3f..74de3b8 100644
*** a/src/backend/optimizer/plan/planmain.c
--- b/src/backend/optimizer/plan/planmain.c
*************** query_planner(PlannerInfo *root, List *t
*** 193,198 ****
--- 193,204 ----
      joinlist = remove_useless_joins(root, joinlist);

      /*
+      * Also, reduce any semijoins with unique inner rels to plain inner joins.
+      * Likewise, this can't be done until now for lack of needed info.
+      */
+     reduce_unique_semijoins(root);
+
+     /*
       * Now distribute "placeholders" to base rels as needed.  This has to be
       * done after join removal because removal could change whether a
       * placeholder is evaluable at a base rel.
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 5df68a2..e773c0f 100644
*** a/src/include/optimizer/planmain.h
--- b/src/include/optimizer/planmain.h
*************** extern void match_foreign_keys_to_quals(
*** 103,113 ****
   * prototypes for plan/analyzejoins.c
   */
  extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
  extern bool query_supports_distinctness(Query *query);
  extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
  extern bool innerrel_is_unique(PlannerInfo *root,
!                    RelOptInfo *outerrel, RelOptInfo *innerrel,
!                    JoinType jointype, List *restrictlist);

  /*
   * prototypes for plan/setrefs.c
--- 103,114 ----
   * prototypes for plan/analyzejoins.c
   */
  extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
+ extern void reduce_unique_semijoins(PlannerInfo *root);
  extern bool query_supports_distinctness(Query *query);
  extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
  extern bool innerrel_is_unique(PlannerInfo *root,
!                    Relids outerrelids, RelOptInfo *innerrel,
!                    JoinType jointype, List *restrictlist, bool force_cache);

  /*
   * prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 87ff365..d08b1e1 100644
*** a/src/test/regress/expected/join.out
--- b/src/test/regress/expected/join.out
*************** where exists (select 1 from tenk1 t3
*** 5663,5665 ****
--- 5663,5693 ----
           Index Cond: (t2.hundred = t3.tenthous)
  (18 rows)

+ -- ... unless it actually is unique
+ create table j3 as select unique1, tenthous from onek;
+ vacuum analyze j3;
+ create unique index on j3(unique1, tenthous);
+ explain (verbose, costs off)
+ select t1.unique1, t2.hundred
+ from onek t1, tenk1 t2
+ where exists (select 1 from j3
+               where j3.unique1 = t1.unique1 and j3.tenthous = t2.hundred)
+       and t1.unique1 < 1;
+                                QUERY PLAN
+ ------------------------------------------------------------------------
+  Nested Loop
+    Output: t1.unique1, t2.hundred
+    ->  Nested Loop
+          Output: t1.unique1, j3.tenthous
+          ->  Index Only Scan using onek_unique1 on public.onek t1
+                Output: t1.unique1
+                Index Cond: (t1.unique1 < 1)
+          ->  Index Only Scan using j3_unique1_tenthous_idx on public.j3
+                Output: j3.unique1, j3.tenthous
+                Index Cond: (j3.unique1 = t1.unique1)
+    ->  Index Only Scan using tenk1_hundred on public.tenk1 t2
+          Output: t2.hundred
+          Index Cond: (t2.hundred = j3.tenthous)
+ (13 rows)
+
+ drop table j3;
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index aa06d1d..f6b51a5 100644
*** a/src/test/regress/expected/updatable_views.out
--- b/src/test/regress/expected/updatable_views.out
*************** EXPLAIN (costs off) UPDATE rw_view1 SET
*** 1673,1679 ****
                             QUERY PLAN
  -----------------------------------------------------------------
   Update on base_tbl b
!    ->  Hash Semi Join
           Hash Cond: (b.a = r.a)
           ->  Seq Scan on base_tbl b
           ->  Hash
--- 1673,1679 ----
                             QUERY PLAN
  -----------------------------------------------------------------
   Update on base_tbl b
!    ->  Hash Join
           Hash Cond: (b.a = r.a)
           ->  Seq Scan on base_tbl b
           ->  Hash
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index a36e29f..c3994ea 100644
*** a/src/test/regress/sql/join.sql
--- b/src/test/regress/sql/join.sql
*************** from onek t1, tenk1 t2
*** 1864,1866 ****
--- 1864,1880 ----
  where exists (select 1 from tenk1 t3
                where t3.thousand = t1.unique1 and t3.tenthous = t2.hundred)
        and t1.unique1 < 1;
+
+ -- ... unless it actually is unique
+ create table j3 as select unique1, tenthous from onek;
+ vacuum analyze j3;
+ create unique index on j3(unique1, tenthous);
+
+ explain (verbose, costs off)
+ select t1.unique1, t2.hundred
+ from onek t1, tenk1 t2
+ where exists (select 1 from j3
+               where j3.unique1 = t1.unique1 and j3.tenthous = t2.hundred)
+       and t1.unique1 < 1;
+
+ drop table j3;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers