Thread: Reducing duplicativeness of EquivalenceClass-derived clauses

Reducing duplicativeness of EquivalenceClass-derived clauses

From
Tom Lane
Date:
While fooling with my longstanding outer-join variables changes
(I am making progress on that, honest), I happened to notice that
equivclass.c is leaving some money on the table by generating
redundant RestrictInfo clauses.  It already attempts to not generate
the same clause twice, which can save a nontrivial amount of work
because we cache selectivity estimates and so on per-RestrictInfo.
I realized though that it will build and return a clause like
"a.x = b.y" even if it already has "b.y = a.x".  This is just
wasteful.  It's always been the case that equivclass.c will
produce clauses that are ordered according to its own whims.
Consumers that need the operands in a specific order, such as
index scans or hash joins, are required to commute the clause
to be the way they want it while building the finished plan.
Therefore, it shouldn't matter which order of the operands we
return, and giving back the commutator clause if available could
potentially save as much as half of the selectivity-estimation
work we do with these clauses subsequently.

Hence, PFA a patch that adjusts create_join_clause() to notice
commuted as well as exact matches among the EquivalenceClass's
existing clauses.  This results in a number of changes visible in
regression test cases, but they're all clearly inconsequential.

The only thing that I think might be controversial here is that
I dropped the check for matching operator OID.  To preserve that,
we'd have needed to use get_commutator() in the reverse-match cases,
which it seemed to me would be a completely unjustified expenditure
of cycles.  The operators we select for freshly-generated clauses
will certainly always match those of previously-generated clauses.
Maybe there's a chance that they'd not match those of ec_sources
clauses (that is, the user-written clauses we started from), but
if they don't and that actually makes any difference then surely
we are talking about a buggy opclass definition.

I've not bothered to make any performance tests to see if there's
actually an easily measurable gain here.  Saving some duplicative
selectivity estimates could be down in the noise ... but it's
surely worth the tiny number of extra tests added here.

Comments?

            regards, tom lane

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index b3c8ce0131..558e94b845 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -526,7 +526,7 @@ EXPLAIN (VERBOSE, COSTS OFF)
          ->  Foreign Scan
                Output: t3.c1
                Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
-               Remote SQL: SELECT r3."C 1" FROM ("S 1"."T 1" r2 INNER JOIN "S 1"."T 1" r3 ON (((r2."C 1" = r3."C
1"))))ORDER BY r2."C 1" ASC NULLS LAST 
+               Remote SQL: SELECT r3."C 1" FROM ("S 1"."T 1" r2 INNER JOIN "S 1"."T 1" r3 ON (((r3."C 1" = r2."C
1"))))ORDER BY r2."C 1" ASC NULLS LAST 
          ->  Index Only Scan using t1_pkey on "S 1"."T 1" t1
                Output: t1."C 1"
 (12 rows)
@@ -740,7 +740,7 @@ EXPLAIN (VERBOSE, COSTS OFF)
          Index Cond: (a."C 1" = 47)
    ->  Foreign Scan on public.ft2 b
          Output: b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (($1::integer = "C 1"))
+         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = $1::integer))
 (8 rows)

 SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
@@ -763,7 +763,7 @@ EXPLAIN (VERBOSE, COSTS OFF)
          Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((c2 = 6))
    ->  Foreign Scan on public.ft2 b
          Output: b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
-         Filter: (upper((a.c7)::text) = (b.c7)::text)
+         Filter: ((b.c7)::text = upper((a.c7)::text))
          Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (($1::integer = "C 1"))
 (10 rows)

@@ -1220,7 +1220,7 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
  Foreign Scan
    Output: t1.c1, t2.c1, t1.c3
    Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
-   Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3 FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C
1"))))ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST LIMIT 10::bigint OFFSET 100::bigint 
+   Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3 FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r2."C 1" = r1."C
1"))))ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST LIMIT 10::bigint OFFSET 100::bigint 
 (4 rows)

 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
@@ -1246,7 +1246,7 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t
  Foreign Scan
    Output: t1.c1, t2.c2, t3.c3, t1.c3
    Relations: ((public.ft1 t1) INNER JOIN (public.ft2 t2)) INNER JOIN (public.ft4 t3)
-   Remote SQL: SELECT r1."C 1", r2.c2, r4.c3, r1.c3 FROM (("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" =
r2."C1")))) INNER JOIN "S 1"."T 3" r4 ON (((r1."C 1" = r4.c1)))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST
LIMIT10::bigint OFFSET 10::bigint 
+   Remote SQL: SELECT r1."C 1", r2.c2, r4.c3, r1.c3 FROM (("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r2."C 1" =
r1."C1")))) INNER JOIN "S 1"."T 3" r4 ON (((r1."C 1" = r4.c1)))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST
LIMIT10::bigint OFFSET 10::bigint 
 (4 rows)

 SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3,
t1.c1OFFSET 10 LIMIT 10; 
@@ -1861,7 +1861,7 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
  Foreign Scan
    Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
    Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
-   Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3,
r1.c4,r1.c5, r1.c6, r1.c7, r1.c8) END, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4,
r2.c5,r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY
r1.c3ASC NULLS LAST, r1."C 1" ASC NULLS LAST LIMIT 10::bigint OFFSET 100::bigint FOR UPDATE OF r1 
+   Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3,
r1.c4,r1.c5, r1.c6, r1.c7, r1.c8) END, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4,
r2.c5,r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r2."C 1" = r1."C 1")))) ORDER BY
r1.c3ASC NULLS LAST, r1."C 1" ASC NULLS LAST LIMIT 10::bigint OFFSET 100::bigint FOR UPDATE OF r1 
 (4 rows)

 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF
t1;
@@ -1886,7 +1886,7 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
  Foreign Scan
    Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
    Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
-   Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3,
r1.c4,r1.c5, r1.c6, r1.c7, r1.c8) END, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4,
r2.c5,r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY
r1.c3ASC NULLS LAST, r1."C 1" ASC NULLS LAST LIMIT 10::bigint OFFSET 100::bigint FOR UPDATE OF r1 FOR UPDATE OF r2 
+   Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3,
r1.c4,r1.c5, r1.c6, r1.c7, r1.c8) END, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4,
r2.c5,r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r2."C 1" = r1."C 1")))) ORDER BY
r1.c3ASC NULLS LAST, r1."C 1" ASC NULLS LAST LIMIT 10::bigint OFFSET 100::bigint FOR UPDATE OF r1 FOR UPDATE OF r2 
 (4 rows)

 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
@@ -1912,7 +1912,7 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
  Foreign Scan
    Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
    Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
-   Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3,
r1.c4,r1.c5, r1.c6, r1.c7, r1.c8) END, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4,
r2.c5,r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY
r1.c3ASC NULLS LAST, r1."C 1" ASC NULLS LAST LIMIT 10::bigint OFFSET 100::bigint FOR SHARE OF r1 
+   Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3,
r1.c4,r1.c5, r1.c6, r1.c7, r1.c8) END, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4,
r2.c5,r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r2."C 1" = r1."C 1")))) ORDER BY
r1.c3ASC NULLS LAST, r1."C 1" ASC NULLS LAST LIMIT 10::bigint OFFSET 100::bigint FOR SHARE OF r1 
 (4 rows)

 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF
t1;
@@ -1937,7 +1937,7 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
  Foreign Scan
    Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
    Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
-   Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3,
r1.c4,r1.c5, r1.c6, r1.c7, r1.c8) END, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4,
r2.c5,r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY
r1.c3ASC NULLS LAST, r1."C 1" ASC NULLS LAST LIMIT 10::bigint OFFSET 100::bigint FOR SHARE OF r1 FOR SHARE OF r2 
+   Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3,
r1.c4,r1.c5, r1.c6, r1.c7, r1.c8) END, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4,
r2.c5,r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r2."C 1" = r1."C 1")))) ORDER BY
r1.c3ASC NULLS LAST, r1."C 1" ASC NULLS LAST LIMIT 10::bigint OFFSET 100::bigint FOR SHARE OF r1 FOR SHARE OF r2 
 (4 rows)

 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
@@ -1966,7 +1966,7 @@ WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t
      ->  Foreign Scan
            Output: t1.c1, t1.c3, t2.c1
            Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
-           Remote SQL: SELECT r1."C 1", r1.c3, r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1"
=r2."C 1")))) 
+           Remote SQL: SELECT r1."C 1", r1.c3, r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r2."C 1"
=r1."C 1")))) 
    ->  Sort
          Output: t.c1_1, t.c2_1, t.c1_3
          Sort Key: t.c1_3, t.c1_1
@@ -1997,7 +1997,7 @@ SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER B
  Foreign Scan
    Output: t1.ctid, t1.*, t2.*, t1.c1, t1.c3
    Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
-   Remote SQL: SELECT r1.ctid, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5,
r1.c6,r1.c7, r1.c8) END, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6,
r2.c7,r2.c8) END, r1."C 1", r1.c3 FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY
r1.c3ASC NULLS LAST, r1."C 1" ASC NULLS LAST LIMIT 10::bigint OFFSET 100::bigint 
+   Remote SQL: SELECT r1.ctid, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5,
r1.c6,r1.c7, r1.c8) END, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6,
r2.c7,r2.c8) END, r1."C 1", r1.c3 FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r2."C 1" = r1."C 1")))) ORDER BY
r1.c3ASC NULLS LAST, r1."C 1" ASC NULLS LAST LIMIT 10::bigint OFFSET 100::bigint 
 (4 rows)

 -- SEMI JOIN, not pushed down
@@ -2216,7 +2216,7 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) WHERE t1.c8 = t2.
                Output: t1.c1, t2.c1, t1.c3
                Filter: (t1.c8 = t2.c8)
                Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
-               Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3, r1.c8, r2.c8 FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1"
r2ON (((r1."C 1" = r2."C 1")))) 
+               Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3, r1.c8, r2.c8 FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1"
r2ON (((r2."C 1" = r1."C 1")))) 
 (10 rows)

 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) WHERE t1.c8 = t2.c8 ORDER BY t1.c3, t1.c1 OFFSET 100
LIMIT10; 
@@ -2254,11 +2254,11 @@ SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2
                            ->  Foreign Scan
                                  Output: t1.c1, t2.c1
                                  Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
-                                 Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2
ON(((r1."C 1" = r2."C 1")))) 
+                                 Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2
ON(((r2."C 1" = r1."C 1")))) 
                            ->  Foreign Scan
                                  Output: t1_1.c1, t2_1.c1
                                  Relations: (public.ft1 t1_1) INNER JOIN (public.ft2 t2_1)
-                                 Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2
ON(((r1."C 1" = r2."C 1")))) 
+                                 Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2
ON(((r2."C 1" = r1."C 1")))) 
 (20 rows)

 SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) UNION SELECT t1.c1,
t2.c1FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) AS t (t1c1, t2c1) GROUP BY t1c1 ORDER BY t1c1 OFFSET 100 LIMIT 10; 
@@ -2297,7 +2297,7 @@ SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM
                            ->  Foreign Scan
                                  Output: t2.c1, t3.c1
                                  Relations: (public.ft1 t2) INNER JOIN (public.ft2 t3)
-                                 Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2
ON(((r1."C 1" = r2."C 1")) AND ((r1.c2 = $1::integer)))) 
+                                 Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2
ON(((r2."C 1" = r1."C 1")) AND ((r1.c2 = $1::integer)))) 
 (17 rows)

 SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM ft1 t2, ft2 t3 WHERE t2.c1 = t3.c1 AND
t2.c2= t1.c2) q ORDER BY t1."C 1" OFFSET 10 LIMIT 10; 
@@ -2415,7 +2415,7 @@ SELECT * FROM ft1, ft2, ft4, ft5, local_tbl WHERE ft1.c1 = ft2.c1 AND ft1.c2 = f
          ->  Foreign Scan
                Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, ft2.c1, ft2.c2, ft2.c3,
ft2.c4,ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.*, ft4.c1, ft4.c2, ft4.c3, ft4.*, ft5.c1, ft5.c2, ft5.c3, ft5.* 
                Relations: (((public.ft1) INNER JOIN (public.ft2)) INNER JOIN (public.ft4)) INNER JOIN (public.ft5)
-               Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8, CASE WHEN (r1.*)::text IS
NOTNULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5,
r2.c6,r2.c7, r2.c8, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7,
r2.c8)END, r3.c1, r3.c2, r3.c3, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END, r4.c1, r4.c2,
r4.c3,CASE WHEN (r4.*)::text IS NOT NULL THEN ROW(r4.c1, r4.c2, r4.c3) END FROM ((("S 1"."T 1" r1 INNER JOIN "S 1"."T
1"r2 ON (((r1."C 1" = r2."C 1")) AND ((r2."C 1" < 100)) AND ((r1."C 1" < 100)))) INNER JOIN "S 1"."T 3" r3 ON (((r1.c2
=r3.c1)))) INNER JOIN "S 1"."T 4" r4 ON (((r1.c2 = r4.c1)))) ORDER BY r1.c2 ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE
OFr2 FOR UPDATE OF r3 FOR UPDATE OF r4 
+               Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8, CASE WHEN (r1.*)::text IS
NOTNULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5,
r2.c6,r2.c7, r2.c8, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7,
r2.c8)END, r3.c1, r3.c2, r3.c3, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END, r4.c1, r4.c2,
r4.c3,CASE WHEN (r4.*)::text IS NOT NULL THEN ROW(r4.c1, r4.c2, r4.c3) END FROM ((("S 1"."T 1" r1 INNER JOIN "S 1"."T
1"r2 ON (((r2."C 1" = r1."C 1")) AND ((r2."C 1" < 100)) AND ((r1."C 1" < 100)))) INNER JOIN "S 1"."T 3" r3 ON (((r1.c2
=r3.c1)))) INNER JOIN "S 1"."T 4" r4 ON (((r1.c2 = r4.c1)))) ORDER BY r1.c2 ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE
OFr2 FOR UPDATE OF r3 FOR UPDATE OF r4 
                ->  Merge Join
                      Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, ft2.c1, ft2.c2,
ft2.c3,ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.*, ft4.c1, ft4.c2, ft4.c3, ft4.*, ft5.c1, ft5.c2, ft5.c3, ft5.* 
                      Merge Cond: (ft1.c2 = ft5.c1)
@@ -2491,7 +2491,7 @@ SELECT * FROM local_tbl LEFT JOIN (SELECT ft1.*, COALESCE(ft1.c3 || ft2.c3, 'foo
                ->  Foreign Scan
                      Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, ft2.*,
COALESCE((ft1.c3|| ft2.c3), 'foobar'::text) 
                      Relations: (public.ft1) INNER JOIN (public.ft2)
-                     Remote SQL: SELECT r4."C 1", r4.c2, r4.c3, r4.c4, r4.c5, r4.c6, r4.c7, r4.c8, CASE WHEN
(r4.*)::textIS NOT NULL THEN ROW(r4."C 1", r4.c2, r4.c3, r4.c4, r4.c5, r4.c6, r4.c7, r4.c8) END, CASE WHEN (r5.*)::text
ISNOT NULL THEN ROW(r5."C 1", r5.c2, r5.c3, r5.c4, r5.c5, r5.c6, r5.c7, r5.c8) END, r5.c3 FROM ("S 1"."T 1" r4 INNER
JOIN"S 1"."T 1" r5 ON (((r4."C 1" = r5."C 1")) AND ((r4."C 1" < 100)))) ORDER BY r4."C 1" ASC NULLS LAST 
+                     Remote SQL: SELECT r4."C 1", r4.c2, r4.c3, r4.c4, r4.c5, r4.c6, r4.c7, r4.c8, CASE WHEN
(r4.*)::textIS NOT NULL THEN ROW(r4."C 1", r4.c2, r4.c3, r4.c4, r4.c5, r4.c6, r4.c7, r4.c8) END, CASE WHEN (r5.*)::text
ISNOT NULL THEN ROW(r5."C 1", r5.c2, r5.c3, r5.c4, r5.c5, r5.c6, r5.c7, r5.c8) END, r5.c3 FROM ("S 1"."T 1" r4 INNER
JOIN"S 1"."T 1" r5 ON (((r5."C 1" = r4."C 1")) AND ((r4."C 1" < 100)))) ORDER BY r4."C 1" ASC NULLS LAST 
                      ->  Result
                            Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, ft2.*,
ft2.c3
                            ->  Sort
@@ -2529,7 +2529,7 @@ SELECT * FROM local_tbl LEFT JOIN (SELECT ft1.* FROM ft1 INNER JOIN ft2 ON (ft1.
                      Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, ft2.*
                      Filter: (ft1.c1 = postgres_fdw_abs(ft2.c2))
                      Relations: (public.ft1) INNER JOIN (public.ft2)
-                     Remote SQL: SELECT r4."C 1", r4.c2, r4.c3, r4.c4, r4.c5, r4.c6, r4.c7, r4.c8, CASE WHEN
(r4.*)::textIS NOT NULL THEN ROW(r4."C 1", r4.c2, r4.c3, r4.c4, r4.c5, r4.c6, r4.c7, r4.c8) END, CASE WHEN (r5.*)::text
ISNOT NULL THEN ROW(r5."C 1", r5.c2, r5.c3, r5.c4, r5.c5, r5.c6, r5.c7, r5.c8) END, r5.c2 FROM ("S 1"."T 1" r4 INNER
JOIN"S 1"."T 1" r5 ON (((r4."C 1" = r5."C 1")) AND ((r4."C 1" < 100)))) ORDER BY r4.c3 ASC NULLS LAST 
+                     Remote SQL: SELECT r4."C 1", r4.c2, r4.c3, r4.c4, r4.c5, r4.c6, r4.c7, r4.c8, CASE WHEN
(r4.*)::textIS NOT NULL THEN ROW(r4."C 1", r4.c2, r4.c3, r4.c4, r4.c5, r4.c6, r4.c7, r4.c8) END, CASE WHEN (r5.*)::text
ISNOT NULL THEN ROW(r5."C 1", r5.c2, r5.c3, r5.c4, r5.c5, r5.c6, r5.c7, r5.c8) END, r5.c2 FROM ("S 1"."T 1" r4 INNER
JOIN"S 1"."T 1" r5 ON (((r5."C 1" = r4."C 1")) AND ((r4."C 1" < 100)))) ORDER BY r4.c3 ASC NULLS LAST 
                      ->  Sort
                            Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, ft2.*,
ft2.c2
                            Sort Key: ft1.c3
@@ -2771,7 +2771,7 @@ select sum(t1.c1), count(t2.c1) from ft1 t1 inner join ft2 t2 on (t1.c1 = t2.c1)
          Output: t1.c1, t2.c1
          Filter: (((((t1.c1 * t2.c1) / (t1.c1 * t2.c1)))::double precision * random()) <= '1'::double precision)
          Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
-         Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C
1"))))
+         Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r2."C 1" = r1."C
1"))))
 (7 rows)

 -- GROUP BY clause having expressions
@@ -3973,7 +3973,7 @@ EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st2(10, 20);
    Sort Key: t1.c1
    ->  Nested Loop Semi Join
          Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
-         Join Filter: (t1.c3 = t2.c3)
+         Join Filter: (t2.c3 = t1.c3)
          ->  Foreign Scan on public.ft1 t1
                Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
                Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20))
@@ -4007,7 +4007,7 @@ EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st3(10, 20);
    Sort Key: t1.c1
    ->  Nested Loop Semi Join
          Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
-         Join Filter: (t1.c3 = t2.c3)
+         Join Filter: (t2.c3 = t1.c3)
          ->  Foreign Scan on public.ft1 t1
                Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
                Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20))
@@ -4502,7 +4502,7 @@ explain (verbose, costs off) select * from ft3 f, loct3 l
          Index Cond: (l.f1 = 'foo'::text)
    ->  Foreign Scan on public.ft3 f
          Output: f.f1, f.f2, f.f3
-         Remote SQL: SELECT f1, f2, f3 FROM public.loct3 WHERE (($1::character varying(10) = f3))
+         Remote SQL: SELECT f1, f2, f3 FROM public.loct3 WHERE ((f3 = $1::character varying(10)))
 (8 rows)

 -- can't be sent to remote
@@ -10899,13 +10899,13 @@ SELECT * FROM local_tbl, async_pt WHERE local_tbl.a = async_pt.a AND local_tbl.c
    ->  Append
          ->  Async Foreign Scan on public.async_p1 async_pt_1
                Output: async_pt_1.a, async_pt_1.b, async_pt_1.c
-               Remote SQL: SELECT a, b, c FROM public.base_tbl1 WHERE (($1::integer = a))
+               Remote SQL: SELECT a, b, c FROM public.base_tbl1 WHERE ((a = $1::integer))
          ->  Async Foreign Scan on public.async_p2 async_pt_2
                Output: async_pt_2.a, async_pt_2.b, async_pt_2.c
-               Remote SQL: SELECT a, b, c FROM public.base_tbl2 WHERE (($1::integer = a))
+               Remote SQL: SELECT a, b, c FROM public.base_tbl2 WHERE ((a = $1::integer))
          ->  Seq Scan on public.async_p3 async_pt_3
                Output: async_pt_3.a, async_pt_3.b, async_pt_3.c
-               Filter: (local_tbl.a = async_pt_3.a)
+               Filter: (async_pt_3.a = local_tbl.a)
 (15 rows)

 EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
@@ -10920,7 +10920,7 @@ SELECT * FROM local_tbl, async_pt WHERE local_tbl.a = async_pt.a AND local_tbl.c
          ->  Async Foreign Scan on async_p1 async_pt_1 (never executed)
          ->  Async Foreign Scan on async_p2 async_pt_2 (actual rows=1 loops=1)
          ->  Seq Scan on async_p3 async_pt_3 (never executed)
-               Filter: (local_tbl.a = a)
+               Filter: (a = local_tbl.a)
 (9 rows)

 SELECT * FROM local_tbl, async_pt WHERE local_tbl.a = async_pt.a AND local_tbl.c = 'bar';
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index f962ff82ad..0544335249 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -1382,7 +1382,9 @@ generate_base_implied_equalities_broken(PlannerInfo *root,
  * whenever we select a particular pair of EquivalenceMembers to join,
  * we check to see if the pair matches any original clause (in ec_sources)
  * or previously-built clause (in ec_derives).  This saves memory and allows
- * re-use of information cached in RestrictInfos.
+ * re-use of information cached in RestrictInfos.  We also avoid generating
+ * commutative duplicates, i.e. if the algorithm selects "a.x = b.y" but
+ * we already have "b.y = a.x", we return the existing clause.
  *
  * join_relids should always equal bms_union(outer_relids, inner_rel->relids).
  * We could simplify this function's API by computing it internally, but in
@@ -1811,16 +1813,22 @@ create_join_clause(PlannerInfo *root,
     /*
      * Search to see if we already built a RestrictInfo for this pair of
      * EquivalenceMembers.  We can use either original source clauses or
-     * previously-derived clauses.  The check on opno is probably redundant,
-     * but be safe ...
+     * previously-derived clauses, and a commutator clause is acceptable.
+     *
+     * We used to verify that opno matches, but that seems redundant: even if
+     * it's not identical, it'd better have the same effects, or the operator
+     * families we're using are broken.
      */
     foreach(lc, ec->ec_sources)
     {
         rinfo = (RestrictInfo *) lfirst(lc);
         if (rinfo->left_em == leftem &&
             rinfo->right_em == rightem &&
-            rinfo->parent_ec == parent_ec &&
-            opno == ((OpExpr *) rinfo->clause)->opno)
+            rinfo->parent_ec == parent_ec)
+            return rinfo;
+        if (rinfo->left_em == rightem &&
+            rinfo->right_em == leftem &&
+            rinfo->parent_ec == parent_ec)
             return rinfo;
     }

@@ -1829,8 +1837,11 @@ create_join_clause(PlannerInfo *root,
         rinfo = (RestrictInfo *) lfirst(lc);
         if (rinfo->left_em == leftem &&
             rinfo->right_em == rightem &&
-            rinfo->parent_ec == parent_ec &&
-            opno == ((OpExpr *) rinfo->clause)->opno)
+            rinfo->parent_ec == parent_ec)
+            return rinfo;
+        if (rinfo->left_em == rightem &&
+            rinfo->right_em == leftem &&
+            rinfo->parent_ec == parent_ec)
             return rinfo;
     }

diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 08334761ae..9b69a8c122 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3149,7 +3149,7 @@ where i41.f1 > 0;
          ->  Seq Scan on int4_tbl i41
                Filter: (f1 > 0)
          ->  Nested Loop
-               Join Filter: (i41.f1 = i42.f1)
+               Join Filter: (i42.f1 = i41.f1)
                ->  Seq Scan on int8_tbl i81
                ->  Materialize
                      ->  Seq Scan on int4_tbl i42
@@ -4871,7 +4871,7 @@ where ss.stringu2 !~* ss.case1;
                                          QUERY PLAN
 --------------------------------------------------------------------------------------------
  Nested Loop
-   Join Filter: (CASE t1.ten WHEN 0 THEN 'doh!'::text ELSE NULL::text END = t0.f1)
+   Join Filter: (t0.f1 = CASE t1.ten WHEN 0 THEN 'doh!'::text ELSE NULL::text END)
    ->  Nested Loop
          ->  Seq Scan on int4_tbl i4
          ->  Index Scan using tenk1_unique2 on tenk1 t1
@@ -6533,7 +6533,7 @@ where j1.id1 % 1000 = 1 and j2.id1 % 1000 = 1;
 -----------------------------------------
  Merge Join
    Merge Cond: (j1.id1 = j2.id1)
-   Join Filter: (j1.id2 = j2.id2)
+   Join Filter: (j2.id2 = j1.id2)
    ->  Index Scan using j1_id1_idx on j1
    ->  Index Scan using j2_id1_idx on j2
 (5 rows)
@@ -6555,7 +6555,7 @@ where j1.id1 % 1000 = 1 and j2.id1 % 1000 = 1 and j2.id1 = any (array[1]);
 ----------------------------------------------------
  Merge Join
    Merge Cond: (j1.id1 = j2.id1)
-   Join Filter: (j1.id2 = j2.id2)
+   Join Filter: (j2.id2 = j1.id2)
    ->  Index Scan using j1_id1_idx on j1
    ->  Index Scan using j2_id1_idx on j2
          Index Cond: (id1 = ANY ('{1}'::integer[]))
@@ -6578,7 +6578,7 @@ where j1.id1 % 1000 = 1 and j2.id1 % 1000 = 1 and j2.id1 >= any (array[1,5]);
 -------------------------------------------------------
  Merge Join
    Merge Cond: (j1.id1 = j2.id1)
-   Join Filter: (j1.id2 = j2.id2)
+   Join Filter: (j2.id2 = j1.id2)
    ->  Index Scan using j1_id1_idx on j1
    ->  Index Only Scan using j2_pkey on j2
          Index Cond: (id1 >= ANY ('{1,5}'::integer[]))
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index bb5b7c47a4..b20facc19f 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -304,7 +304,7 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t2.b FROM prt2 t2 WHERE t2.a = 0)
                      ->  Seq Scan on prt2_p2 t2_2
                            Filter: (a = 0)
          ->  Nested Loop Semi Join
-               Join Filter: (t1_3.a = t2_3.b)
+               Join Filter: (t2_3.b = t1_3.a)
                ->  Seq Scan on prt1_p3 t1_3
                      Filter: (b = 0)
                ->  Materialize
@@ -601,7 +601,7 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM prt1 t1, prt2 t2, prt1_e t
    Sort Key: t1.a
    ->  Append
          ->  Nested Loop
-               Join Filter: (t1_1.a = ((t3_1.a + t3_1.b) / 2))
+               Join Filter: (((t3_1.a + t3_1.b) / 2) = t1_1.a)
                ->  Hash Join
                      Hash Cond: (t2_1.b = t1_1.a)
                      ->  Seq Scan on prt2_p1 t2_1
@@ -611,7 +611,7 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM prt1 t1, prt2 t2, prt1_e t
                ->  Index Scan using iprt1_e_p1_ab2 on prt1_e_p1 t3_1
                      Index Cond: (((a + b) / 2) = t2_1.b)
          ->  Nested Loop
-               Join Filter: (t1_2.a = ((t3_2.a + t3_2.b) / 2))
+               Join Filter: (((t3_2.a + t3_2.b) / 2) = t1_2.a)
                ->  Hash Join
                      Hash Cond: (t2_2.b = t1_2.a)
                      ->  Seq Scan on prt2_p2 t2_2
@@ -621,7 +621,7 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM prt1 t1, prt2 t2, prt1_e t
                ->  Index Scan using iprt1_e_p2_ab2 on prt1_e_p2 t3_2
                      Index Cond: (((a + b) / 2) = t2_2.b)
          ->  Nested Loop
-               Join Filter: (t1_3.a = ((t3_3.a + t3_3.b) / 2))
+               Join Filter: (((t3_3.a + t3_3.b) / 2) = t1_3.a)
                ->  Hash Join
                      Hash Cond: (t2_3.b = t1_3.a)
                      ->  Seq Scan on prt2_p3 t2_3
@@ -926,7 +926,7 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHER
    Sort Key: t1.a
    ->  Append
          ->  Nested Loop
-               Join Filter: (t1_2.a = t1_5.b)
+               Join Filter: (t1_5.b = t1_2.a)
                ->  HashAggregate
                      Group Key: t1_5.b
                      ->  Hash Join
@@ -939,7 +939,7 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHER
                      Index Cond: (a = ((t2_1.a + t2_1.b) / 2))
                      Filter: (b = 0)
          ->  Nested Loop
-               Join Filter: (t1_3.a = t1_6.b)
+               Join Filter: (t1_6.b = t1_3.a)
                ->  HashAggregate
                      Group Key: t1_6.b
                      ->  Hash Join
@@ -952,7 +952,7 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHER
                      Index Cond: (a = ((t2_2.a + t2_2.b) / 2))
                      Filter: (b = 0)
          ->  Nested Loop
-               Join Filter: (t1_4.a = t1_7.b)
+               Join Filter: (t1_7.b = t1_4.a)
                ->  HashAggregate
                      Group Key: t1_7.b
                      ->  Nested Loop
diff --git a/src/test/regress/expected/tidscan.out b/src/test/regress/expected/tidscan.out
index 13c3c360c2..f133b5a4ac 100644
--- a/src/test/regress/expected/tidscan.out
+++ b/src/test/regress/expected/tidscan.out
@@ -119,7 +119,7 @@ FROM tidscan t1 JOIN tidscan t2 ON t1.ctid = t2.ctid WHERE t1.id = 1;
    ->  Seq Scan on tidscan t1
          Filter: (id = 1)
    ->  Tid Scan on tidscan t2
-         TID Cond: (ctid = t1.ctid)
+         TID Cond: (t1.ctid = ctid)
 (5 rows)

 SELECT t1.ctid, t1.*, t2.ctid, t2.*

Re: Reducing duplicativeness of EquivalenceClass-derived clauses

From
Richard Guo
Date:

On Wed, Oct 26, 2022 at 6:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
While fooling with my longstanding outer-join variables changes
(I am making progress on that, honest), I happened to notice that
equivclass.c is leaving some money on the table by generating
redundant RestrictInfo clauses.  It already attempts to not generate
the same clause twice, which can save a nontrivial amount of work
because we cache selectivity estimates and so on per-RestrictInfo.
I realized though that it will build and return a clause like
"a.x = b.y" even if it already has "b.y = a.x".  This is just
wasteful.  It's always been the case that equivclass.c will
produce clauses that are ordered according to its own whims.
Consumers that need the operands in a specific order, such as
index scans or hash joins, are required to commute the clause
to be the way they want it while building the finished plan.
Therefore, it shouldn't matter which order of the operands we
return, and giving back the commutator clause if available could
potentially save as much as half of the selectivity-estimation
work we do with these clauses subsequently.

Hence, PFA a patch that adjusts create_join_clause() to notice
commuted as well as exact matches among the EquivalenceClass's
existing clauses.  This results in a number of changes visible in
regression test cases, but they're all clearly inconsequential.
 
I think there is no problem with this idea, given the operands of
EC-derived clauses are commutative, and it seems no one would actually
rely on the order of the operands.  I can see hashjoin/mergejoin would
commute hash/merge joinclauses if needed with get_switched_clauses().
 

The only thing that I think might be controversial here is that
I dropped the check for matching operator OID.  To preserve that,
we'd have needed to use get_commutator() in the reverse-match cases,
which it seemed to me would be a completely unjustified expenditure
of cycles.  The operators we select for freshly-generated clauses
will certainly always match those of previously-generated clauses.
Maybe there's a chance that they'd not match those of ec_sources
clauses (that is, the user-written clauses we started from), but
if they don't and that actually makes any difference then surely
we are talking about a buggy opclass definition.
 
The operator is chosen according to the two given EC members's data
type.  Since we are dealing with the same pair of EC members, I think
the operator is always the same one. So it also seems no problem to drop
the check for operator. I wonder if we can even add an assertion if
we've found a RestrictInfo from ec_derives that the operator matches.

Thanks
Richard

Re: Reducing duplicativeness of EquivalenceClass-derived clauses

From
Tom Lane
Date:
Richard Guo <guofenglinux@gmail.com> writes:
> On Wed, Oct 26, 2022 at 6:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The only thing that I think might be controversial here is that
>> I dropped the check for matching operator OID.  To preserve that,
>> we'd have needed to use get_commutator() in the reverse-match cases,
>> which it seemed to me would be a completely unjustified expenditure
>> of cycles.  The operators we select for freshly-generated clauses
>> will certainly always match those of previously-generated clauses.
>> Maybe there's a chance that they'd not match those of ec_sources
>> clauses (that is, the user-written clauses we started from), but
>> if they don't and that actually makes any difference then surely
>> we are talking about a buggy opclass definition.

> The operator is chosen according to the two given EC members's data
> type.  Since we are dealing with the same pair of EC members, I think
> the operator is always the same one. So it also seems no problem to drop
> the check for operator. I wonder if we can even add an assertion if
> we've found a RestrictInfo from ec_derives that the operator matches.

Yeah, I considered that --- even if somehow an ec_sources entry isn't
an exact match, ec_derives ought to be.  However, it still didn't seem
worth a get_commutator() call.  We'd basically be expending cycles to
check that select_equality_operator yields the same result with the same
inputs as it did before, and that doesn't seem terribly interesting to
check.  I'm also not sure what's the point of allowing divergence
from the requested operator in some but not all paths.

I added a bit of instrumentation to count how many times we need to build
new join clauses in create_join_clause.  In the current core regression
tests, I see this change reducing the number of new join clauses built
here from 9673 to 5142 (out of 26652 calls).  So not quite 50% savings,
but pretty close to it.  That should mean that this change is about
a wash just in terms of the code it touches directly: each iteration
of the search loops is nearly twice as expensive as before, but we'll
only need to do about half as many.  So whatever we save downstream
is pure gravy.

            regards, tom lane



Re: Reducing duplicativeness of EquivalenceClass-derived clauses

From
Zhang Mingli
Date:
HI,

On Oct 26, 2022, 06:09 +0800, Tom Lane <tgl@sss.pgh.pa.us>, wrote:
While fooling with my longstanding outer-join variables changes
(I am making progress on that, honest), I happened to notice that
equivclass.c is leaving some money on the table by generating
redundant RestrictInfo clauses. It already attempts to not generate
the same clause twice, which can save a nontrivial amount of work
because we cache selectivity estimates and so on per-RestrictInfo.
I realized though that it will build and return a clause like
"a.x = b.y" even if it already has "b.y = a.x". This is just
wasteful. It's always been the case that equivclass.c will
produce clauses that are ordered according to its own whims.
Consumers that need the operands in a specific order, such as
index scans or hash joins, are required to commute the clause
to be the way they want it while building the finished plan.
Therefore, it shouldn't matter which order of the operands we
return, and giving back the commutator clause if available could
potentially save as much as half of the selectivity-estimation
work we do with these clauses subsequently.

Hence, PFA a patch that adjusts create_join_clause() to notice
commuted as well as exact matches among the EquivalenceClass's
existing clauses. This results in a number of changes visible in
regression test cases, but they're all clearly inconsequential.

The only thing that I think might be controversial here is that
I dropped the check for matching operator OID. To preserve that,
we'd have needed to use get_commutator() in the reverse-match cases,
which it seemed to me would be a completely unjustified expenditure
of cycles. The operators we select for freshly-generated clauses
will certainly always match those of previously-generated clauses.
Maybe there's a chance that they'd not match those of ec_sources
clauses (that is, the user-written clauses we started from), but
if they don't and that actually makes any difference then surely
we are talking about a buggy opclass definition.

I've not bothered to make any performance tests to see if there's
actually an easily measurable gain here. Saving some duplicative
selectivity estimates could be down in the noise ... but it's
surely worth the tiny number of extra tests added here.

Comments?

regards, tom lane

Make sense.

How about combine ec->ec_sources and ec->derives as one list for less codes? 

```
foreach(lc, list_union(ec->ec_sources, ec->ec_derives))
{
 rinfo = (RestrictInfo *) lfirst(lc);
 if (rinfo->left_em == leftem &&
 rinfo->right_em == rightem &&
 rinfo->parent_ec == parent_ec)
 return rinfo;
 if (rinfo->left_em == rightem &&
 rinfo->right_em == leftem &&
 rinfo->parent_ec == parent_ec)
 return rinfo;
}
```
I have a try, it will change some in join.out and avoid changes in tidscan.out.

Regards,
Zhang Mingli

Re: Reducing duplicativeness of EquivalenceClass-derived clauses

From
Tom Lane
Date:
Zhang Mingli <zmlpostgres@gmail.com> writes:
> How about combine ec->ec_sources and ec->derives as one list for less codes?

Keeping them separate is required for the broken-EC code paths.
Even if it weren't, I wouldn't merge them just to save a couple
of lines of code --- I think it's useful to be able to tell which
clauses the EC started from.

            regards, tom lane



Re: Reducing duplicativeness of EquivalenceClass-derived clauses

From
Zhang Mingli
Date:
Hi,

On Oct 27, 2022, 21:29 +0800, Tom Lane <tgl@sss.pgh.pa.us>, wrote:
Zhang Mingli <zmlpostgres@gmail.com> writes:
How about combine ec->ec_sources and ec->derives as one list for less codes?

Keeping them separate is required for the broken-EC code paths.
Even if it weren't, I wouldn't merge them just to save a couple
of lines of code --- I think it's useful to be able to tell which
clauses the EC started from.

regards, tom lane
Got it, thanks.

Regards,
Zhang Mingli