postgres_fdw planning issue: EquivalenceClass changes confuse it - Mailing list pgsql-hackers

From Tom Lane
Subject postgres_fdw planning issue: EquivalenceClass changes confuse it
Date
Msg-id 1691374.1671659838@sss.pgh.pa.us
Whole thread Raw
List pgsql-hackers
I discovered that if you do this:

diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 2e6f7f4852..2ae231fd90 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -366,7 +366,7 @@ CREATE FUNCTION postgres_fdw_abs(int) RETURNS int AS $$
 BEGIN
 RETURN abs($1);
 END
-$$ LANGUAGE plpgsql IMMUTABLE;
+$$ LANGUAGE plpgsql IMMUTABLE STRICT;
 CREATE OPERATOR === (
     LEFTARG = int,
     RIGHTARG = int,

one of the plan changes that you get (attached) is that this query:

EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM local_tbl LEFT JOIN (SELECT ft1.* FROM ft1 INNER JOIN ft2 ON (ft1.c1 = ft2.c1 AND ft1.c1 < 100 AND ft1.c1
=postgres_fdw_abs(ft2.c2))) ss ON (local_tbl.c3 = ss.c3) ORDER BY local_tbl.c1 FOR UPDATE OF local_tbl; 

can no longer do the join as a foreign join.  There are some other
changes that are more explicable, because strictness of the function
allows the planner to strength-reduce full joins to left joins, but
what happened here?

The answer is that once postgres_fdw_abs() is marked strict,
the EquivalenceClass machinery will group these clauses as an
EquivalenceClass:

    ft1.c1 = ft2.c1 AND ft1.c1 = postgres_fdw_abs(ft2.c2)

which it will then choose to implement as a restriction clause
on ft2
    ft2.c1 = postgres_fdw_abs(ft2.c2)
followed by a join clause
    ft1.c1 = ft2.c1
This is a good and useful transformation, because it can get rid
of ft2 rows at the scan level instead of waiting for them to be
joined.  However, because we are treating postgres_fdw_abs() as
non-shippable in this particular test case, that means that ft2
now has a non-shippable restriction clause, causing foreign_join_ok
to give up here:

    /*
     * If joining relations have local conditions, those conditions are
     * required to be applied before joining the relations. Hence the join can
     * not be pushed down.
     */
    if (fpinfo_o->local_conds || fpinfo_i->local_conds)
        return false;

In the other formulation, "ft1.c1 = postgres_fdw_abs(ft2.c2)"
is a non-shippable join clause, which foreign_join_ok knows how
to cope with.  So this seems like a fairly nasty asymmetry.

I ran into this while experimenting with the next phase in my
outer-join-vars patch set, in which the restriction that
below-outer-join Equivalence classes contain only strict members
will go away.  So that breaks this test, and I need to either
fix postgres_fdw or change the test case.

I experimented with teaching foreign_join_ok to pull up the child rels'
local_conds to be join local_conds if the join is an inner join,
which seems like a legal transformation.  I ran into a couple of
issues though, the hardest of which to solve is that in DML queries
we get "variable not found in subplan target lists" failures while
trying to build some EPQ queries.  That's because the pulled-up
condition uses a variable that we didn't think we'd need at the join
level.  That could possibly be fixed by handling these conditions
differently for the transmitted query than the EPQ query, but I'm
not sufficiently familiar with the postgres_fdw code to want to
take point on coding that.  In any case, this line of thought
would lead to several other plan changes in the postgres_fdw
regression tests, and I'm not sure if any of those would be
breaking the intent of the test cases.

Or I could just hack this one test so that it continues to
not be an EquivalenceClass case.

Thoughts?

            regards, tom lane

diff -U3 /home/postgres/pgsql/contrib/postgres_fdw/expected/postgres_fdw.out
/home/postgres/pgsql/contrib/postgres_fdw/results/postgres_fdw.out
--- /home/postgres/pgsql/contrib/postgres_fdw/expected/postgres_fdw.out    2022-12-21 16:28:13.601965240 -0500
+++ /home/postgres/pgsql/contrib/postgres_fdw/results/postgres_fdw.out    2022-12-21 16:30:20.152135336 -0500
@@ -923,7 +923,7 @@
 BEGIN
 RETURN abs($1);
 END
-$$ LANGUAGE plpgsql IMMUTABLE;
+$$ LANGUAGE plpgsql IMMUTABLE STRICT;
 CREATE OPERATOR === (
     LEFTARG = int,
     RIGHTARG = int,
@@ -1828,28 +1828,44 @@
 -- full outer join + WHERE clause with shippable extensions set
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t1.c3 FROM ft1 t1 FULL JOIN ft2 t2 ON (t1.c1 = t2.c1) WHERE postgres_fdw_abs(t1.c1) > 0 OFFSET 10
LIMIT10; 
-                                                                                                 QUERY PLAN
                                                                                     

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- Foreign Scan
+                                                QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Limit
    Output: t1.c1, t2.c2, t1.c3
-   Relations: (public.ft1 t1) FULL JOIN (public.ft2 t2)
-   Remote SQL: SELECT r1."C 1", r2.c2, r1.c3 FROM ("S 1"."T 1" r1 FULL JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C
1"))))WHERE ((public.postgres_fdw_abs(r1."C 1") > 0)) LIMIT 10::bigint OFFSET 10::bigint 
-(4 rows)
+   ->  Hash Left Join
+         Output: t1.c1, t2.c2, t1.c3
+         Hash Cond: (t1.c1 = t2.c1)
+         ->  Foreign Scan on public.ft1 t1
+               Output: t1.c1, t1.c3
+               Remote SQL: SELECT "C 1", c3 FROM "S 1"."T 1" WHERE ((public.postgres_fdw_abs("C 1") > 0))
+         ->  Hash
+               Output: t2.c2, t2.c1
+               ->  Foreign Scan on public.ft2 t2
+                     Output: t2.c2, t2.c1
+                     Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1"
+(13 rows)

 ALTER SERVER loopback OPTIONS (DROP extensions);
 -- full outer join + WHERE clause with shippable extensions not set
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t1.c3 FROM ft1 t1 FULL JOIN ft2 t2 ON (t1.c1 = t2.c1) WHERE postgres_fdw_abs(t1.c1) > 0 OFFSET 10
LIMIT10; 
-                                                          QUERY PLAN
        

--------------------------------------------------------------------------------------------------------------------------------
+                            QUERY PLAN
+-------------------------------------------------------------------
  Limit
    Output: t1.c1, t2.c2, t1.c3
-   ->  Foreign Scan
+   ->  Hash Left Join
          Output: t1.c1, t2.c2, t1.c3
-         Filter: (postgres_fdw_abs(t1.c1) > 0)
-         Relations: (public.ft1 t1) FULL JOIN (public.ft2 t2)
-         Remote SQL: SELECT r1."C 1", r2.c2, r1.c3 FROM ("S 1"."T 1" r1 FULL JOIN "S 1"."T 1" r2 ON (((r1."C 1" =
r2."C1")))) 
-(7 rows)
+         Hash Cond: (t1.c1 = t2.c1)
+         ->  Foreign Scan on public.ft1 t1
+               Output: t1.c1, t1.c3
+               Filter: (postgres_fdw_abs(t1.c1) > 0)
+               Remote SQL: SELECT "C 1", c3 FROM "S 1"."T 1"
+         ->  Hash
+               Output: t2.c2, t2.c1
+               ->  Foreign Scan on public.ft2 t2
+                     Output: t2.c2, t2.c1
+                     Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1"
+(14 rows)

 ALTER SERVER loopback OPTIONS (ADD extensions 'postgres_fdw');
 -- join two tables with FOR UPDATE clause
@@ -2514,8 +2530,8 @@
 ALTER SERVER loopback OPTIONS (ADD fdw_startup_cost '10000.0');
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT * FROM local_tbl LEFT JOIN (SELECT ft1.* FROM ft1 INNER JOIN ft2 ON (ft1.c1 = ft2.c1 AND ft1.c1 < 100 AND
ft1.c1= postgres_fdw_abs(ft2.c2))) ss ON (local_tbl.c3 = ss.c3) ORDER BY local_tbl.c1 FOR UPDATE OF local_tbl; 
-
                                                                                                    QUERY PLAN

                                                                                          

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                       QUERY PLAN
                                 

+--------------------------------------------------------------------------------------------------------------------------------------------------------
  LockRows
    Output: local_tbl.c1, local_tbl.c2, local_tbl.c3, ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8,
local_tbl.ctid,ft1.*, ft2.* 
    ->  Nested Loop Left Join
@@ -2525,30 +2541,19 @@
                Output: local_tbl.c1, local_tbl.c2, local_tbl.c3, local_tbl.ctid
          ->  Materialize
                Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, ft2.*
-               ->  Foreign Scan
+               ->  Hash Join
                      Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, ft2.*
-                     Filter: (ft1.c1 = postgres_fdw_abs(ft2.c2))
-                     Relations: (public.ft1) INNER JOIN (public.ft2)
-                     Remote SQL: SELECT r4."C 1", r4.c2, r4.c3, r4.c4, r4.c5, r4.c6, r4.c7, r4.c8, CASE WHEN
(r4.*)::textIS NOT NULL THEN ROW(r4."C 1", r4.c2, r4.c3, r4.c4, r4.c5, r4.c6, r4.c7, r4.c8) END, CASE WHEN (r5.*)::text
ISNOT NULL THEN ROW(r5."C 1", r5.c2, r5.c3, r5.c4, r5.c5, r5.c6, r5.c7, r5.c8) END, r5.c2 FROM ("S 1"."T 1" r4 INNER
JOIN"S 1"."T 1" r5 ON (((r5."C 1" = r4."C 1")) AND ((r4."C 1" < 100)))) ORDER BY r4.c3 ASC NULLS LAST 
-                     ->  Sort
-                           Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, ft2.*,
ft2.c2
-                           Sort Key: ft1.c3
-                           ->  Merge Join
-                                 Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, ft2.*,
ft2.c2
-                                 Merge Cond: ((ft1.c1 = (postgres_fdw_abs(ft2.c2))) AND (ft1.c1 = ft2.c1))
-                                 ->  Sort
-                                       Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*
-                                       Sort Key: ft1.c1
-                                       ->  Foreign Scan on public.ft1
-                                             Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8,
ft1.*
-                                             Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
WHERE(("C 1" < 100)) 
-                                 ->  Sort
-                                       Output: ft2.*, ft2.c1, ft2.c2, (postgres_fdw_abs(ft2.c2))
-                                       Sort Key: (postgres_fdw_abs(ft2.c2)), ft2.c1
-                                       ->  Foreign Scan on public.ft2
-                                             Output: ft2.*, ft2.c1, ft2.c2, postgres_fdw_abs(ft2.c2)
-                                             Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
ORDERBY "C 1" ASC NULLS LAST 
-(32 rows)
+                     Hash Cond: (ft2.c1 = ft1.c1)
+                     ->  Foreign Scan on public.ft2
+                           Output: ft2.*, ft2.c1, ft2.c2
+                           Filter: (ft2.c1 = postgres_fdw_abs(ft2.c2))
+                           Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" ORDER BY "C 1" ASC
NULLSLAST 
+                     ->  Hash
+                           Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*
+                           ->  Foreign Scan on public.ft1
+                                 Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*
+                                 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" <
100))
+(21 rows)

 ALTER SERVER loopback OPTIONS (DROP fdw_startup_cost);
 ALTER SERVER loopback OPTIONS (ADD extensions 'postgres_fdw');

pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures
Next
From: Andrew Dunstan
Date:
Subject: Re: float4in_internal