Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails - Mailing list pgsql-hackers

From Tom Lane
Subject Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
Date
Msg-id 2201880.1630528969@sss.pgh.pa.us
Whole thread Raw
In response to Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails  (Etsuro Fujita <etsuro.fujita@gmail.com>)
Responses Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
List pgsql-hackers
Etsuro Fujita <etsuro.fujita@gmail.com> writes:
> On Thu, Jul 15, 2021 at 4:17 AM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
>> On Wed, Jul 14, 2021 at 1:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Seems to just need an update of the expected-file to account for test
>>> cases added recently.  (I take no position on whether the new results
>>> are desirable; some of these might be breaking the intent of the case.
>>> But this should quiet the cfbot anyway.)

>> The test case was added by commit "Add support for asynchronous execution"
>> "27e1f14563cf982f1f4d71e21ef247866662a052" by Etsuro Fujita. He can comment
>> whether the new results are desirable or not.

> The new results aren't what I intended.  I'll update the patch to
> avoid that by modifying the original test cases properly, if there are
> no objections.

Please follow up on that sometime?  In the meantime, here is a rebase
over aa769f80e and 2dc53fe2a, to placate the cfbot.

The real reason that this hasn't gotten committed is that I remain
pretty uncomfortable about whether it's an acceptable solution to
the problem.  Suddenly asking people to plaster COLLATE clauses
on all their textual remote columns seems like a big compatibility
gotcha.  However, I lack any ideas about a less unpleasant solution.

            regards, tom lane

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index d98bd66681..654b09273e 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -328,7 +328,9 @@ foreign_expr_walker(Node *node,

                 /*
                  * If the Var is from the foreign table, we consider its
-                 * collation (if any) safe to use.  If it is from another
+                 * collation (if any) safe to use, *unless* it's
+                 * DEFAULT_COLLATION_OID.  We treat that as meaning "we don't
+                 * know which collation this is".  If it is from another
                  * table, we treat its collation the same way as we would a
                  * Param's collation, ie it's not safe for it to have a
                  * non-default collation.
@@ -350,7 +352,12 @@ foreign_expr_walker(Node *node,

                     /* Else check the collation */
                     collation = var->varcollid;
-                    state = OidIsValid(collation) ? FDW_COLLATE_SAFE : FDW_COLLATE_NONE;
+                    if (collation == InvalidOid)
+                        state = FDW_COLLATE_NONE;
+                    else if (collation == DEFAULT_COLLATION_OID)
+                        state = FDW_COLLATE_UNSAFE;
+                    else
+                        state = FDW_COLLATE_SAFE;
                 }
                 else
                 {
@@ -941,8 +948,24 @@ foreign_expr_walker(Node *node,

     /*
      * Now, merge my collation information into my parent's state.
+     *
+     * If one branch of an expression derives a non-default collation safely
+     * (that is, from a foreign Var) and another one derives the same
+     * collation unsafely, we can consider the expression safe overall.  This
+     * allows cases such as "foreign_var = ('foo' COLLATE x)" where x is the
+     * same collation the foreign_var has anyway.  Note that we will not ship
+     * any explicit COLLATE clause to the remote, but rely on it to re-derive
+     * the correct collation based on the foreign_var.
      */
-    if (state > outer_cxt->state)
+    if (collation == outer_cxt->collation &&
+        ((state == FDW_COLLATE_UNSAFE &&
+          outer_cxt->state == FDW_COLLATE_SAFE) ||
+         (state == FDW_COLLATE_SAFE &&
+          outer_cxt->state == FDW_COLLATE_UNSAFE)))
+    {
+        outer_cxt->state = FDW_COLLATE_SAFE;
+    }
+    else if (state > outer_cxt->state)
     {
         /* Override previous parent state */
         outer_cxt->collation = collation;
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index e3ee30f1aa..9e52e09a8b 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -32,29 +32,29 @@ CREATE SCHEMA "S 1";
 CREATE TABLE "S 1"."T 1" (
     "C 1" int NOT NULL,
     c2 int NOT NULL,
-    c3 text,
+    c3 text collate "C",
     c4 timestamptz,
     c5 timestamp,
-    c6 varchar(10),
-    c7 char(10),
+    c6 varchar(10) collate "C",
+    c7 char(10) collate "C",
     c8 user_enum,
     CONSTRAINT t1_pkey PRIMARY KEY ("C 1")
 );
 CREATE TABLE "S 1"."T 2" (
     c1 int NOT NULL,
-    c2 text,
+    c2 text collate "C",
     CONSTRAINT t2_pkey PRIMARY KEY (c1)
 );
 CREATE TABLE "S 1"."T 3" (
     c1 int NOT NULL,
     c2 int NOT NULL,
-    c3 text,
+    c3 text collate "C",
     CONSTRAINT t3_pkey PRIMARY KEY (c1)
 );
 CREATE TABLE "S 1"."T 4" (
     c1 int NOT NULL,
     c2 int NOT NULL,
-    c3 text,
+    c3 text collate "C",
     CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
 -- Disable autovacuum for these tables to avoid unexpected effects of that
@@ -94,16 +94,18 @@ ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
 -- ===================================================================
 -- create foreign tables
+-- Note: to ensure stable regression results, all collatable columns
+-- in these tables must have explicitly-specified collations.
 -- ===================================================================
 CREATE FOREIGN TABLE ft1 (
     c0 int,
     c1 int NOT NULL,
     c2 int NOT NULL,
-    c3 text,
+    c3 text collate "C",
     c4 timestamptz,
     c5 timestamp,
-    c6 varchar(10),
-    c7 char(10) default 'ft1',
+    c6 varchar(10) collate "C",
+    c7 char(10) default 'ft1' collate "C",
     c8 user_enum
 ) SERVER loopback;
 ALTER FOREIGN TABLE ft1 DROP COLUMN c0;
@@ -111,28 +113,28 @@ CREATE FOREIGN TABLE ft2 (
     c1 int NOT NULL,
     c2 int NOT NULL,
     cx int,
-    c3 text,
+    c3 text collate "C",
     c4 timestamptz,
     c5 timestamp,
-    c6 varchar(10),
-    c7 char(10) default 'ft2',
+    c6 varchar(10) collate "C",
+    c7 char(10) default 'ft2' collate "C",
     c8 user_enum
 ) SERVER loopback;
 ALTER FOREIGN TABLE ft2 DROP COLUMN cx;
 CREATE FOREIGN TABLE ft4 (
     c1 int NOT NULL,
     c2 int NOT NULL,
-    c3 text
+    c3 text collate "C"
 ) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 3');
 CREATE FOREIGN TABLE ft5 (
     c1 int NOT NULL,
     c2 int NOT NULL,
-    c3 text
+    c3 text collate "C"
 ) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 4');
 CREATE FOREIGN TABLE ft6 (
     c1 int NOT NULL,
     c2 int NOT NULL,
-    c3 text
+    c3 text collate "C"
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
 CREATE FOREIGN TABLE ft7 (
     c1 int NOT NULL,
@@ -288,7 +290,7 @@ EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1, t1.tabl
    Output: c1, c2, c3, c4, c5, c6, c7, c8, tableoid
    ->  Sort
          Output: c1, c2, c3, c4, c5, c6, c7, c8, tableoid
-         Sort Key: t1.c3, t1.c1, t1.tableoid
+         Sort Key: t1.c3 COLLATE "C", t1.c1, t1.tableoid
          ->  Foreign Scan on public.ft1 t1
                Output: c1, c2, c3, c4, c5, c6, c7, c8, tableoid
                Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
@@ -902,12 +904,12 @@ EXPLAIN (VERBOSE, COSTS OFF)
 (6 rows)

 EXPLAIN (VERBOSE, COSTS OFF)
-    SELECT * FROM ft2 ORDER BY ft2.c1, ft2.c3 collate "C";
+    SELECT * FROM ft2 ORDER BY ft2.c1, ft2.c3 collate "POSIX";
                                   QUERY PLAN
 -------------------------------------------------------------------------------
  Sort
    Output: c1, c2, c3, c4, c5, c6, c7, c8, ((c3)::text)
-   Sort Key: ft2.c1, ft2.c3 COLLATE "C"
+   Sort Key: ft2.c1, ft2.c3 COLLATE "POSIX"
    ->  Foreign Scan on public.ft2
          Output: c1, c2, c3, c4, c5, c6, c7, c8, c3
          Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
@@ -1149,13 +1151,12 @@ SELECT * FROM ft1 WHERE CASE c3 WHEN c6 THEN true ELSE c3 < 'bar' END;
 -- but this is not because of collation
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT * FROM ft1 WHERE CASE c3 COLLATE "C" WHEN c6 THEN true ELSE c3 < 'bar' END;
-                                     QUERY PLAN
--------------------------------------------------------------------------------------
+                                                               QUERY PLAN
                  

+-----------------------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Filter: CASE (ft1.c3)::text WHEN ft1.c6 THEN true ELSE (ft1.c3 < 'bar'::text) END
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
-(4 rows)
+   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((CASE c3 WHEN c6 THEN true ELSE (c3 <
'bar'::text)END)) 
+(3 rows)

 -- ===================================================================
 -- JOIN queries
@@ -1921,7 +1922,7 @@ WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t
            Remote SQL: SELECT r1."C 1", r1.c3, r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1"
=r2."C 1")))) 
    ->  Sort
          Output: t.c1_1, t.c2_1, t.c1_3
-         Sort Key: t.c1_3, t.c1_1
+         Sort Key: t.c1_3 COLLATE "C", t.c1_1
          ->  CTE Scan on t
                Output: t.c1_1, t.c2_1, t.c1_3
 (12 rows)
@@ -2121,7 +2122,7 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) WHERE t1.c8
    Output: t1.c1, t2.c1, t1.c3
    ->  Sort
          Output: t1.c1, t2.c1, t1.c3
-         Sort Key: t1.c3, t1.c1
+         Sort Key: t1.c3 COLLATE "C", t1.c1
          ->  Hash Right Join
                Output: t1.c1, t2.c1, t1.c3
                Hash Cond: (t2.c1 = t1.c1)
@@ -2163,7 +2164,7 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) WHERE t1.c8 = t2.
    Output: t1.c1, t2.c1, t1.c3
    ->  Sort
          Output: t1.c1, t2.c1, t1.c3
-         Sort Key: t1.c3, t1.c1
+         Sort Key: t1.c3 COLLATE "C", t1.c1
          ->  Foreign Scan
                Output: t1.c1, t2.c1, t1.c3
                Filter: (t1.c8 = t2.c8)
@@ -3611,7 +3612,7 @@ select c2, c6, sum(c1) from ft1 where c2 < 3 group by grouping sets(c2, c6) orde
 ----------------------------------------------------------------------------------
  Sort
    Output: c2, c6, (sum(c1))
-   Sort Key: ft1.c2, ft1.c6
+   Sort Key: ft1.c2, ft1.c6 COLLATE "C"
    ->  HashAggregate
          Output: c2, c6, sum(c1)
          Hash Key: ft1.c2
@@ -3995,13 +3996,13 @@ EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st6;

 PREPARE st7 AS INSERT INTO ft1 (c1,c2,c3) VALUES (1001,101,'foo');
 EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st7;
-                                                                                           QUERY PLAN
                                                                          

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                                 QUERY PLAN
                                                                                      

+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Insert on public.ft1
    Remote SQL: INSERT INTO "S 1"."T 1"("C 1", c2, c3, c4, c5, c6, c7, c8) VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
    Batch Size: 1
    ->  Result
-         Output: NULL::integer, 1001, 101, 'foo'::text, NULL::timestamp with time zone, NULL::timestamp without time
zone,NULL::character varying, 'ft1       '::character(10), NULL::user_enum 
+         Output: NULL::integer, 1001, 101, 'foo'::text, NULL::timestamp with time zone, NULL::timestamp without time
zone,NULL::character varying COLLATE "C", 'ft1       '::character(10), NULL::user_enum 
 (5 rows)

 ALTER TABLE "S 1"."T 1" RENAME TO "T 0";
@@ -4029,13 +4030,13 @@ EXECUTE st6;
 (9 rows)

 EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st7;
-                                                                                           QUERY PLAN
                                                                          

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                                 QUERY PLAN
                                                                                      

+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Insert on public.ft1
    Remote SQL: INSERT INTO "S 1"."T 0"("C 1", c2, c3, c4, c5, c6, c7, c8) VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
    Batch Size: 1
    ->  Result
-         Output: NULL::integer, 1001, 101, 'foo'::text, NULL::timestamp with time zone, NULL::timestamp without time
zone,NULL::character varying, 'ft1       '::character(10), NULL::user_enum 
+         Output: NULL::integer, 1001, 101, 'foo'::text, NULL::timestamp with time zone, NULL::timestamp without time
zone,NULL::character varying COLLATE "C", 'ft1       '::character(10), NULL::user_enum 
 (5 rows)

 ALTER TABLE "S 1"."T 0" RENAME TO "T 1";
@@ -4266,36 +4267,14 @@ explain (verbose, costs off) select * from ft3 where f1 COLLATE "C" = 'foo';
    Remote SQL: SELECT f1, f2, f3 FROM public.loct3 WHERE ((f1 = 'foo'::text))
 (3 rows)

-explain (verbose, costs off) select * from ft3 where f2 = 'foo';
-                                  QUERY PLAN
-------------------------------------------------------------------------------
- Foreign Scan on public.ft3
-   Output: f1, f2, f3
-   Remote SQL: SELECT f1, f2, f3 FROM public.loct3 WHERE ((f2 = 'foo'::text))
-(3 rows)
-
-explain (verbose, costs off) select * from ft3 where f3 = 'foo';
+explain (verbose, costs off) select * from ft3 where f1 = 'foo' COLLATE "C";
                                   QUERY PLAN
 ------------------------------------------------------------------------------
  Foreign Scan on public.ft3
    Output: f1, f2, f3
-   Remote SQL: SELECT f1, f2, f3 FROM public.loct3 WHERE ((f3 = 'foo'::text))
+   Remote SQL: SELECT f1, f2, f3 FROM public.loct3 WHERE ((f1 = 'foo'::text))
 (3 rows)

-explain (verbose, costs off) select * from ft3 f, loct3 l
-  where f.f3 = l.f3 and l.f1 = 'foo';
-                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
- Nested Loop
-   Output: f.f1, f.f2, f.f3, l.f1, l.f2, l.f3
-   ->  Index Scan using loct3_f1_key on public.loct3 l
-         Output: l.f1, l.f2, l.f3
-         Index Cond: (l.f1 = 'foo'::text)
-   ->  Foreign Scan on public.ft3 f
-         Output: f.f1, f.f2, f.f3
-         Remote SQL: SELECT f1, f2, f3 FROM public.loct3 WHERE (($1::character varying(10) = f3))
-(8 rows)
-
 -- can't be sent to remote
 explain (verbose, costs off) select * from ft3 where f1 COLLATE "POSIX" = 'foo';
                     QUERY PLAN
@@ -4306,12 +4285,12 @@ explain (verbose, costs off) select * from ft3 where f1 COLLATE "POSIX" = 'foo';
    Remote SQL: SELECT f1, f2, f3 FROM public.loct3
 (4 rows)

-explain (verbose, costs off) select * from ft3 where f1 = 'foo' COLLATE "C";
+explain (verbose, costs off) select * from ft3 where f2 = 'foo';
                     QUERY PLAN
 ---------------------------------------------------
  Foreign Scan on public.ft3
    Output: f1, f2, f3
-   Filter: (ft3.f1 = 'foo'::text COLLATE "C")
+   Filter: (ft3.f2 = 'foo'::text)
    Remote SQL: SELECT f1, f2, f3 FROM public.loct3
 (4 rows)

@@ -4333,6 +4312,33 @@ explain (verbose, costs off) select * from ft3 where f2 = 'foo' COLLATE "C";
    Remote SQL: SELECT f1, f2, f3 FROM public.loct3
 (4 rows)

+explain (verbose, costs off) select * from ft3 where f3 = 'foo';
+                    QUERY PLAN
+---------------------------------------------------
+ Foreign Scan on public.ft3
+   Output: f1, f2, f3
+   Filter: ((ft3.f3)::text = 'foo'::text)
+   Remote SQL: SELECT f1, f2, f3 FROM public.loct3
+(4 rows)
+
+explain (verbose, costs off) select * from ft3 f, loct3 l
+  where f.f3 = l.f3 and l.f1 = 'foo';
+                         QUERY PLAN
+-------------------------------------------------------------
+ Hash Join
+   Output: f.f1, f.f2, f.f3, l.f1, l.f2, l.f3
+   Inner Unique: true
+   Hash Cond: ((f.f3)::text = (l.f3)::text)
+   ->  Foreign Scan on public.ft3 f
+         Output: f.f1, f.f2, f.f3
+         Remote SQL: SELECT f1, f2, f3 FROM public.loct3
+   ->  Hash
+         Output: l.f1, l.f2, l.f3
+         ->  Index Scan using loct3_f1_key on public.loct3 l
+               Output: l.f1, l.f2, l.f3
+               Index Cond: (l.f1 = 'foo'::text)
+(12 rows)
+
 explain (verbose, costs off) select * from ft3 f, loct3 l
   where f.f3 = l.f3 COLLATE "POSIX" and l.f1 = 'foo';
                          QUERY PLAN
@@ -4356,13 +4362,13 @@ explain (verbose, costs off) select * from ft3 f, loct3 l
 -- ===================================================================
 EXPLAIN (verbose, costs off)
 INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20;
-
QUERYPLAN
     

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+
  QUERY PLAN
              

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Insert on public.ft2
    Remote SQL: INSERT INTO "S 1"."T 1"("C 1", c2, c3, c4, c5, c6, c7, c8) VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
    Batch Size: 1
    ->  Subquery Scan on "*SELECT*"
-         Output: "*SELECT*"."?column?", "*SELECT*"."?column?_1", NULL::integer, "*SELECT*"."?column?_2",
NULL::timestampwith time zone, NULL::timestamp without time zone, NULL::character varying, 'ft2       '::character(10),
NULL::user_enum
+         Output: "*SELECT*"."?column?", "*SELECT*"."?column?_1", NULL::integer, "*SELECT*"."?column?_2",
NULL::timestampwith time zone, NULL::timestamp without time zone, NULL::character varying COLLATE "C", 'ft2
'::character(10),NULL::user_enum 
          ->  Foreign Scan on public.ft2 ft2_1
                Output: (ft2_1.c1 + 1000), (ft2_1.c2 + 100), (ft2_1.c3 || ft2_1.c3)
                Remote SQL: SELECT "C 1", c2, c3 FROM "S 1"."T 1" LIMIT 20::bigint
@@ -5472,14 +5478,14 @@ SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;

 EXPLAIN (verbose, costs off)
 INSERT INTO ft2 (c1,c2,c3) VALUES (1200,999,'foo') RETURNING tableoid::regclass;
-                                                                                           QUERY PLAN
                                                                          

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                                 QUERY PLAN
                                                                                      

+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Insert on public.ft2
    Output: (ft2.tableoid)::regclass
    Remote SQL: INSERT INTO "S 1"."T 1"("C 1", c2, c3, c4, c5, c6, c7, c8) VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
    Batch Size: 1
    ->  Result
-         Output: 1200, 999, NULL::integer, 'foo'::text, NULL::timestamp with time zone, NULL::timestamp without time
zone,NULL::character varying, 'ft2       '::character(10), NULL::user_enum 
+         Output: 1200, 999, NULL::integer, 'foo'::text, NULL::timestamp with time zone, NULL::timestamp without time
zone,NULL::character varying COLLATE "C", 'ft2       '::character(10), NULL::user_enum 
 (6 rows)

 INSERT INTO ft2 (c1,c2,c3) VALUES (1200,999,'foo') RETURNING tableoid::regclass;
@@ -7881,12 +7887,12 @@ NOTICE:  drop cascades to foreign table bar2
 drop table loct1;
 drop table loct2;
 -- Test pushing down UPDATE/DELETE joins to the remote server
-create table parent (a int, b text);
-create table loct1 (a int, b text);
-create table loct2 (a int, b text);
-create foreign table remt1 (a int, b text)
+create table parent (a int, b text collate "C");
+create table loct1 (a int, b text collate "C");
+create table loct2 (a int, b text collate "C");
+create foreign table remt1 (a int, b text collate "C")
   server loopback options (table_name 'loct1');
-create foreign table remt2 (a int, b text)
+create foreign table remt2 (a int, b text collate "C")
   server loopback options (table_name 'loct2');
 alter foreign table remt1 inherit parent;
 insert into remt1 values (1, 'foo');
@@ -8095,10 +8101,10 @@ drop table itrtest;
 drop table loct1;
 drop table loct2;
 -- Test update tuple routing
-create table utrtest (a int, b text) partition by list (a);
-create table loct (a int check (a in (1)), b text);
-create foreign table remp (a int check (a in (1)), b text) server loopback options (table_name 'loct');
-create table locp (a int check (a in (2)), b text);
+create table utrtest (a int, b text collate "C") partition by list (a);
+create table loct (a int check (a in (1)), b text collate "C");
+create foreign table remp (a int check (a in (1)), b text collate "C") server loopback options (table_name 'loct');
+create table locp (a int check (a in (2)), b text collate "C");
 alter table utrtest attach partition remp for values in (1);
 alter table utrtest attach partition locp for values in (2);
 insert into utrtest values (1, 'foo');
@@ -8260,7 +8266,7 @@ alter table utrtest detach partition remp;
 drop foreign table remp;
 alter table loct drop constraint loct_a_check;
 alter table loct add check (a in (3));
-create foreign table remp (a int check (a in (3)), b text) server loopback options (table_name 'loct');
+create foreign table remp (a int check (a in (3)), b text collate "C") server loopback options (table_name 'loct');
 alter table utrtest attach partition remp for values in (3);
 insert into utrtest values (2, 'qux');
 insert into utrtest values (3, 'xyzzy');
@@ -8721,11 +8727,11 @@ tru_rtable_parent,tru_rtable_child, tru_rtable0_child;
 -- test IMPORT FOREIGN SCHEMA
 -- ===================================================================
 CREATE SCHEMA import_source;
-CREATE TABLE import_source.t1 (c1 int, c2 varchar NOT NULL);
-CREATE TABLE import_source.t2 (c1 int default 42, c2 varchar NULL, c3 text collate "POSIX");
-CREATE TYPE typ1 AS (m1 int, m2 varchar);
+CREATE TABLE import_source.t1 (c1 int, c2 varchar NOT NULL COLLATE "C");
+CREATE TABLE import_source.t2 (c1 int default 42, c2 varchar NULL COLLATE "C", c3 text collate "POSIX");
+CREATE TYPE typ1 AS (m1 int, m2 varchar COLLATE "C");
 CREATE TABLE import_source.t3 (c1 timestamptz default now(), c2 typ1);
-CREATE TABLE import_source."x 4" (c1 float8, "C 2" text, c3 varchar(42));
+CREATE TABLE import_source."x 4" (c1 float8, "C 2" text COLLATE "C", c3 varchar(42) COLLATE "C");
 CREATE TABLE import_source."x 5" (c1 float8);
 ALTER TABLE import_source."x 5" DROP COLUMN c1;
 CREATE TABLE import_source."x 6" (c1 int, c2 int generated always as (c1 * 2) stored);
@@ -8754,7 +8760,7 @@ IMPORT FOREIGN SCHEMA import_source FROM SERVER loopback INTO import_dest1;
  Column |       Type        | Collation | Nullable | Default |    FDW options
 --------+-------------------+-----------+----------+---------+--------------------
  c1     | integer           |           |          |         | (column_name 'c1')
- c2     | character varying |           | not null |         | (column_name 'c2')
+ c2     | character varying | C         | not null |         | (column_name 'c2')
 Server: loopback
 FDW options: (schema_name 'import_source', table_name 't1')

@@ -8762,7 +8768,7 @@ FDW options: (schema_name 'import_source', table_name 't1')
  Column |       Type        | Collation | Nullable | Default |    FDW options
 --------+-------------------+-----------+----------+---------+--------------------
  c1     | integer           |           |          |         | (column_name 'c1')
- c2     | character varying |           |          |         | (column_name 'c2')
+ c2     | character varying | C         |          |         | (column_name 'c2')
  c3     | text              | POSIX     |          |         | (column_name 'c3')
 Server: loopback
 FDW options: (schema_name 'import_source', table_name 't2')
@@ -8786,8 +8792,8 @@ FDW options: (schema_name 'import_source', table_name 't4')
  Column |         Type          | Collation | Nullable | Default |     FDW options
 --------+-----------------------+-----------+----------+---------+---------------------
  c1     | double precision      |           |          |         | (column_name 'c1')
- C 2    | text                  |           |          |         | (column_name 'C 2')
- c3     | character varying(42) |           |          |         | (column_name 'c3')
+ C 2    | text                  | C         |          |         | (column_name 'C 2')
+ c3     | character varying(42) | C         |          |         | (column_name 'c3')
 Server: loopback
 FDW options: (schema_name 'import_source', table_name 'x 4')

@@ -8827,7 +8833,7 @@ IMPORT FOREIGN SCHEMA import_source FROM SERVER loopback INTO import_dest2
  Column |       Type        | Collation | Nullable | Default |    FDW options
 --------+-------------------+-----------+----------+---------+--------------------
  c1     | integer           |           |          |         | (column_name 'c1')
- c2     | character varying |           | not null |         | (column_name 'c2')
+ c2     | character varying | C         | not null |         | (column_name 'c2')
 Server: loopback
 FDW options: (schema_name 'import_source', table_name 't1')

@@ -8835,7 +8841,7 @@ FDW options: (schema_name 'import_source', table_name 't1')
  Column |       Type        | Collation | Nullable | Default |    FDW options
 --------+-------------------+-----------+----------+---------+--------------------
  c1     | integer           |           |          | 42      | (column_name 'c1')
- c2     | character varying |           |          |         | (column_name 'c2')
+ c2     | character varying | C         |          |         | (column_name 'c2')
  c3     | text              | POSIX     |          |         | (column_name 'c3')
 Server: loopback
 FDW options: (schema_name 'import_source', table_name 't2')
@@ -8859,8 +8865,8 @@ FDW options: (schema_name 'import_source', table_name 't4')
  Column |         Type          | Collation | Nullable | Default |     FDW options
 --------+-----------------------+-----------+----------+---------+---------------------
  c1     | double precision      |           |          |         | (column_name 'c1')
- C 2    | text                  |           |          |         | (column_name 'C 2')
- c3     | character varying(42) |           |          |         | (column_name 'c3')
+ C 2    | text                  | C         |          |         | (column_name 'C 2')
+ c3     | character varying(42) | C         |          |         | (column_name 'c3')
 Server: loopback
 FDW options: (schema_name 'import_source', table_name 'x 4')

@@ -10620,14 +10626,16 @@ SELECT * FROM insert_tbl ORDER BY a;
 EXPLAIN (VERBOSE, COSTS OFF)
 WITH t AS (UPDATE remote_tbl SET c = c || c RETURNING *)
 INSERT INTO join_tbl SELECT * FROM async_pt LEFT JOIN t ON (async_pt.a = t.a AND async_pt.b = t.b) WHERE async_pt.b
===505; 
-                                       QUERY PLAN
-----------------------------------------------------------------------------------------
+                                         QUERY PLAN
+--------------------------------------------------------------------------------------------
  Insert on public.join_tbl
    CTE t
      ->  Update on public.remote_tbl
            Output: remote_tbl.a, remote_tbl.b, remote_tbl.c
-           ->  Foreign Update on public.remote_tbl
-                 Remote SQL: UPDATE public.base_tbl3 SET c = (c || c) RETURNING a, b, c
+           Remote SQL: UPDATE public.base_tbl3 SET c = $2 WHERE ctid = $1 RETURNING a, b, c
+           ->  Foreign Scan on public.remote_tbl
+                 Output: (remote_tbl.c || remote_tbl.c), remote_tbl.ctid, remote_tbl.*
+                 Remote SQL: SELECT a, b, c, ctid FROM public.base_tbl3 FOR UPDATE
    ->  Nested Loop Left Join
          Output: async_pt.a, async_pt.b, async_pt.c, t.a, t.b, t.c
          Join Filter: ((async_pt.a = t.a) AND (async_pt.b = t.b))
@@ -10645,7 +10653,7 @@ INSERT INTO join_tbl SELECT * FROM async_pt LEFT JOIN t ON (async_pt.a = t.a AND
                      Filter: (async_pt_3.b === 505)
          ->  CTE Scan on t
                Output: t.a, t.b, t.c
-(23 rows)
+(25 rows)

 WITH t AS (UPDATE remote_tbl SET c = c || c RETURNING *)
 INSERT INTO join_tbl SELECT * FROM async_pt LEFT JOIN t ON (async_pt.a = t.a AND async_pt.b = t.b) WHERE async_pt.b
===505; 
@@ -10673,17 +10681,21 @@ UPDATE async_pt SET c = c || c WHERE b = 0 RETURNING *;
  Update on public.async_pt
    Output: async_pt_1.a, async_pt_1.b, async_pt_1.c
    Foreign Update on public.async_p1 async_pt_1
+     Remote SQL: UPDATE public.base_tbl1 SET c = $2 WHERE ctid = $1 RETURNING a, b, c
    Foreign Update on public.async_p2 async_pt_2
+     Remote SQL: UPDATE public.base_tbl2 SET c = $2 WHERE ctid = $1 RETURNING a, b, c
    Update on public.async_p3 async_pt_3
    ->  Append
-         ->  Foreign Update on public.async_p1 async_pt_1
-               Remote SQL: UPDATE public.base_tbl1 SET c = (c || c) WHERE ((b = 0)) RETURNING a, b, c
-         ->  Foreign Update on public.async_p2 async_pt_2
-               Remote SQL: UPDATE public.base_tbl2 SET c = (c || c) WHERE ((b = 0)) RETURNING a, b, c
+         ->  Foreign Scan on public.async_p1 async_pt_1
+               Output: (async_pt_1.c || async_pt_1.c), async_pt_1.tableoid, async_pt_1.ctid, async_pt_1.*
+               Remote SQL: SELECT a, b, c, ctid FROM public.base_tbl1 WHERE ((b = 0)) FOR UPDATE
+         ->  Foreign Scan on public.async_p2 async_pt_2
+               Output: (async_pt_2.c || async_pt_2.c), async_pt_2.tableoid, async_pt_2.ctid, async_pt_2.*
+               Remote SQL: SELECT a, b, c, ctid FROM public.base_tbl2 WHERE ((b = 0)) FOR UPDATE
          ->  Seq Scan on public.async_p3 async_pt_3
                Output: (async_pt_3.c || async_pt_3.c), async_pt_3.tableoid, async_pt_3.ctid, NULL::record
                Filter: (async_pt_3.b = 0)
-(13 rows)
+(17 rows)

 UPDATE async_pt SET c = c || c WHERE b = 0 RETURNING *;
   a   | b |    c
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 4bdab30a73..cacc80585f 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "catalog/pg_collation.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -5299,9 +5300,13 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
                                    "  NULL, ");

         if (import_collate)
-            appendStringInfoString(&buf,
-                                   "  collname, "
-                                   "  collnsp.nspname ");
+            appendStringInfo(&buf,
+                             "  CASE WHEN coll.oid = '%u' THEN"
+                             " defcoll.collname ELSE coll.collname END, "
+                             "  CASE WHEN coll.oid = '%u' THEN"
+                             " defcoll.nspname ELSE collnsp.nspname END ",
+                             DEFAULT_COLLATION_OID,
+                             DEFAULT_COLLATION_OID);
         else
             appendStringInfoString(&buf,
                                    "  NULL, NULL ");
@@ -5317,11 +5322,28 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
                                "    adrelid = c.oid AND adnum = attnum ");

         if (import_collate)
+        {
             appendStringInfoString(&buf,
                                    "  LEFT JOIN pg_collation coll ON "
                                    "    coll.oid = attcollation "
                                    "  LEFT JOIN pg_namespace collnsp ON "
-                                   "    collnsp.oid = collnamespace ");
+                                   "    collnsp.oid = collnamespace "
+                                   "  LEFT JOIN ("
+                                   " SELECT cd.collname, nd.nspname FROM"
+                                   " pg_collation cd, pg_namespace nd, pg_database d"
+                                   " WHERE nd.oid = cd.collnamespace AND"
+                                   " d.datname = current_database() AND");
+            /* collprovider is new as of v10 */
+            if (PQserverVersion(conn) >= 100000)
+                appendStringInfoString(&buf,
+                                       " cd.collprovider = 'c' AND");
+            appendStringInfoString(&buf,
+                                   " cd.collcollate = d.datcollate AND"
+                                   " cd.collctype = d.datctype AND"
+                                   " cd.collencoding IN (d.encoding, -1)"
+                                   " ORDER BY length(cd.collname) LIMIT 1 )"
+                                   " defcoll ON TRUE ");
+        }

         appendStringInfoString(&buf,
                                "WHERE c.relkind IN ("
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 30b5175da5..a96898ed6b 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -36,29 +36,29 @@ CREATE SCHEMA "S 1";
 CREATE TABLE "S 1"."T 1" (
     "C 1" int NOT NULL,
     c2 int NOT NULL,
-    c3 text,
+    c3 text collate "C",
     c4 timestamptz,
     c5 timestamp,
-    c6 varchar(10),
-    c7 char(10),
+    c6 varchar(10) collate "C",
+    c7 char(10) collate "C",
     c8 user_enum,
     CONSTRAINT t1_pkey PRIMARY KEY ("C 1")
 );
 CREATE TABLE "S 1"."T 2" (
     c1 int NOT NULL,
-    c2 text,
+    c2 text collate "C",
     CONSTRAINT t2_pkey PRIMARY KEY (c1)
 );
 CREATE TABLE "S 1"."T 3" (
     c1 int NOT NULL,
     c2 int NOT NULL,
-    c3 text,
+    c3 text collate "C",
     CONSTRAINT t3_pkey PRIMARY KEY (c1)
 );
 CREATE TABLE "S 1"."T 4" (
     c1 int NOT NULL,
     c2 int NOT NULL,
-    c3 text,
+    c3 text collate "C",
     CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );

@@ -102,16 +102,18 @@ ANALYZE "S 1"."T 4";

 -- ===================================================================
 -- create foreign tables
+-- Note: to ensure stable regression results, all collatable columns
+-- in these tables must have explicitly-specified collations.
 -- ===================================================================
 CREATE FOREIGN TABLE ft1 (
     c0 int,
     c1 int NOT NULL,
     c2 int NOT NULL,
-    c3 text,
+    c3 text collate "C",
     c4 timestamptz,
     c5 timestamp,
-    c6 varchar(10),
-    c7 char(10) default 'ft1',
+    c6 varchar(10) collate "C",
+    c7 char(10) default 'ft1' collate "C",
     c8 user_enum
 ) SERVER loopback;
 ALTER FOREIGN TABLE ft1 DROP COLUMN c0;
@@ -120,11 +122,11 @@ CREATE FOREIGN TABLE ft2 (
     c1 int NOT NULL,
     c2 int NOT NULL,
     cx int,
-    c3 text,
+    c3 text collate "C",
     c4 timestamptz,
     c5 timestamp,
-    c6 varchar(10),
-    c7 char(10) default 'ft2',
+    c6 varchar(10) collate "C",
+    c7 char(10) default 'ft2' collate "C",
     c8 user_enum
 ) SERVER loopback;
 ALTER FOREIGN TABLE ft2 DROP COLUMN cx;
@@ -132,19 +134,19 @@ ALTER FOREIGN TABLE ft2 DROP COLUMN cx;
 CREATE FOREIGN TABLE ft4 (
     c1 int NOT NULL,
     c2 int NOT NULL,
-    c3 text
+    c3 text collate "C"
 ) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 3');

 CREATE FOREIGN TABLE ft5 (
     c1 int NOT NULL,
     c2 int NOT NULL,
-    c3 text
+    c3 text collate "C"
 ) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 4');

 CREATE FOREIGN TABLE ft6 (
     c1 int NOT NULL,
     c2 int NOT NULL,
-    c3 text
+    c3 text collate "C"
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');

 CREATE FOREIGN TABLE ft7 (
@@ -354,7 +356,7 @@ SELECT * FROM ft2 WHERE c1 = ANY (ARRAY(SELECT c1 FROM ft1 WHERE c1 < 5));
 EXPLAIN (VERBOSE, COSTS OFF)
     SELECT * FROM ft2 ORDER BY ft2.c1, random();
 EXPLAIN (VERBOSE, COSTS OFF)
-    SELECT * FROM ft2 ORDER BY ft2.c1, ft2.c3 collate "C";
+    SELECT * FROM ft2 ORDER BY ft2.c1, ft2.c3 collate "POSIX";

 -- user-defined operator/function
 CREATE FUNCTION postgres_fdw_abs(int) RETURNS int AS $$
@@ -1194,15 +1196,15 @@ create foreign table ft3 (f1 text collate "C", f2 text, f3 varchar(10))
 -- can be sent to remote
 explain (verbose, costs off) select * from ft3 where f1 = 'foo';
 explain (verbose, costs off) select * from ft3 where f1 COLLATE "C" = 'foo';
-explain (verbose, costs off) select * from ft3 where f2 = 'foo';
-explain (verbose, costs off) select * from ft3 where f3 = 'foo';
-explain (verbose, costs off) select * from ft3 f, loct3 l
-  where f.f3 = l.f3 and l.f1 = 'foo';
+explain (verbose, costs off) select * from ft3 where f1 = 'foo' COLLATE "C";
 -- can't be sent to remote
 explain (verbose, costs off) select * from ft3 where f1 COLLATE "POSIX" = 'foo';
-explain (verbose, costs off) select * from ft3 where f1 = 'foo' COLLATE "C";
+explain (verbose, costs off) select * from ft3 where f2 = 'foo';
 explain (verbose, costs off) select * from ft3 where f2 COLLATE "C" = 'foo';
 explain (verbose, costs off) select * from ft3 where f2 = 'foo' COLLATE "C";
+explain (verbose, costs off) select * from ft3 where f3 = 'foo';
+explain (verbose, costs off) select * from ft3 f, loct3 l
+  where f.f3 = l.f3 and l.f1 = 'foo';
 explain (verbose, costs off) select * from ft3 f, loct3 l
   where f.f3 = l.f3 COLLATE "POSIX" and l.f1 = 'foo';

@@ -2065,12 +2067,12 @@ drop table loct1;
 drop table loct2;

 -- Test pushing down UPDATE/DELETE joins to the remote server
-create table parent (a int, b text);
-create table loct1 (a int, b text);
-create table loct2 (a int, b text);
-create foreign table remt1 (a int, b text)
+create table parent (a int, b text collate "C");
+create table loct1 (a int, b text collate "C");
+create table loct2 (a int, b text collate "C");
+create foreign table remt1 (a int, b text collate "C")
   server loopback options (table_name 'loct1');
-create foreign table remt2 (a int, b text)
+create foreign table remt2 (a int, b text collate "C")
   server loopback options (table_name 'loct2');
 alter foreign table remt1 inherit parent;

@@ -2163,10 +2165,10 @@ drop table loct1;
 drop table loct2;

 -- Test update tuple routing
-create table utrtest (a int, b text) partition by list (a);
-create table loct (a int check (a in (1)), b text);
-create foreign table remp (a int check (a in (1)), b text) server loopback options (table_name 'loct');
-create table locp (a int check (a in (2)), b text);
+create table utrtest (a int, b text collate "C") partition by list (a);
+create table loct (a int check (a in (1)), b text collate "C");
+create foreign table remp (a int check (a in (1)), b text collate "C") server loopback options (table_name 'loct');
+create table locp (a int check (a in (2)), b text collate "C");
 alter table utrtest attach partition remp for values in (1);
 alter table utrtest attach partition locp for values in (2);

@@ -2243,7 +2245,7 @@ alter table utrtest detach partition remp;
 drop foreign table remp;
 alter table loct drop constraint loct_a_check;
 alter table loct add check (a in (3));
-create foreign table remp (a int check (a in (3)), b text) server loopback options (table_name 'loct');
+create foreign table remp (a int check (a in (3)), b text collate "C") server loopback options (table_name 'loct');
 alter table utrtest attach partition remp for values in (3);
 insert into utrtest values (2, 'qux');
 insert into utrtest values (3, 'xyzzy');
@@ -2553,11 +2555,11 @@ tru_rtable_parent,tru_rtable_child, tru_rtable0_child;
 -- ===================================================================

 CREATE SCHEMA import_source;
-CREATE TABLE import_source.t1 (c1 int, c2 varchar NOT NULL);
-CREATE TABLE import_source.t2 (c1 int default 42, c2 varchar NULL, c3 text collate "POSIX");
-CREATE TYPE typ1 AS (m1 int, m2 varchar);
+CREATE TABLE import_source.t1 (c1 int, c2 varchar NOT NULL COLLATE "C");
+CREATE TABLE import_source.t2 (c1 int default 42, c2 varchar NULL COLLATE "C", c3 text collate "POSIX");
+CREATE TYPE typ1 AS (m1 int, m2 varchar COLLATE "C");
 CREATE TABLE import_source.t3 (c1 timestamptz default now(), c2 typ1);
-CREATE TABLE import_source."x 4" (c1 float8, "C 2" text, c3 varchar(42));
+CREATE TABLE import_source."x 4" (c1 float8, "C 2" text COLLATE "C", c3 varchar(42) COLLATE "C");
 CREATE TABLE import_source."x 5" (c1 float8);
 ALTER TABLE import_source."x 5" DROP COLUMN c1;
 CREATE TABLE import_source."x 6" (c1 int, c2 int generated always as (c1 * 2) stored);
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 0075bc3dbb..fc368db47a 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -97,6 +97,12 @@
   need, surprising semantic anomalies may arise when types or collations do
   not match, due to the remote server interpreting <literal>WHERE</literal> clauses
   slightly differently from the local server.
+  Also, while you need not apply any explicit <literal>COLLATE</literal>
+  clause to a collatable column of a foreign table, not doing so will cause
+  the planner to assume that the remote column's collation does not match any
+  locally available collation.  This will generally result in not sending
+  <literal>WHERE</literal> clauses involving such a column to the remote
+  server, which may greatly degrade query performance.
  </para>

  <para>
@@ -537,6 +543,10 @@ OPTIONS (ADD password_required 'false');
        need to turn this off if the remote server has a different set of
        collation names than the local server does, which is likely to be the
        case if it's running on a different operating system.
+       (Note, however, that simply turning this off may result in
+       non-optimal plans for queries on the imported tables.  It's best to
+       find a way to label collatable columns with collations that match the
+       remote server's behavior.)
       </para>
      </listitem>
     </varlistentry>

pgsql-hackers by date:

Previous
From: Sehrope Sarkuni
Date:
Subject: Re: Add jsonlog log_destination for JSON server logs
Next
From: Jacob Champion
Date:
Subject: Re: [PATCH] Support pg_ident mapping for LDAP