Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails - Mailing list pgsql-hackers

From Tom Lane
Subject Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
Date
Msg-id 1311836.1597781384@sss.pgh.pa.us
Whole thread Raw
Responses Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
List pgsql-hackers
I wrote:
>> So I think what is happening here is that postgres_fdw's version of
>> IMPORT FOREIGN SCHEMA translates "COLLATE default" on the remote
>> server to "COLLATE default" on the local one, which of course is
>> a big fail if the defaults don't match.  That allows the local
>> planner to believe that remote ORDER BYs on the two foreign tables
>> will give compatible results, causing the merge join to not work
>> very well at all.

Here's a full patch addressing this issue.  I decided that the best
way to address the test-instability problem is to explicitly give
collations to all the foreign-table columns for which it matters
in the postgres_fdw test.  (For portability's sake, that has to be
"C" or "POSIX"; I mostly used "C".)  Aside from ensuring that the
test still passes with some other prevailing locale, this seems like
a good idea since we'll then be testing the case we are encouraging
users to use.

And indeed, it immediately turned up a new problem: if we explicitly
assign a collation to a foreign-table column c, the system won't
ship WHERE clauses as simple as "c = 'foo'" to the remote.  This
surprised me, but the reason turned out to be that what postgres_fdw
is actually seeing is something like

   {OPEXPR 
   :opno 98 
   :opfuncid 67 
   :opresulttype 16 
   :opretset false 
   :opcollid 0 
   :inputcollid 950 
   :args (
      {VAR 
      :varno 6 
      :varattno 4 
      :vartype 25 
      :vartypmod -1 
      :varcollid 950 
      :varlevelsup 0 
      :varnosyn 6 
      :varattnosyn 4 
      :location 171
      }
      {RELABELTYPE 
      :arg 
         {CONST 
         :consttype 25 
         :consttypmod -1 
         :constcollid 100 
         :constlen -1 
         :constbyval false 
         :constisnull false 
         :location 341 
         :constvalue 9 [ 36 0 0 0 48 48 48 48 49 ]
         }
      :resulttype 25 
      :resulttypmod -1 
      :resultcollid 950 
      :relabelformat 2 
      :location -1
      }
   )
   :location -1
   }

that is, the constant is being explicitly relabeled with the correct
collation, and thus is_foreign_expr() thinks the collation shown by
the RelabelType node is an unsafely-introduced collation.

What I did about this was to change the recursion rule in
foreign_expr_walker() so that merging a safely-derived collation with
the same collation unsafely derived is considered safe.  I think this
is all right, and it allows us to accept some cases that previously
were rejected as unsafe.  But I might be missing something.

(BTW, there's an independent bug here, which is that we're getting
a tree of the above shape rather than a simple Const with the
appropriate collation; that is, this tree isn't fully const-folded.
This is a bug in canonicalize_ec_expression, which I'll go fix
separately.  But it won't affect the problem at hand.)

This seems like a sufficiently large change in postgres_fdw's
behavior to require review, so I'll go add this to the next CF.

            regards, tom lane

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index ad37a74221..f0693a061c 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -320,7 +320,9 @@ foreign_expr_walker(Node *node,

                 /*
                  * If the Var is from the foreign table, we consider its
-                 * collation (if any) safe to use.  If it is from another
+                 * collation (if any) safe to use, *unless* it's
+                 * DEFAULT_COLLATION_OID.  We treat that as meaning "we don't
+                 * know which collation this is".  If it is from another
                  * table, we treat its collation the same way as we would a
                  * Param's collation, ie it's not safe for it to have a
                  * non-default collation.
@@ -342,7 +344,12 @@ foreign_expr_walker(Node *node,

                     /* Else check the collation */
                     collation = var->varcollid;
-                    state = OidIsValid(collation) ? FDW_COLLATE_SAFE : FDW_COLLATE_NONE;
+                    if (collation == InvalidOid)
+                        state = FDW_COLLATE_NONE;
+                    else if (collation == DEFAULT_COLLATION_OID)
+                        state = FDW_COLLATE_UNSAFE;
+                    else
+                        state = FDW_COLLATE_SAFE;
                 }
                 else
                 {
@@ -808,8 +815,24 @@ foreign_expr_walker(Node *node,

     /*
      * Now, merge my collation information into my parent's state.
+     *
+     * If one branch of an expression derives a non-default collation safely
+     * (that is, from a foreign Var) and another one derives the same
+     * collation unsafely, we can consider the expression safe overall.  This
+     * allows cases such as "foreign_var = ('foo' COLLATE x)" where x is the
+     * same collation the foreign_var has anyway.  Note that we will not ship
+     * any explicit COLLATE clause to the remote, but rely on it to re-derive
+     * the correct collation based on the foreign_var.
      */
-    if (state > outer_cxt->state)
+    if (collation == outer_cxt->collation &&
+        ((state == FDW_COLLATE_UNSAFE &&
+          outer_cxt->state == FDW_COLLATE_SAFE) ||
+         (state == FDW_COLLATE_SAFE &&
+          outer_cxt->state == FDW_COLLATE_UNSAFE)))
+    {
+        outer_cxt->state = FDW_COLLATE_SAFE;
+    }
+    else if (state > outer_cxt->state)
     {
         /* Override previous parent state */
         outer_cxt->collation = collation;
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 90db550b92..4707e75d27 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -27,29 +27,29 @@ CREATE SCHEMA "S 1";
 CREATE TABLE "S 1"."T 1" (
     "C 1" int NOT NULL,
     c2 int NOT NULL,
-    c3 text,
+    c3 text collate "C",
     c4 timestamptz,
     c5 timestamp,
-    c6 varchar(10),
-    c7 char(10),
+    c6 varchar(10) collate "C",
+    c7 char(10) collate "C",
     c8 user_enum,
     CONSTRAINT t1_pkey PRIMARY KEY ("C 1")
 );
 CREATE TABLE "S 1"."T 2" (
     c1 int NOT NULL,
-    c2 text,
+    c2 text collate "C",
     CONSTRAINT t2_pkey PRIMARY KEY (c1)
 );
 CREATE TABLE "S 1"."T 3" (
     c1 int NOT NULL,
     c2 int NOT NULL,
-    c3 text,
+    c3 text collate "C",
     CONSTRAINT t3_pkey PRIMARY KEY (c1)
 );
 CREATE TABLE "S 1"."T 4" (
     c1 int NOT NULL,
     c2 int NOT NULL,
-    c3 text,
+    c3 text collate "C",
     CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
 -- Disable autovacuum for these tables to avoid unexpected effects of that
@@ -89,16 +89,18 @@ ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
 -- ===================================================================
 -- create foreign tables
+-- Note: to ensure stable regression results, all collatable columns
+-- in these tables must have explicitly-specified collations.
 -- ===================================================================
 CREATE FOREIGN TABLE ft1 (
     c0 int,
     c1 int NOT NULL,
     c2 int NOT NULL,
-    c3 text,
+    c3 text collate "C",
     c4 timestamptz,
     c5 timestamp,
-    c6 varchar(10),
-    c7 char(10) default 'ft1',
+    c6 varchar(10) collate "C",
+    c7 char(10) default 'ft1' collate "C",
     c8 user_enum
 ) SERVER loopback;
 ALTER FOREIGN TABLE ft1 DROP COLUMN c0;
@@ -106,28 +108,28 @@ CREATE FOREIGN TABLE ft2 (
     c1 int NOT NULL,
     c2 int NOT NULL,
     cx int,
-    c3 text,
+    c3 text collate "C",
     c4 timestamptz,
     c5 timestamp,
-    c6 varchar(10),
-    c7 char(10) default 'ft2',
+    c6 varchar(10) collate "C",
+    c7 char(10) default 'ft2' collate "C",
     c8 user_enum
 ) SERVER loopback;
 ALTER FOREIGN TABLE ft2 DROP COLUMN cx;
 CREATE FOREIGN TABLE ft4 (
     c1 int NOT NULL,
     c2 int NOT NULL,
-    c3 text
+    c3 text collate "C"
 ) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 3');
 CREATE FOREIGN TABLE ft5 (
     c1 int NOT NULL,
     c2 int NOT NULL,
-    c3 text
+    c3 text collate "C"
 ) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 4');
 CREATE FOREIGN TABLE ft6 (
     c1 int NOT NULL,
     c2 int NOT NULL,
-    c3 text
+    c3 text collate "C"
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
 -- ===================================================================
 -- tests for validator
@@ -277,7 +279,7 @@ EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1, t1.tabl
    Output: c1, c2, c3, c4, c5, c6, c7, c8, tableoid
    ->  Sort
          Output: c1, c2, c3, c4, c5, c6, c7, c8, tableoid
-         Sort Key: t1.c3, t1.c1, t1.tableoid
+         Sort Key: t1.c3 COLLATE "C", t1.c1, t1.tableoid
          ->  Foreign Scan on public.ft1 t1
                Output: c1, c2, c3, c4, c5, c6, c7, c8, tableoid
                Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
@@ -881,12 +883,12 @@ EXPLAIN (VERBOSE, COSTS OFF)
 (6 rows)

 EXPLAIN (VERBOSE, COSTS OFF)
-    SELECT * FROM ft2 ORDER BY ft2.c1, ft2.c3 collate "C";
+    SELECT * FROM ft2 ORDER BY ft2.c1, ft2.c3 collate "POSIX";
                                   QUERY PLAN
 -------------------------------------------------------------------------------
  Sort
    Output: c1, c2, c3, c4, c5, c6, c7, c8, ((c3)::text)
-   Sort Key: ft2.c1, ft2.c3 COLLATE "C"
+   Sort Key: ft2.c1, ft2.c3 COLLATE "POSIX"
    ->  Foreign Scan on public.ft2
          Output: c1, c2, c3, c4, c5, c6, c7, c8, c3
          Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
@@ -1808,7 +1810,7 @@ WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t
            Remote SQL: SELECT r1."C 1", r1.c3, r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1"
=r2."C 1")))) 
    ->  Sort
          Output: t.c1_1, t.c2_1, t.c1_3
-         Sort Key: t.c1_3, t.c1_1
+         Sort Key: t.c1_3 COLLATE "C", t.c1_1
          ->  CTE Scan on t
                Output: t.c1_1, t.c2_1, t.c1_3
 (12 rows)
@@ -2008,7 +2010,7 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) WHERE t1.c8
    Output: t1.c1, t2.c1, t1.c3
    ->  Sort
          Output: t1.c1, t2.c1, t1.c3
-         Sort Key: t1.c3, t1.c1
+         Sort Key: t1.c3 COLLATE "C", t1.c1
          ->  Hash Right Join
                Output: t1.c1, t2.c1, t1.c3
                Hash Cond: (t2.c1 = t1.c1)
@@ -2050,7 +2052,7 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) WHERE t1.c8 = t2.
    Output: t1.c1, t2.c1, t1.c3
    ->  Sort
          Output: t1.c1, t2.c1, t1.c3
-         Sort Key: t1.c3, t1.c1
+         Sort Key: t1.c3 COLLATE "C", t1.c1
          ->  Foreign Scan
                Output: t1.c1, t2.c1, t1.c3
                Filter: (t1.c8 = t2.c8)
@@ -3495,7 +3497,7 @@ select c2, c6, sum(c1) from ft1 where c2 < 3 group by grouping sets(c2, c6) orde
 ----------------------------------------------------------------------------------
  Sort
    Output: c2, c6, (sum(c1))
-   Sort Key: ft1.c2, ft1.c6
+   Sort Key: ft1.c2, ft1.c6 COLLATE "C"
    ->  HashAggregate
          Output: c2, c6, sum(c1)
          Hash Key: ft1.c2
@@ -3879,12 +3881,12 @@ EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st6;

 PREPARE st7 AS INSERT INTO ft1 (c1,c2,c3) VALUES (1001,101,'foo');
 EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st7;
-                                                                                           QUERY PLAN
                                                                          

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                                 QUERY PLAN
                                                                                      

+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Insert on public.ft1
    Remote SQL: INSERT INTO "S 1"."T 1"("C 1", c2, c3, c4, c5, c6, c7, c8) VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
    ->  Result
-         Output: NULL::integer, 1001, 101, 'foo'::text, NULL::timestamp with time zone, NULL::timestamp without time
zone,NULL::character varying, 'ft1       '::character(10), NULL::user_enum 
+         Output: NULL::integer, 1001, 101, 'foo'::text, NULL::timestamp with time zone, NULL::timestamp without time
zone,NULL::character varying COLLATE "C", 'ft1       '::character(10), NULL::user_enum 
 (4 rows)

 ALTER TABLE "S 1"."T 1" RENAME TO "T 0";
@@ -3912,12 +3914,12 @@ EXECUTE st6;
 (9 rows)

 EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st7;
-                                                                                           QUERY PLAN
                                                                          

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                                 QUERY PLAN
                                                                                      

+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Insert on public.ft1
    Remote SQL: INSERT INTO "S 1"."T 0"("C 1", c2, c3, c4, c5, c6, c7, c8) VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
    ->  Result
-         Output: NULL::integer, 1001, 101, 'foo'::text, NULL::timestamp with time zone, NULL::timestamp without time
zone,NULL::character varying, 'ft1       '::character(10), NULL::user_enum 
+         Output: NULL::integer, 1001, 101, 'foo'::text, NULL::timestamp with time zone, NULL::timestamp without time
zone,NULL::character varying COLLATE "C", 'ft1       '::character(10), NULL::user_enum 
 (4 rows)

 ALTER TABLE "S 1"."T 0" RENAME TO "T 1";
@@ -4125,36 +4127,14 @@ explain (verbose, costs off) select * from ft3 where f1 COLLATE "C" = 'foo';
    Remote SQL: SELECT f1, f2, f3 FROM public.loct3 WHERE ((f1 = 'foo'::text))
 (3 rows)

-explain (verbose, costs off) select * from ft3 where f2 = 'foo';
-                                  QUERY PLAN
-------------------------------------------------------------------------------
- Foreign Scan on public.ft3
-   Output: f1, f2, f3
-   Remote SQL: SELECT f1, f2, f3 FROM public.loct3 WHERE ((f2 = 'foo'::text))
-(3 rows)
-
-explain (verbose, costs off) select * from ft3 where f3 = 'foo';
+explain (verbose, costs off) select * from ft3 where f1 = 'foo' COLLATE "C";
                                   QUERY PLAN
 ------------------------------------------------------------------------------
  Foreign Scan on public.ft3
    Output: f1, f2, f3
-   Remote SQL: SELECT f1, f2, f3 FROM public.loct3 WHERE ((f3 = 'foo'::text))
+   Remote SQL: SELECT f1, f2, f3 FROM public.loct3 WHERE ((f1 = 'foo'::text))
 (3 rows)

-explain (verbose, costs off) select * from ft3 f, loct3 l
-  where f.f3 = l.f3 and l.f1 = 'foo';
-                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
- Nested Loop
-   Output: f.f1, f.f2, f.f3, l.f1, l.f2, l.f3
-   ->  Index Scan using loct3_f1_key on public.loct3 l
-         Output: l.f1, l.f2, l.f3
-         Index Cond: (l.f1 = 'foo'::text)
-   ->  Foreign Scan on public.ft3 f
-         Output: f.f1, f.f2, f.f3
-         Remote SQL: SELECT f1, f2, f3 FROM public.loct3 WHERE (($1::character varying(10) = f3))
-(8 rows)
-
 -- can't be sent to remote
 explain (verbose, costs off) select * from ft3 where f1 COLLATE "POSIX" = 'foo';
                     QUERY PLAN
@@ -4165,12 +4145,12 @@ explain (verbose, costs off) select * from ft3 where f1 COLLATE "POSIX" = 'foo';
    Remote SQL: SELECT f1, f2, f3 FROM public.loct3
 (4 rows)

-explain (verbose, costs off) select * from ft3 where f1 = 'foo' COLLATE "C";
+explain (verbose, costs off) select * from ft3 where f2 = 'foo';
                     QUERY PLAN
 ---------------------------------------------------
  Foreign Scan on public.ft3
    Output: f1, f2, f3
-   Filter: (ft3.f1 = 'foo'::text COLLATE "C")
+   Filter: (ft3.f2 = 'foo'::text)
    Remote SQL: SELECT f1, f2, f3 FROM public.loct3
 (4 rows)

@@ -4192,6 +4172,33 @@ explain (verbose, costs off) select * from ft3 where f2 = 'foo' COLLATE "C";
    Remote SQL: SELECT f1, f2, f3 FROM public.loct3
 (4 rows)

+explain (verbose, costs off) select * from ft3 where f3 = 'foo';
+                    QUERY PLAN
+---------------------------------------------------
+ Foreign Scan on public.ft3
+   Output: f1, f2, f3
+   Filter: ((ft3.f3)::text = 'foo'::text)
+   Remote SQL: SELECT f1, f2, f3 FROM public.loct3
+(4 rows)
+
+explain (verbose, costs off) select * from ft3 f, loct3 l
+  where f.f3 = l.f3 and l.f1 = 'foo';
+                         QUERY PLAN
+-------------------------------------------------------------
+ Hash Join
+   Output: f.f1, f.f2, f.f3, l.f1, l.f2, l.f3
+   Inner Unique: true
+   Hash Cond: ((f.f3)::text = (l.f3)::text)
+   ->  Foreign Scan on public.ft3 f
+         Output: f.f1, f.f2, f.f3
+         Remote SQL: SELECT f1, f2, f3 FROM public.loct3
+   ->  Hash
+         Output: l.f1, l.f2, l.f3
+         ->  Index Scan using loct3_f1_key on public.loct3 l
+               Output: l.f1, l.f2, l.f3
+               Index Cond: (l.f1 = 'foo'::text)
+(12 rows)
+
 explain (verbose, costs off) select * from ft3 f, loct3 l
   where f.f3 = l.f3 COLLATE "POSIX" and l.f1 = 'foo';
                          QUERY PLAN
@@ -4215,12 +4222,12 @@ explain (verbose, costs off) select * from ft3 f, loct3 l
 -- ===================================================================
 EXPLAIN (verbose, costs off)
 INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20;
-
QUERYPLAN
     

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+
  QUERY PLAN
              

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Insert on public.ft2
    Remote SQL: INSERT INTO "S 1"."T 1"("C 1", c2, c3, c4, c5, c6, c7, c8) VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
    ->  Subquery Scan on "*SELECT*"
-         Output: "*SELECT*"."?column?", "*SELECT*"."?column?_1", NULL::integer, "*SELECT*"."?column?_2",
NULL::timestampwith time zone, NULL::timestamp without time zone, NULL::character varying, 'ft2       '::character(10),
NULL::user_enum
+         Output: "*SELECT*"."?column?", "*SELECT*"."?column?_1", NULL::integer, "*SELECT*"."?column?_2",
NULL::timestampwith time zone, NULL::timestamp without time zone, NULL::character varying COLLATE "C", 'ft2
'::character(10),NULL::user_enum 
          ->  Foreign Scan on public.ft2 ft2_1
                Output: (ft2_1.c1 + 1000), (ft2_1.c2 + 100), (ft2_1.c3 || ft2_1.c3)
                Remote SQL: SELECT "C 1", c2, c3 FROM "S 1"."T 1" LIMIT 20::bigint
@@ -5330,13 +5337,13 @@ SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;

 EXPLAIN (verbose, costs off)
 INSERT INTO ft2 (c1,c2,c3) VALUES (1200,999,'foo') RETURNING tableoid::regclass;
-                                                                                           QUERY PLAN
                                                                          

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                                 QUERY PLAN
                                                                                      

+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Insert on public.ft2
    Output: (ft2.tableoid)::regclass
    Remote SQL: INSERT INTO "S 1"."T 1"("C 1", c2, c3, c4, c5, c6, c7, c8) VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
    ->  Result
-         Output: 1200, 999, NULL::integer, 'foo'::text, NULL::timestamp with time zone, NULL::timestamp without time
zone,NULL::character varying, 'ft2       '::character(10), NULL::user_enum 
+         Output: 1200, 999, NULL::integer, 'foo'::text, NULL::timestamp with time zone, NULL::timestamp without time
zone,NULL::character varying COLLATE "C", 'ft2       '::character(10), NULL::user_enum 
 (5 rows)

 INSERT INTO ft2 (c1,c2,c3) VALUES (1200,999,'foo') RETURNING tableoid::regclass;
@@ -7551,12 +7558,12 @@ NOTICE:  drop cascades to foreign table bar2
 drop table loct1;
 drop table loct2;
 -- Test pushing down UPDATE/DELETE joins to the remote server
-create table parent (a int, b text);
-create table loct1 (a int, b text);
-create table loct2 (a int, b text);
-create foreign table remt1 (a int, b text)
+create table parent (a int, b text collate "C");
+create table loct1 (a int, b text collate "C");
+create table loct2 (a int, b text collate "C");
+create foreign table remt1 (a int, b text collate "C")
   server loopback options (table_name 'loct1');
-create foreign table remt2 (a int, b text)
+create foreign table remt2 (a int, b text collate "C")
   server loopback options (table_name 'loct2');
 alter foreign table remt1 inherit parent;
 insert into remt1 values (1, 'foo');
@@ -7755,10 +7762,10 @@ drop table itrtest;
 drop table loct1;
 drop table loct2;
 -- Test update tuple routing
-create table utrtest (a int, b text) partition by list (a);
-create table loct (a int check (a in (1)), b text);
-create foreign table remp (a int check (a in (1)), b text) server loopback options (table_name 'loct');
-create table locp (a int check (a in (2)), b text);
+create table utrtest (a int, b text collate "C") partition by list (a);
+create table loct (a int check (a in (1)), b text collate "C");
+create foreign table remp (a int check (a in (1)), b text collate "C") server loopback options (table_name 'loct');
+create table locp (a int check (a in (2)), b text collate "C");
 alter table utrtest attach partition remp for values in (1);
 alter table utrtest attach partition locp for values in (2);
 insert into utrtest values (1, 'foo');
@@ -7942,7 +7949,7 @@ alter table utrtest detach partition remp;
 drop foreign table remp;
 alter table loct drop constraint loct_a_check;
 alter table loct add check (a in (3));
-create foreign table remp (a int check (a in (3)), b text) server loopback options (table_name 'loct');
+create foreign table remp (a int check (a in (3)), b text collate "C") server loopback options (table_name 'loct');
 alter table utrtest attach partition remp for values in (3);
 insert into utrtest values (2, 'qux');
 insert into utrtest values (3, 'xyzzy');
@@ -8203,11 +8210,11 @@ drop table loc3;
 -- test IMPORT FOREIGN SCHEMA
 -- ===================================================================
 CREATE SCHEMA import_source;
-CREATE TABLE import_source.t1 (c1 int, c2 varchar NOT NULL);
-CREATE TABLE import_source.t2 (c1 int default 42, c2 varchar NULL, c3 text collate "POSIX");
-CREATE TYPE typ1 AS (m1 int, m2 varchar);
+CREATE TABLE import_source.t1 (c1 int, c2 varchar NOT NULL COLLATE "C");
+CREATE TABLE import_source.t2 (c1 int default 42, c2 varchar NULL COLLATE "C", c3 text collate "POSIX");
+CREATE TYPE typ1 AS (m1 int, m2 varchar COLLATE "C");
 CREATE TABLE import_source.t3 (c1 timestamptz default now(), c2 typ1);
-CREATE TABLE import_source."x 4" (c1 float8, "C 2" text, c3 varchar(42));
+CREATE TABLE import_source."x 4" (c1 float8, "C 2" text COLLATE "C", c3 varchar(42) COLLATE "C");
 CREATE TABLE import_source."x 5" (c1 float8);
 ALTER TABLE import_source."x 5" DROP COLUMN c1;
 CREATE TABLE import_source.t4 (c1 int) PARTITION BY RANGE (c1);
@@ -8232,7 +8239,7 @@ IMPORT FOREIGN SCHEMA import_source FROM SERVER loopback INTO import_dest1;
  Column |       Type        | Collation | Nullable | Default |    FDW options
 --------+-------------------+-----------+----------+---------+--------------------
  c1     | integer           |           |          |         | (column_name 'c1')
- c2     | character varying |           | not null |         | (column_name 'c2')
+ c2     | character varying | C         | not null |         | (column_name 'c2')
 Server: loopback
 FDW options: (schema_name 'import_source', table_name 't1')

@@ -8240,7 +8247,7 @@ FDW options: (schema_name 'import_source', table_name 't1')
  Column |       Type        | Collation | Nullable | Default |    FDW options
 --------+-------------------+-----------+----------+---------+--------------------
  c1     | integer           |           |          |         | (column_name 'c1')
- c2     | character varying |           |          |         | (column_name 'c2')
+ c2     | character varying | C         |          |         | (column_name 'c2')
  c3     | text              | POSIX     |          |         | (column_name 'c3')
 Server: loopback
 FDW options: (schema_name 'import_source', table_name 't2')
@@ -8264,8 +8271,8 @@ FDW options: (schema_name 'import_source', table_name 't4')
  Column |         Type          | Collation | Nullable | Default |     FDW options
 --------+-----------------------+-----------+----------+---------+---------------------
  c1     | double precision      |           |          |         | (column_name 'c1')
- C 2    | text                  |           |          |         | (column_name 'C 2')
- c3     | character varying(42) |           |          |         | (column_name 'c3')
+ C 2    | text                  | C         |          |         | (column_name 'C 2')
+ c3     | character varying(42) | C         |          |         | (column_name 'c3')
 Server: loopback
 FDW options: (schema_name 'import_source', table_name 'x 4')

@@ -8296,7 +8303,7 @@ IMPORT FOREIGN SCHEMA import_source FROM SERVER loopback INTO import_dest2
  Column |       Type        | Collation | Nullable | Default |    FDW options
 --------+-------------------+-----------+----------+---------+--------------------
  c1     | integer           |           |          |         | (column_name 'c1')
- c2     | character varying |           | not null |         | (column_name 'c2')
+ c2     | character varying | C         | not null |         | (column_name 'c2')
 Server: loopback
 FDW options: (schema_name 'import_source', table_name 't1')

@@ -8304,7 +8311,7 @@ FDW options: (schema_name 'import_source', table_name 't1')
  Column |       Type        | Collation | Nullable | Default |    FDW options
 --------+-------------------+-----------+----------+---------+--------------------
  c1     | integer           |           |          | 42      | (column_name 'c1')
- c2     | character varying |           |          |         | (column_name 'c2')
+ c2     | character varying | C         |          |         | (column_name 'c2')
  c3     | text              | POSIX     |          |         | (column_name 'c3')
 Server: loopback
 FDW options: (schema_name 'import_source', table_name 't2')
@@ -8328,8 +8335,8 @@ FDW options: (schema_name 'import_source', table_name 't4')
  Column |         Type          | Collation | Nullable | Default |     FDW options
 --------+-----------------------+-----------+----------+---------+---------------------
  c1     | double precision      |           |          |         | (column_name 'c1')
- C 2    | text                  |           |          |         | (column_name 'C 2')
- c3     | character varying(42) |           |          |         | (column_name 'c3')
+ C 2    | text                  | C         |          |         | (column_name 'C 2')
+ c3     | character varying(42) | C         |          |         | (column_name 'c3')
 Server: loopback
 FDW options: (schema_name 'import_source', table_name 'x 4')

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..6e1b391808 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "catalog/pg_collation.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -4802,44 +4803,54 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
          * include a schema name for types/functions in other schemas, which
          * is what we want.
          */
+        appendStringInfoString(&buf,
+                               "SELECT relname, "
+                               "  attname, "
+                               "  format_type(atttypid, atttypmod), "
+                               "  attnotnull, "
+                               "  pg_get_expr(adbin, adrelid), ");
+        if (import_collate)
+            appendStringInfo(&buf,
+                             "  CASE WHEN coll.oid = '%u' THEN"
+                             " defcoll.collname ELSE coll.collname END, "
+                             "  CASE WHEN coll.oid = '%u' THEN"
+                             " defcoll.nspname ELSE collnsp.nspname END ",
+                             DEFAULT_COLLATION_OID,
+                             DEFAULT_COLLATION_OID);
+        else
+            appendStringInfoString(&buf, "  NULL, NULL ");
+        appendStringInfoString(&buf,
+                               "FROM pg_class c "
+                               "  JOIN pg_namespace n ON "
+                               "    relnamespace = n.oid "
+                               "  LEFT JOIN pg_attribute a ON "
+                               "    attrelid = c.oid AND attnum > 0 "
+                               "      AND NOT attisdropped "
+                               "  LEFT JOIN pg_attrdef ad ON "
+                               "    adrelid = c.oid AND adnum = attnum ");
         if (import_collate)
+        {
             appendStringInfoString(&buf,
-                                   "SELECT relname, "
-                                   "  attname, "
-                                   "  format_type(atttypid, atttypmod), "
-                                   "  attnotnull, "
-                                   "  pg_get_expr(adbin, adrelid), "
-                                   "  collname, "
-                                   "  collnsp.nspname "
-                                   "FROM pg_class c "
-                                   "  JOIN pg_namespace n ON "
-                                   "    relnamespace = n.oid "
-                                   "  LEFT JOIN pg_attribute a ON "
-                                   "    attrelid = c.oid AND attnum > 0 "
-                                   "      AND NOT attisdropped "
-                                   "  LEFT JOIN pg_attrdef ad ON "
-                                   "    adrelid = c.oid AND adnum = attnum "
                                    "  LEFT JOIN pg_collation coll ON "
                                    "    coll.oid = attcollation "
                                    "  LEFT JOIN pg_namespace collnsp ON "
-                                   "    collnsp.oid = collnamespace ");
-        else
+                                   "    collnsp.oid = collnamespace "
+                                   "  LEFT JOIN ("
+                                   " SELECT cd.collname, nd.nspname FROM"
+                                   " pg_collation cd, pg_namespace nd, pg_database d"
+                                   " WHERE nd.oid = cd.collnamespace AND"
+                                   " d.datname = current_database() AND");
+            /* collprovider is new as of v10 */
+            if (PQserverVersion(conn) >= 100000)
+                appendStringInfoString(&buf,
+                                       " cd.collprovider = 'c' AND");
             appendStringInfoString(&buf,
-                                   "SELECT relname, "
-                                   "  attname, "
-                                   "  format_type(atttypid, atttypmod), "
-                                   "  attnotnull, "
-                                   "  pg_get_expr(adbin, adrelid), "
-                                   "  NULL, NULL "
-                                   "FROM pg_class c "
-                                   "  JOIN pg_namespace n ON "
-                                   "    relnamespace = n.oid "
-                                   "  LEFT JOIN pg_attribute a ON "
-                                   "    attrelid = c.oid AND attnum > 0 "
-                                   "      AND NOT attisdropped "
-                                   "  LEFT JOIN pg_attrdef ad ON "
-                                   "    adrelid = c.oid AND adnum = attnum ");
-
+                                   " cd.collcollate = d.datcollate AND"
+                                   " cd.collctype = d.datctype AND"
+                                   " cd.collencoding IN (d.encoding, -1)"
+                                   " ORDER BY length(cd.collname) LIMIT 1 )"
+                                   " defcoll ON TRUE ");
+        }
         appendStringInfoString(&buf,
                                "WHERE c.relkind IN ("
                                CppAsString2(RELKIND_RELATION) ","
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..8454f7ae27 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -31,29 +31,29 @@ CREATE SCHEMA "S 1";
 CREATE TABLE "S 1"."T 1" (
     "C 1" int NOT NULL,
     c2 int NOT NULL,
-    c3 text,
+    c3 text collate "C",
     c4 timestamptz,
     c5 timestamp,
-    c6 varchar(10),
-    c7 char(10),
+    c6 varchar(10) collate "C",
+    c7 char(10) collate "C",
     c8 user_enum,
     CONSTRAINT t1_pkey PRIMARY KEY ("C 1")
 );
 CREATE TABLE "S 1"."T 2" (
     c1 int NOT NULL,
-    c2 text,
+    c2 text collate "C",
     CONSTRAINT t2_pkey PRIMARY KEY (c1)
 );
 CREATE TABLE "S 1"."T 3" (
     c1 int NOT NULL,
     c2 int NOT NULL,
-    c3 text,
+    c3 text collate "C",
     CONSTRAINT t3_pkey PRIMARY KEY (c1)
 );
 CREATE TABLE "S 1"."T 4" (
     c1 int NOT NULL,
     c2 int NOT NULL,
-    c3 text,
+    c3 text collate "C",
     CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );

@@ -97,16 +97,18 @@ ANALYZE "S 1"."T 4";

 -- ===================================================================
 -- create foreign tables
+-- Note: to ensure stable regression results, all collatable columns
+-- in these tables must have explicitly-specified collations.
 -- ===================================================================
 CREATE FOREIGN TABLE ft1 (
     c0 int,
     c1 int NOT NULL,
     c2 int NOT NULL,
-    c3 text,
+    c3 text collate "C",
     c4 timestamptz,
     c5 timestamp,
-    c6 varchar(10),
-    c7 char(10) default 'ft1',
+    c6 varchar(10) collate "C",
+    c7 char(10) default 'ft1' collate "C",
     c8 user_enum
 ) SERVER loopback;
 ALTER FOREIGN TABLE ft1 DROP COLUMN c0;
@@ -115,11 +117,11 @@ CREATE FOREIGN TABLE ft2 (
     c1 int NOT NULL,
     c2 int NOT NULL,
     cx int,
-    c3 text,
+    c3 text collate "C",
     c4 timestamptz,
     c5 timestamp,
-    c6 varchar(10),
-    c7 char(10) default 'ft2',
+    c6 varchar(10) collate "C",
+    c7 char(10) default 'ft2' collate "C",
     c8 user_enum
 ) SERVER loopback;
 ALTER FOREIGN TABLE ft2 DROP COLUMN cx;
@@ -127,19 +129,19 @@ ALTER FOREIGN TABLE ft2 DROP COLUMN cx;
 CREATE FOREIGN TABLE ft4 (
     c1 int NOT NULL,
     c2 int NOT NULL,
-    c3 text
+    c3 text collate "C"
 ) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 3');

 CREATE FOREIGN TABLE ft5 (
     c1 int NOT NULL,
     c2 int NOT NULL,
-    c3 text
+    c3 text collate "C"
 ) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 4');

 CREATE FOREIGN TABLE ft6 (
     c1 int NOT NULL,
     c2 int NOT NULL,
-    c3 text
+    c3 text collate "C"
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');

 -- ===================================================================
@@ -332,7 +334,7 @@ SELECT * FROM ft2 WHERE c1 = ANY (ARRAY(SELECT c1 FROM ft1 WHERE c1 < 5));
 EXPLAIN (VERBOSE, COSTS OFF)
     SELECT * FROM ft2 ORDER BY ft2.c1, random();
 EXPLAIN (VERBOSE, COSTS OFF)
-    SELECT * FROM ft2 ORDER BY ft2.c1, ft2.c3 collate "C";
+    SELECT * FROM ft2 ORDER BY ft2.c1, ft2.c3 collate "POSIX";

 -- user-defined operator/function
 CREATE FUNCTION postgres_fdw_abs(int) RETURNS int AS $$
@@ -1119,15 +1121,15 @@ create foreign table ft3 (f1 text collate "C", f2 text, f3 varchar(10))
 -- can be sent to remote
 explain (verbose, costs off) select * from ft3 where f1 = 'foo';
 explain (verbose, costs off) select * from ft3 where f1 COLLATE "C" = 'foo';
-explain (verbose, costs off) select * from ft3 where f2 = 'foo';
-explain (verbose, costs off) select * from ft3 where f3 = 'foo';
-explain (verbose, costs off) select * from ft3 f, loct3 l
-  where f.f3 = l.f3 and l.f1 = 'foo';
+explain (verbose, costs off) select * from ft3 where f1 = 'foo' COLLATE "C";
 -- can't be sent to remote
 explain (verbose, costs off) select * from ft3 where f1 COLLATE "POSIX" = 'foo';
-explain (verbose, costs off) select * from ft3 where f1 = 'foo' COLLATE "C";
+explain (verbose, costs off) select * from ft3 where f2 = 'foo';
 explain (verbose, costs off) select * from ft3 where f2 COLLATE "C" = 'foo';
 explain (verbose, costs off) select * from ft3 where f2 = 'foo' COLLATE "C";
+explain (verbose, costs off) select * from ft3 where f3 = 'foo';
+explain (verbose, costs off) select * from ft3 f, loct3 l
+  where f.f3 = l.f3 and l.f1 = 'foo';
 explain (verbose, costs off) select * from ft3 f, loct3 l
   where f.f3 = l.f3 COLLATE "POSIX" and l.f1 = 'foo';

@@ -1931,12 +1933,12 @@ drop table loct1;
 drop table loct2;

 -- Test pushing down UPDATE/DELETE joins to the remote server
-create table parent (a int, b text);
-create table loct1 (a int, b text);
-create table loct2 (a int, b text);
-create foreign table remt1 (a int, b text)
+create table parent (a int, b text collate "C");
+create table loct1 (a int, b text collate "C");
+create table loct2 (a int, b text collate "C");
+create foreign table remt1 (a int, b text collate "C")
   server loopback options (table_name 'loct1');
-create foreign table remt2 (a int, b text)
+create foreign table remt2 (a int, b text collate "C")
   server loopback options (table_name 'loct2');
 alter foreign table remt1 inherit parent;

@@ -2029,10 +2031,10 @@ drop table loct1;
 drop table loct2;

 -- Test update tuple routing
-create table utrtest (a int, b text) partition by list (a);
-create table loct (a int check (a in (1)), b text);
-create foreign table remp (a int check (a in (1)), b text) server loopback options (table_name 'loct');
-create table locp (a int check (a in (2)), b text);
+create table utrtest (a int, b text collate "C") partition by list (a);
+create table loct (a int check (a in (1)), b text collate "C");
+create foreign table remp (a int check (a in (1)), b text collate "C") server loopback options (table_name 'loct');
+create table locp (a int check (a in (2)), b text collate "C");
 alter table utrtest attach partition remp for values in (1);
 alter table utrtest attach partition locp for values in (2);

@@ -2109,7 +2111,7 @@ alter table utrtest detach partition remp;
 drop foreign table remp;
 alter table loct drop constraint loct_a_check;
 alter table loct add check (a in (3));
-create foreign table remp (a int check (a in (3)), b text) server loopback options (table_name 'loct');
+create foreign table remp (a int check (a in (3)), b text collate "C") server loopback options (table_name 'loct');
 alter table utrtest attach partition remp for values in (3);
 insert into utrtest values (2, 'qux');
 insert into utrtest values (3, 'xyzzy');
@@ -2312,11 +2314,11 @@ drop table loc3;
 -- ===================================================================

 CREATE SCHEMA import_source;
-CREATE TABLE import_source.t1 (c1 int, c2 varchar NOT NULL);
-CREATE TABLE import_source.t2 (c1 int default 42, c2 varchar NULL, c3 text collate "POSIX");
-CREATE TYPE typ1 AS (m1 int, m2 varchar);
+CREATE TABLE import_source.t1 (c1 int, c2 varchar NOT NULL COLLATE "C");
+CREATE TABLE import_source.t2 (c1 int default 42, c2 varchar NULL COLLATE "C", c3 text collate "POSIX");
+CREATE TYPE typ1 AS (m1 int, m2 varchar COLLATE "C");
 CREATE TABLE import_source.t3 (c1 timestamptz default now(), c2 typ1);
-CREATE TABLE import_source."x 4" (c1 float8, "C 2" text, c3 varchar(42));
+CREATE TABLE import_source."x 4" (c1 float8, "C 2" text COLLATE "C", c3 varchar(42) COLLATE "C");
 CREATE TABLE import_source."x 5" (c1 float8);
 ALTER TABLE import_source."x 5" DROP COLUMN c1;
 CREATE TABLE import_source.t4 (c1 int) PARTITION BY RANGE (c1);
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index eab2cc9378..50601fb208 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -89,6 +89,12 @@
   need, surprising semantic anomalies may arise when types or collations do
   not match, due to the remote server interpreting <literal>WHERE</literal> clauses
   slightly differently from the local server.
+  Also, while you need not apply any explicit <literal>COLLATE</literal>
+  clause to a collatable column of a foreign table, not doing so will cause
+  the planner to assume that the remote column's collation does not match any
+  locally available collation.  This will generally result in not sending
+  <literal>WHERE</literal> clauses involving such a column to the remote
+  server, which may greatly degrade query performance.
  </para>

  <para>
@@ -423,6 +429,10 @@ OPTIONS (ADD password_required 'false');
        need to turn this off if the remote server has a different set of
        collation names than the local server does, which is likely to be the
        case if it's running on a different operating system.
+       (Note, however, that simply turning this off may result in
+       non-optimal plans for queries on the imported tables.  It's best to
+       find a way to label collatable columns with collations that match the
+       remote server's behavior.)
       </para>
      </listitem>
     </varlistentry>

pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: proposal: enhancing plpgsql debug API - returns text value of variable content
Next
From: Alvaro Herrera
Date:
Subject: Re: Print logical WAL message content