Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
Date
Msg-id 1156217.1597703867@sss.pgh.pa.us
Whole thread Raw
In response to Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
I wrote:
> So I think what is happening here is that postgres_fdw's version of
> IMPORT FOREIGN SCHEMA translates "COLLATE default" on the remote
> server to "COLLATE default" on the local one, which of course is
> a big fail if the defaults don't match.  That allows the local
> planner to believe that remote ORDER BYs on the two foreign tables
> will give compatible results, causing the merge join to not work
> very well at all.

> We probably need to figure out some way of substituting the remote
> database's actual lc_collate setting when we see "COLLATE default".

Here's a draft patch for that part.  There's a few things to quibble
about:

* It tests for "COLLATE default" by checking whether pg_collation.oid
is DEFAULT_COLLATION_OID, thus assuming that that OID will never change.
I think this is safer than checking the collation name, but maybe
somebody else would have a different opinion?  Another idea is to check
whether collprovider is 'd', but that only works with v10 and up.

* It might not be able to find a remote collation matching the database's
datcollate/datctype.  As coded, we'll end up creating the local column
with "COLLATE default", putting us back in the same hurt we're in now.
I think this is okay given the other planned change to interpret "COLLATE
default" as "we don't know what collation this is".  In any case it's hard
to see what else we could do, other than fail entirely.

* Alternatively, it might find more than one such remote collation;
indeed that's the norm, eg we'd typically find both "en_US" and
"en_US.utf8", or the like.  I made it choose the shortest collation
name in such cases, but maybe there is a case for the longest?
I don't much want it to pick "ucs_basic" over "C", though.

* The whole thing is certain to fall over whenever we find a way to
allow ICU collations as database defaults.  While we can presumably
fix the query when we make that change, existing postgres_fdw releases
would not work against a newer server.  Probably there's little to be
done about this, either.

* As shown by the expected-output changes, there are some test cases
that expose that we're not picking the default collation anymore.
That creates a testing problem: this can't be committed as-is because
it'll fail with any other locale environment than what the expected
file was made with.  We could lobotomize the test cases to not print
the column collation, but then we're not really verifying that this
code does what it's supposed to.  Not sure what the best compromise is.

Comments?

            regards, tom lane

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 90db550b92..f34178c5d3 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8232,7 +8232,7 @@ IMPORT FOREIGN SCHEMA import_source FROM SERVER loopback INTO import_dest1;
  Column |       Type        | Collation | Nullable | Default |    FDW options
 --------+-------------------+-----------+----------+---------+--------------------
  c1     | integer           |           |          |         | (column_name 'c1')
- c2     | character varying |           | not null |         | (column_name 'c2')
+ c2     | character varying | C         | not null |         | (column_name 'c2')
 Server: loopback
 FDW options: (schema_name 'import_source', table_name 't1')

@@ -8240,7 +8240,7 @@ FDW options: (schema_name 'import_source', table_name 't1')
  Column |       Type        | Collation | Nullable | Default |    FDW options
 --------+-------------------+-----------+----------+---------+--------------------
  c1     | integer           |           |          |         | (column_name 'c1')
- c2     | character varying |           |          |         | (column_name 'c2')
+ c2     | character varying | C         |          |         | (column_name 'c2')
  c3     | text              | POSIX     |          |         | (column_name 'c3')
 Server: loopback
 FDW options: (schema_name 'import_source', table_name 't2')
@@ -8264,8 +8264,8 @@ FDW options: (schema_name 'import_source', table_name 't4')
  Column |         Type          | Collation | Nullable | Default |     FDW options
 --------+-----------------------+-----------+----------+---------+---------------------
  c1     | double precision      |           |          |         | (column_name 'c1')
- C 2    | text                  |           |          |         | (column_name 'C 2')
- c3     | character varying(42) |           |          |         | (column_name 'c3')
+ C 2    | text                  | C         |          |         | (column_name 'C 2')
+ c3     | character varying(42) | C         |          |         | (column_name 'c3')
 Server: loopback
 FDW options: (schema_name 'import_source', table_name 'x 4')

@@ -8296,7 +8296,7 @@ IMPORT FOREIGN SCHEMA import_source FROM SERVER loopback INTO import_dest2
  Column |       Type        | Collation | Nullable | Default |    FDW options
 --------+-------------------+-----------+----------+---------+--------------------
  c1     | integer           |           |          |         | (column_name 'c1')
- c2     | character varying |           | not null |         | (column_name 'c2')
+ c2     | character varying | C         | not null |         | (column_name 'c2')
 Server: loopback
 FDW options: (schema_name 'import_source', table_name 't1')

@@ -8304,7 +8304,7 @@ FDW options: (schema_name 'import_source', table_name 't1')
  Column |       Type        | Collation | Nullable | Default |    FDW options
 --------+-------------------+-----------+----------+---------+--------------------
  c1     | integer           |           |          | 42      | (column_name 'c1')
- c2     | character varying |           |          |         | (column_name 'c2')
+ c2     | character varying | C         |          |         | (column_name 'c2')
  c3     | text              | POSIX     |          |         | (column_name 'c3')
 Server: loopback
 FDW options: (schema_name 'import_source', table_name 't2')
@@ -8328,8 +8328,8 @@ FDW options: (schema_name 'import_source', table_name 't4')
  Column |         Type          | Collation | Nullable | Default |     FDW options
 --------+-----------------------+-----------+----------+---------+---------------------
  c1     | double precision      |           |          |         | (column_name 'c1')
- C 2    | text                  |           |          |         | (column_name 'C 2')
- c3     | character varying(42) |           |          |         | (column_name 'c3')
+ C 2    | text                  | C         |          |         | (column_name 'C 2')
+ c3     | character varying(42) | C         |          |         | (column_name 'c3')
 Server: loopback
 FDW options: (schema_name 'import_source', table_name 'x 4')

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..296bedd5de 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "catalog/pg_collation.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -4802,44 +4803,54 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
          * include a schema name for types/functions in other schemas, which
          * is what we want.
          */
+        appendStringInfoString(&buf,
+                               "SELECT relname, "
+                               "  attname, "
+                               "  format_type(atttypid, atttypmod), "
+                               "  attnotnull, "
+                               "  pg_get_expr(adbin, adrelid), ");
+        if (import_collate)
+            appendStringInfo(&buf,
+                             "  CASE WHEN coll.oid = %u THEN"
+                             " defcoll.collname ELSE coll.collname END, "
+                             "  CASE WHEN coll.oid = %u THEN"
+                             " defcoll.nspname ELSE collnsp.nspname END ",
+                             DEFAULT_COLLATION_OID,
+                             DEFAULT_COLLATION_OID);
+        else
+            appendStringInfoString(&buf, "  NULL, NULL ");
+        appendStringInfoString(&buf,
+                               "FROM pg_class c "
+                               "  JOIN pg_namespace n ON "
+                               "    relnamespace = n.oid "
+                               "  LEFT JOIN pg_attribute a ON "
+                               "    attrelid = c.oid AND attnum > 0 "
+                               "      AND NOT attisdropped "
+                               "  LEFT JOIN pg_attrdef ad ON "
+                               "    adrelid = c.oid AND adnum = attnum ");
         if (import_collate)
+        {
             appendStringInfoString(&buf,
-                                   "SELECT relname, "
-                                   "  attname, "
-                                   "  format_type(atttypid, atttypmod), "
-                                   "  attnotnull, "
-                                   "  pg_get_expr(adbin, adrelid), "
-                                   "  collname, "
-                                   "  collnsp.nspname "
-                                   "FROM pg_class c "
-                                   "  JOIN pg_namespace n ON "
-                                   "    relnamespace = n.oid "
-                                   "  LEFT JOIN pg_attribute a ON "
-                                   "    attrelid = c.oid AND attnum > 0 "
-                                   "      AND NOT attisdropped "
-                                   "  LEFT JOIN pg_attrdef ad ON "
-                                   "    adrelid = c.oid AND adnum = attnum "
                                    "  LEFT JOIN pg_collation coll ON "
                                    "    coll.oid = attcollation "
                                    "  LEFT JOIN pg_namespace collnsp ON "
-                                   "    collnsp.oid = collnamespace ");
-        else
+                                   "    collnsp.oid = collnamespace "
+                                   "  LEFT JOIN ("
+                                   " SELECT cd.collname, nd.nspname FROM"
+                                   " pg_collation cd, pg_namespace nd, pg_database d"
+                                   " WHERE nd.oid = cd.collnamespace AND"
+                                   " d.datname = current_database() AND");
+            /* collprovider is new as of v10 */
+            if (PQserverVersion(conn) >= 100000)
+                appendStringInfoString(&buf,
+                                       " cd.collprovider = 'c' AND");
             appendStringInfoString(&buf,
-                                   "SELECT relname, "
-                                   "  attname, "
-                                   "  format_type(atttypid, atttypmod), "
-                                   "  attnotnull, "
-                                   "  pg_get_expr(adbin, adrelid), "
-                                   "  NULL, NULL "
-                                   "FROM pg_class c "
-                                   "  JOIN pg_namespace n ON "
-                                   "    relnamespace = n.oid "
-                                   "  LEFT JOIN pg_attribute a ON "
-                                   "    attrelid = c.oid AND attnum > 0 "
-                                   "      AND NOT attisdropped "
-                                   "  LEFT JOIN pg_attrdef ad ON "
-                                   "    adrelid = c.oid AND adnum = attnum ");
-
+                                   " cd.collcollate = d.datcollate AND"
+                                   " cd.collctype = d.datctype AND"
+                                   " cd.collencoding IN (d.encoding, -1)"
+                                   " ORDER BY length(cd.collname) LIMIT 1 )"
+                                   " defcoll ON TRUE ");
+        }
         appendStringInfoString(&buf,
                                "WHERE c.relkind IN ("
                                CppAsString2(RELKIND_RELATION) ","

pgsql-bugs by date:

Previous
From: Cyril Jouve
Date:
Subject: Re: BUG #16508: using multi-host connection string when the first host is starting fails
Next
From: PG Bug reporting form
Date:
Subject: BUG #16584: could not create relation-cache initialization file "global/pg_internal.init.2002": No space left