Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails - Mailing list pgsql-bugs
From | Tom Lane |
---|---|
Subject | Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails |
Date | |
Msg-id | 1156217.1597703867@sss.pgh.pa.us Whole thread Raw |
In response to | Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-bugs |
I wrote: > So I think what is happening here is that postgres_fdw's version of > IMPORT FOREIGN SCHEMA translates "COLLATE default" on the remote > server to "COLLATE default" on the local one, which of course is > a big fail if the defaults don't match. That allows the local > planner to believe that remote ORDER BYs on the two foreign tables > will give compatible results, causing the merge join to not work > very well at all. > We probably need to figure out some way of substituting the remote > database's actual lc_collate setting when we see "COLLATE default". Here's a draft patch for that part. There's a few things to quibble about: * It tests for "COLLATE default" by checking whether pg_collation.oid is DEFAULT_COLLATION_OID, thus assuming that that OID will never change. I think this is safer than checking the collation name, but maybe somebody else would have a different opinion? Another idea is to check whether collprovider is 'd', but that only works with v10 and up. * It might not be able to find a remote collation matching the database's datcollate/datctype. As coded, we'll end up creating the local column with "COLLATE default", putting us back in the same hurt we're in now. I think this is okay given the other planned change to interpret "COLLATE default" as "we don't know what collation this is". In any case it's hard to see what else we could do, other than fail entirely. * Alternatively, it might find more than one such remote collation; indeed that's the norm, eg we'd typically find both "en_US" and "en_US.utf8", or the like. I made it choose the shortest collation name in such cases, but maybe there is a case for the longest? I don't much want it to pick "ucs_basic" over "C", though. * The whole thing is certain to fall over whenever we find a way to allow ICU collations as database defaults. While we can presumably fix the query when we make that change, existing postgres_fdw releases would not work against a newer server. Probably there's little to be done about this, either. * As shown by the expected-output changes, there are some test cases that expose that we're not picking the default collation anymore. That creates a testing problem: this can't be committed as-is because it'll fail with any other locale environment than what the expected file was made with. We could lobotomize the test cases to not print the column collation, but then we're not really verifying that this code does what it's supposed to. Not sure what the best compromise is. Comments? regards, tom lane diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 90db550b92..f34178c5d3 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -8232,7 +8232,7 @@ IMPORT FOREIGN SCHEMA import_source FROM SERVER loopback INTO import_dest1; Column | Type | Collation | Nullable | Default | FDW options --------+-------------------+-----------+----------+---------+-------------------- c1 | integer | | | | (column_name 'c1') - c2 | character varying | | not null | | (column_name 'c2') + c2 | character varying | C | not null | | (column_name 'c2') Server: loopback FDW options: (schema_name 'import_source', table_name 't1') @@ -8240,7 +8240,7 @@ FDW options: (schema_name 'import_source', table_name 't1') Column | Type | Collation | Nullable | Default | FDW options --------+-------------------+-----------+----------+---------+-------------------- c1 | integer | | | | (column_name 'c1') - c2 | character varying | | | | (column_name 'c2') + c2 | character varying | C | | | (column_name 'c2') c3 | text | POSIX | | | (column_name 'c3') Server: loopback FDW options: (schema_name 'import_source', table_name 't2') @@ -8264,8 +8264,8 @@ FDW options: (schema_name 'import_source', table_name 't4') Column | Type | Collation | Nullable | Default | FDW options --------+-----------------------+-----------+----------+---------+--------------------- c1 | double precision | | | | (column_name 'c1') - C 2 | text | | | | (column_name 'C 2') - c3 | character varying(42) | | | | (column_name 'c3') + C 2 | text | C | | | (column_name 'C 2') + c3 | character varying(42) | C | | | (column_name 'c3') Server: loopback FDW options: (schema_name 'import_source', table_name 'x 4') @@ -8296,7 +8296,7 @@ IMPORT FOREIGN SCHEMA import_source FROM SERVER loopback INTO import_dest2 Column | Type | Collation | Nullable | Default | FDW options --------+-------------------+-----------+----------+---------+-------------------- c1 | integer | | | | (column_name 'c1') - c2 | character varying | | not null | | (column_name 'c2') + c2 | character varying | C | not null | | (column_name 'c2') Server: loopback FDW options: (schema_name 'import_source', table_name 't1') @@ -8304,7 +8304,7 @@ FDW options: (schema_name 'import_source', table_name 't1') Column | Type | Collation | Nullable | Default | FDW options --------+-------------------+-----------+----------+---------+-------------------- c1 | integer | | | 42 | (column_name 'c1') - c2 | character varying | | | | (column_name 'c2') + c2 | character varying | C | | | (column_name 'c2') c3 | text | POSIX | | | (column_name 'c3') Server: loopback FDW options: (schema_name 'import_source', table_name 't2') @@ -8328,8 +8328,8 @@ FDW options: (schema_name 'import_source', table_name 't4') Column | Type | Collation | Nullable | Default | FDW options --------+-----------------------+-----------+----------+---------+--------------------- c1 | double precision | | | | (column_name 'c1') - C 2 | text | | | | (column_name 'C 2') - c3 | character varying(42) | | | | (column_name 'c3') + C 2 | text | C | | | (column_name 'C 2') + c3 | character varying(42) | C | | | (column_name 'c3') Server: loopback FDW options: (schema_name 'import_source', table_name 'x 4') diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index 9fc53cad68..296bedd5de 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -18,6 +18,7 @@ #include "access/sysattr.h" #include "access/table.h" #include "catalog/pg_class.h" +#include "catalog/pg_collation.h" #include "commands/defrem.h" #include "commands/explain.h" #include "commands/vacuum.h" @@ -4802,44 +4803,54 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid) * include a schema name for types/functions in other schemas, which * is what we want. */ + appendStringInfoString(&buf, + "SELECT relname, " + " attname, " + " format_type(atttypid, atttypmod), " + " attnotnull, " + " pg_get_expr(adbin, adrelid), "); + if (import_collate) + appendStringInfo(&buf, + " CASE WHEN coll.oid = %u THEN" + " defcoll.collname ELSE coll.collname END, " + " CASE WHEN coll.oid = %u THEN" + " defcoll.nspname ELSE collnsp.nspname END ", + DEFAULT_COLLATION_OID, + DEFAULT_COLLATION_OID); + else + appendStringInfoString(&buf, " NULL, NULL "); + appendStringInfoString(&buf, + "FROM pg_class c " + " JOIN pg_namespace n ON " + " relnamespace = n.oid " + " LEFT JOIN pg_attribute a ON " + " attrelid = c.oid AND attnum > 0 " + " AND NOT attisdropped " + " LEFT JOIN pg_attrdef ad ON " + " adrelid = c.oid AND adnum = attnum "); if (import_collate) + { appendStringInfoString(&buf, - "SELECT relname, " - " attname, " - " format_type(atttypid, atttypmod), " - " attnotnull, " - " pg_get_expr(adbin, adrelid), " - " collname, " - " collnsp.nspname " - "FROM pg_class c " - " JOIN pg_namespace n ON " - " relnamespace = n.oid " - " LEFT JOIN pg_attribute a ON " - " attrelid = c.oid AND attnum > 0 " - " AND NOT attisdropped " - " LEFT JOIN pg_attrdef ad ON " - " adrelid = c.oid AND adnum = attnum " " LEFT JOIN pg_collation coll ON " " coll.oid = attcollation " " LEFT JOIN pg_namespace collnsp ON " - " collnsp.oid = collnamespace "); - else + " collnsp.oid = collnamespace " + " LEFT JOIN (" + " SELECT cd.collname, nd.nspname FROM" + " pg_collation cd, pg_namespace nd, pg_database d" + " WHERE nd.oid = cd.collnamespace AND" + " d.datname = current_database() AND"); + /* collprovider is new as of v10 */ + if (PQserverVersion(conn) >= 100000) + appendStringInfoString(&buf, + " cd.collprovider = 'c' AND"); appendStringInfoString(&buf, - "SELECT relname, " - " attname, " - " format_type(atttypid, atttypmod), " - " attnotnull, " - " pg_get_expr(adbin, adrelid), " - " NULL, NULL " - "FROM pg_class c " - " JOIN pg_namespace n ON " - " relnamespace = n.oid " - " LEFT JOIN pg_attribute a ON " - " attrelid = c.oid AND attnum > 0 " - " AND NOT attisdropped " - " LEFT JOIN pg_attrdef ad ON " - " adrelid = c.oid AND adnum = attnum "); - + " cd.collcollate = d.datcollate AND" + " cd.collctype = d.datctype AND" + " cd.collencoding IN (d.encoding, -1)" + " ORDER BY length(cd.collname) LIMIT 1 )" + " defcoll ON TRUE "); + } appendStringInfoString(&buf, "WHERE c.relkind IN (" CppAsString2(RELKIND_RELATION) ","
pgsql-bugs by date: