Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails - Mailing list pgsql-hackers

From Tom Lane
Subject Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
Date
Msg-id 2438715.1632510693@sss.pgh.pa.us
Whole thread Raw
In response to Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails  (Etsuro Fujita <etsuro.fujita@gmail.com>)
Responses Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
List pgsql-hackers
Etsuro Fujita <etsuro.fujita@gmail.com> writes:
> One thing I noticed is that collatable operators/functions sent to the
> remote might also cause an unexpected result when the default
> collations are not compatible.  Consider this example (even with your
> patch):
> ...
> where ft1 is a foreign table with an integer column c1.  As shown
> above, the sort using the collatable function chr() is performed
> remotely, so the select query might produce the result in an
> unexpected sort order when the default collations are not compatible.

I don't think there's anything really new there --- it's still assuming
that COLLATE "default" means the same locally and remotely.

As a short-term answer, I propose that we apply (and back-patch) the
attached documentation changes.

Longer-term, it seems like we really have to be able to represent
the notion of a remote column that has an "unknown" collation (that
is, one that doesn't match any local collation, or at least is not
known to do so).  My previous patch essentially makes "default" act
that way, but conflating "unknown" with "default" has too many
downsides.  A rough sketch for making this happen is:

1. Create a built-in "unknown" entry in pg_collation.  Insert some
hack or other to prevent this from being applied to any real, local
column; but allow foreign-table columns to have it.

2. Apply mods, probably fairly similar to my patch, that prevent
postgres_fdw from believing that "unknown" matches any local
collation.  (Hm, actually maybe no special code change will be
needed here, once "unknown" has its own OID?)

3. Change postgresImportForeignSchema so that it can substitute
the "unknown" collation at need.  The exact rules for this could
be debated depending on whether you'd rather prioritize safety or
ease-of-use, but I think at least we should use "unknown" whenever
import_collate is turned off.  Perhaps there should be an option
to substitute it for remote "default" as well.  (Further down the
road, perhaps that could be generalized to allow a user-controlled
mapping from remote to local collations.)

Anyway, I think I should withdraw the upthread patch; we don't
want to go that way.

            regards, tom lane

diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index bf95da9721..dbc11694a0 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -95,8 +95,8 @@
   referenced columns of the remote table.  Although <filename>postgres_fdw</filename>
   is currently rather forgiving about performing data type conversions at
   need, surprising semantic anomalies may arise when types or collations do
-  not match, due to the remote server interpreting <literal>WHERE</literal> clauses
-  slightly differently from the local server.
+  not match, due to the remote server interpreting query conditions
+  differently from the local server.
  </para>

  <para>
@@ -537,6 +537,17 @@ OPTIONS (ADD password_required 'false');
        need to turn this off if the remote server has a different set of
        collation names than the local server does, which is likely to be the
        case if it's running on a different operating system.
+       If you do so, however, there is a very severe risk that the imported
+       table columns' collations will not match the underlying data, resulting
+       in anomalous query behavior.
+      </para>
+
+      <para>
+       Even when this parameter is set to <literal>true</literal>, importing
+       columns whose collation is the remote server's default can be risky.
+       They will be imported with <literal>COLLATE "default"</literal>, which
+       will select the local server's default collation, which could be
+       different.
       </para>
      </listitem>
     </varlistentry>

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: decoupling table and index vacuum
Next
From: Tomas Vondra
Date:
Subject: Re: logical decoding and replication of sequences