Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails |
Date | |
Msg-id | 2438715.1632510693@sss.pgh.pa.us Whole thread Raw |
In response to | Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails (Etsuro Fujita <etsuro.fujita@gmail.com>) |
Responses |
Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
|
List | pgsql-hackers |
Etsuro Fujita <etsuro.fujita@gmail.com> writes: > One thing I noticed is that collatable operators/functions sent to the > remote might also cause an unexpected result when the default > collations are not compatible. Consider this example (even with your > patch): > ... > where ft1 is a foreign table with an integer column c1. As shown > above, the sort using the collatable function chr() is performed > remotely, so the select query might produce the result in an > unexpected sort order when the default collations are not compatible. I don't think there's anything really new there --- it's still assuming that COLLATE "default" means the same locally and remotely. As a short-term answer, I propose that we apply (and back-patch) the attached documentation changes. Longer-term, it seems like we really have to be able to represent the notion of a remote column that has an "unknown" collation (that is, one that doesn't match any local collation, or at least is not known to do so). My previous patch essentially makes "default" act that way, but conflating "unknown" with "default" has too many downsides. A rough sketch for making this happen is: 1. Create a built-in "unknown" entry in pg_collation. Insert some hack or other to prevent this from being applied to any real, local column; but allow foreign-table columns to have it. 2. Apply mods, probably fairly similar to my patch, that prevent postgres_fdw from believing that "unknown" matches any local collation. (Hm, actually maybe no special code change will be needed here, once "unknown" has its own OID?) 3. Change postgresImportForeignSchema so that it can substitute the "unknown" collation at need. The exact rules for this could be debated depending on whether you'd rather prioritize safety or ease-of-use, but I think at least we should use "unknown" whenever import_collate is turned off. Perhaps there should be an option to substitute it for remote "default" as well. (Further down the road, perhaps that could be generalized to allow a user-controlled mapping from remote to local collations.) Anyway, I think I should withdraw the upthread patch; we don't want to go that way. regards, tom lane diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml index bf95da9721..dbc11694a0 100644 --- a/doc/src/sgml/postgres-fdw.sgml +++ b/doc/src/sgml/postgres-fdw.sgml @@ -95,8 +95,8 @@ referenced columns of the remote table. Although <filename>postgres_fdw</filename> is currently rather forgiving about performing data type conversions at need, surprising semantic anomalies may arise when types or collations do - not match, due to the remote server interpreting <literal>WHERE</literal> clauses - slightly differently from the local server. + not match, due to the remote server interpreting query conditions + differently from the local server. </para> <para> @@ -537,6 +537,17 @@ OPTIONS (ADD password_required 'false'); need to turn this off if the remote server has a different set of collation names than the local server does, which is likely to be the case if it's running on a different operating system. + If you do so, however, there is a very severe risk that the imported + table columns' collations will not match the underlying data, resulting + in anomalous query behavior. + </para> + + <para> + Even when this parameter is set to <literal>true</literal>, importing + columns whose collation is the remote server's default can be risky. + They will be imported with <literal>COLLATE "default"</literal>, which + will select the local server's default collation, which could be + different. </para> </listitem> </varlistentry>
pgsql-hackers by date: