Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
Date
Msg-id CAPmGK16g4qDfCqL7hAAdBpkCjz62gA1FSBpKtH=Bfnm6L=qkfA@mail.gmail.com
Whole thread Raw
In response to Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sat, Sep 25, 2021 at 4:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Etsuro Fujita <etsuro.fujita@gmail.com> writes:
> > One thing I noticed is that collatable operators/functions sent to the
> > remote might also cause an unexpected result when the default
> > collations are not compatible.  Consider this example (even with your
> > patch):
> > ...
> > where ft1 is a foreign table with an integer column c1.  As shown
> > above, the sort using the collatable function chr() is performed
> > remotely, so the select query might produce the result in an
> > unexpected sort order when the default collations are not compatible.
>
> I don't think there's anything really new there --- it's still assuming
> that COLLATE "default" means the same locally and remotely.

I thought that the example showed that we would need to specify a
collation per-operation, not only per-foreign-table-column, like
“ORDER BY chr(c1) COLLATE “foo”” where “foo” is the actual name of a
local collation matching the local server’s default collation, when
the default collation doesn’t match the remote server’s default
collation, to avoid pushing down operations incorrectly as in the
example.

> As a short-term answer, I propose that we apply (and back-patch) the
> attached documentation changes.

The attached patch looks good to me.

> Longer-term, it seems like we really have to be able to represent
> the notion of a remote column that has an "unknown" collation (that
> is, one that doesn't match any local collation, or at least is not
> known to do so).

+1

> A rough sketch for making this happen is:
>
> 1. Create a built-in "unknown" entry in pg_collation.  Insert some
> hack or other to prevent this from being applied to any real, local
> column; but allow foreign-table columns to have it.
>
> 2. Apply mods, probably fairly similar to my patch, that prevent
> postgres_fdw from believing that "unknown" matches any local
> collation.  (Hm, actually maybe no special code change will be
> needed here, once "unknown" has its own OID?)
>
> 3. Change postgresImportForeignSchema so that it can substitute
> the "unknown" collation at need.  The exact rules for this could
> be debated depending on whether you'd rather prioritize safety or
> ease-of-use, but I think at least we should use "unknown" whenever
> import_collate is turned off.  Perhaps there should be an option
> to substitute it for remote "default" as well.  (Further down the
> road, perhaps that could be generalized to allow a user-controlled
> mapping from remote to local collations.)

In addition, a) we should detect whether local “default” matches
remote “default”, and b) if not, we should prevent pushing down
sort/comparison operations using collatable functions/operators like
“ORDER BY chr(c1)” in the example (and pushing down those operations
on foreign-table columns labeled with “COLLATE default” if such
labeling is allowed)?

Best regards,
Etsuro Fujita



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: OpenSSL 3.0.0 compatibility
Next
From: Tom Lane
Date:
Subject: Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails