Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
Date
Msg-id 1072831.1597678008@sss.pgh.pa.us
Whole thread Raw
In response to BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails  (PG Bug reporting form <noreply@postgresql.org>)
Responses Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
List pgsql-bugs
PG Bug reporting form <noreply@postgresql.org> writes:
> Joining two identical tables placed on separate DBs with different collation
> accessed through postgres_fdw failed when joined with merge join. Some
> records are missing (7 vs. 16 rows in example) in output. See this snippet
> https://gitlab.com/-/snippets/2004522 (or code pasted below) for psql script
> reproducing error also with expected output (working fine on alpine linux).

So I think what is happening here is that postgres_fdw's version of
IMPORT FOREIGN SCHEMA translates "COLLATE default" on the remote
server to "COLLATE default" on the local one, which of course is
a big fail if the defaults don't match.  That allows the local
planner to believe that remote ORDER BYs on the two foreign tables
will give compatible results, causing the merge join to not work
very well at all.

We probably need to figure out some way of substituting the remote
database's actual lc_collate setting when we see "COLLATE default".

I'm also thinking that the documentation is way too cavalier about
dismissing non-matching collation names by just saying that you
can turn off import_collate.  The fact is that doing so is likely
to be disastrous, the more so the more optimization intelligence
we add to postgres_fdw.

I wonder if we could do something like this:

* Change postgresImportForeignSchema() as above, so that it will never
apply "COLLATE default" to an imported column, except in the case
where you turn off import_collate.

* In postgres_fdw planning, treat "COLLATE default" on a foreign table
column as meaning "we don't know the collation"; never believe that
that column can be ordered in a way that matches any local collation.
(It'd be better perhaps if there were an explicit way to say "COLLATE
unknown", but I hesitate to invent such a concept in general.)

* Document that in manual creation of a postgres_fdw foreign table
with a text column, you need to explicitly write the correct collation
if you want the best query plans to be generated.

This seems like too big a behavioral change to consider back-patching,
unfortunately.

            regards, tom lane



pgsql-bugs by date:

Previous
From: PG Bug reporting form
Date:
Subject: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
Next
From: Andrii Palko
Date:
Subject: Weird behaviour after update from 12.2 to 12.3 version