Thread: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 16583 Logged by: Jiří Fejfar Email address: jurafejfar@gmail.com PostgreSQL version: 12.4 Operating system: debian 10.5 Description: Joining two identical tables placed on separate DBs with different collation accessed through postgres_fdw failed when joined with merge join. Some records are missing (7 vs. 16 rows in example) in output. See this snippet https://gitlab.com/-/snippets/2004522 (or code pasted below) for psql script reproducing error also with expected output (working fine on alpine linux). The same behavior is also observed on postgres v13. Regards, Jiří Fejfar. --------------------------------system--------------------- debian cat /etc/debian_version 10.5 ldd --version ldd (Debian GLIBC 2.28-10) 2.28 Copyright © 2018 Free Software Foundation, Inc. -------- alpine cat /etc/alpine-release 3.12.0 ldd --version musl libc (x86_64) Version 1.1.24 Dynamic Program Loader Usage: /lib/ld-musl-x86_64.so.1 [options] [--] pathname ------------------------psql script-------------------- DROP DATABASE IF EXISTS db_en; DROP DATABASE IF EXISTS db_cz; DROP DATABASE IF EXISTS db_join; DROP USER IF EXISTS fdw_user_en; DROP USER IF EXISTS fdw_user_cz; CREATE DATABASE db_en encoding UTF8 LC_COLLATE 'en_US.UTF-8' LC_CTYPE 'en_US.UTF-8' TEMPLATE template0; CREATE DATABASE db_cz encoding UTF8 LC_COLLATE 'cs_CZ.UTF-8' LC_CTYPE 'cs_CZ.UTF-8' TEMPLATE template0; CREATE DATABASE db_join encoding UTF8 LC_COLLATE 'en_US.UTF-8' LC_CTYPE 'en_US.UTF-8' TEMPLATE template0; \c db_en CREATE TABLE t_nuts ( id INT PRIMARY KEY, label text ); WITH w_labels AS ( VALUES ('CZ0100'), ('CZ0201'), ('CZ0202'), ('CZ0203'), ('CZ0204'), ('CZ0205'), ('CZ0206'), ('CZ0207'), ('CZ0208'), ('CZ0209'), ('CZ020A'), ('CZ020B'), ('CZ020C'), ('CZ0311'), ('CZ0312'), ('CZ0313') ) INSERT INTO t_nuts (id, label) SELECT row_number() OVER() AS id, w_labels.column1 as label FROM w_labels--, generate_series(1, 500) ; VACUUM (FULL, ANALYZE) t_nuts; SELECT label, count(*) from t_nuts GROUP BY label ORDER BY label; \c db_cz CREATE TABLE t_nuts ( id INT PRIMARY KEY, label text ); WITH w_labels AS ( VALUES ('CZ0100'), ('CZ0201'), ('CZ0202'), ('CZ0203'), ('CZ0204'), ('CZ0205'), ('CZ0206'), ('CZ0207'), ('CZ0208'), ('CZ0209'), ('CZ020A'), ('CZ020B'), ('CZ020C'), ('CZ0311'), ('CZ0312'), ('CZ0313') ) INSERT INTO t_nuts (id, label) SELECT row_number() OVER() AS id, w_labels.column1 as label FROM w_labels--, generate_series(1, 1000) ; VACUUM (FULL, ANALYZE) t_nuts; SELECT label, count(*) from t_nuts GROUP BY label ORDER BY label; \c db_en CREATE USER fdw_user_en WITH PASSWORD 'fdw_pass_en'; GRANT SELECT ON TABLE t_nuts TO fdw_user_en; \c db_join CREATE EXTENSION postgres_fdw ; CREATE SERVER db_en_serv FOREIGN DATA WRAPPER postgres_fdw OPTIONS ( host 'localhost', port '5432', dbname 'db_en', use_remote_estimate 'True'); CREATE USER MAPPING FOR CURRENT_USER SERVER db_en_serv OPTIONS ( user 'fdw_user_en', password 'fdw_pass_en'); CREATE SCHEMA en; IMPORT FOREIGN SCHEMA public LIMIT TO (t_nuts) FROM SERVER db_en_serv INTO en; SELECT label, count(*) FROM en.t_nuts GROUP BY label ORDER BY label; \c db_cz CREATE USER fdw_user_cz WITH PASSWORD 'fdw_pass_cz'; GRANT SELECT ON TABLE t_nuts TO fdw_user_cz; \c db_join CREATE SERVER db_cz_serv FOREIGN DATA WRAPPER postgres_fdw OPTIONS ( host 'localhost', port '5432', dbname 'db_cz', use_remote_estimate 'True'); CREATE USER MAPPING FOR CURRENT_USER SERVER db_cz_serv OPTIONS ( user 'fdw_user_cz', password 'fdw_pass_cz'); CREATE SCHEMA cz; IMPORT FOREIGN SCHEMA public LIMIT TO (t_nuts) FROM SERVER db_cz_serv INTO cz; SELECT label, count(*) FROM cz.t_nuts GROUP BY label ORDER BY label; EXPLAIN (VERBOSE) SELECT cz__t_nuts.label, count(*) FROM cz.t_nuts AS cz__t_nuts INNER JOIN en.t_nuts AS en__t_nuts ON (cz__t_nuts.label = en__t_nuts.label) GROUP BY cz__t_nuts.label; SELECT cz__t_nuts.label, count(*) FROM cz.t_nuts AS cz__t_nuts INNER JOIN en.t_nuts AS en__t_nuts ON (cz__t_nuts.label = en__t_nuts.label) GROUP BY cz__t_nuts.label; select version(); ------------------------wrong output (Debian, GLIBC 2.28)---- DROP DATABASE DROP DATABASE DROP DATABASE DROP ROLE DROP ROLE CREATE DATABASE CREATE DATABASE CREATE DATABASE Nyní jste připojeni k databázi "db_en" jako uživatel "postgres". CREATE TABLE INSERT 0 16 VACUUM label | count --------+------- CZ0100 | 1 CZ0201 | 1 CZ0202 | 1 CZ0203 | 1 CZ0204 | 1 CZ0205 | 1 CZ0206 | 1 CZ0207 | 1 CZ0208 | 1 CZ0209 | 1 CZ020A | 1 CZ020B | 1 CZ020C | 1 CZ0311 | 1 CZ0312 | 1 CZ0313 | 1 (16 řádek) Nyní jste připojeni k databázi "db_cz" jako uživatel "postgres". CREATE TABLE INSERT 0 16 VACUUM label | count --------+------- CZ0100 | 1 CZ020A | 1 CZ020B | 1 CZ020C | 1 CZ0201 | 1 CZ0202 | 1 CZ0203 | 1 CZ0204 | 1 CZ0205 | 1 CZ0206 | 1 CZ0207 | 1 CZ0208 | 1 CZ0209 | 1 CZ0311 | 1 CZ0312 | 1 CZ0313 | 1 (16 řádek) Nyní jste připojeni k databázi "db_en" jako uživatel "postgres". CREATE ROLE GRANT Nyní jste připojeni k databázi "db_join" jako uživatel "postgres". CREATE EXTENSION CREATE SERVER CREATE USER MAPPING CREATE SCHEMA IMPORT FOREIGN SCHEMA label | count --------+------- CZ0100 | 1 CZ0201 | 1 CZ0202 | 1 CZ0203 | 1 CZ0204 | 1 CZ0205 | 1 CZ0206 | 1 CZ0207 | 1 CZ0208 | 1 CZ0209 | 1 CZ020A | 1 CZ020B | 1 CZ020C | 1 CZ0311 | 1 CZ0312 | 1 CZ0313 | 1 (16 řádek) Nyní jste připojeni k databázi "db_cz" jako uživatel "postgres". CREATE ROLE GRANT Nyní jste připojeni k databázi "db_join" jako uživatel "postgres". CREATE SERVER CREATE USER MAPPING CREATE SCHEMA IMPORT FOREIGN SCHEMA label | count --------+------- CZ0100 | 1 CZ020A | 1 CZ020B | 1 CZ020C | 1 CZ0201 | 1 CZ0202 | 1 CZ0203 | 1 CZ0204 | 1 CZ0205 | 1 CZ0206 | 1 CZ0207 | 1 CZ0208 | 1 CZ0209 | 1 CZ0311 | 1 CZ0312 | 1 CZ0313 | 1 (16 řádek) QUERY PLAN ----------------------------------------------------------------------------------------------- GroupAggregate (cost=203.28..204.16 rows=16 width=40) Output: cz__t_nuts.label, count(*) Group Key: cz__t_nuts.label -> Merge Join (cost=203.28..203.92 rows=16 width=32) Output: cz__t_nuts.label Merge Cond: (cz__t_nuts.label = en__t_nuts.label) -> Foreign Scan on cz.t_nuts cz__t_nuts (cost=101.48..101.84 rows=16 width=7) Output: cz__t_nuts.id, cz__t_nuts.label Remote SQL: SELECT label FROM public.t_nuts ORDER BY label ASC NULLS LAST -> Sort (cost=101.80..101.84 rows=16 width=7) Output: en__t_nuts.label Sort Key: en__t_nuts.label -> Foreign Scan on en.t_nuts en__t_nuts (cost=100.00..101.48 rows=16 width=7) Output: en__t_nuts.label Remote SQL: SELECT label FROM public.t_nuts (15 řádek) label | count --------+------- CZ0100 | 1 CZ020A | 1 CZ020B | 1 CZ020C | 1 CZ0311 | 1 CZ0312 | 1 CZ0313 | 1 (7 řádek) version ------------------------------------------------------------------------------------------------------------------ PostgreSQL 12.4 (Debian 12.4-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit (1 řádka) ------------------------correct output (Alpine, musl libc)---- DROP DATABASE DROP DATABASE DROP DATABASE DROP ROLE DROP ROLE CREATE DATABASE CREATE DATABASE CREATE DATABASE You are now connected to database "db_en" as user "postgres". CREATE TABLE INSERT 0 16 VACUUM label | count --------+------- CZ0100 | 1 CZ0201 | 1 CZ0202 | 1 CZ0203 | 1 CZ0204 | 1 CZ0205 | 1 CZ0206 | 1 CZ0207 | 1 CZ0208 | 1 CZ0209 | 1 CZ020A | 1 CZ020B | 1 CZ020C | 1 CZ0311 | 1 CZ0312 | 1 CZ0313 | 1 (16 rows) You are now connected to database "db_cz" as user "postgres". CREATE TABLE INSERT 0 16 VACUUM label | count --------+------- CZ0100 | 1 CZ0201 | 1 CZ0202 | 1 CZ0203 | 1 CZ0204 | 1 CZ0205 | 1 CZ0206 | 1 CZ0207 | 1 CZ0208 | 1 CZ0209 | 1 CZ020A | 1 CZ020B | 1 CZ020C | 1 CZ0311 | 1 CZ0312 | 1 CZ0313 | 1 (16 rows) You are now connected to database "db_en" as user "postgres". CREATE ROLE GRANT You are now connected to database "db_join" as user "postgres". CREATE EXTENSION CREATE SERVER CREATE USER MAPPING CREATE SCHEMA IMPORT FOREIGN SCHEMA label | count --------+------- CZ0100 | 1 CZ0201 | 1 CZ0202 | 1 CZ0203 | 1 CZ0204 | 1 CZ0205 | 1 CZ0206 | 1 CZ0207 | 1 CZ0208 | 1 CZ0209 | 1 CZ020A | 1 CZ020B | 1 CZ020C | 1 CZ0311 | 1 CZ0312 | 1 CZ0313 | 1 (16 rows) You are now connected to database "db_cz" as user "postgres". CREATE ROLE GRANT You are now connected to database "db_join" as user "postgres". CREATE SERVER CREATE USER MAPPING CREATE SCHEMA IMPORT FOREIGN SCHEMA label | count --------+------- CZ0100 | 1 CZ0201 | 1 CZ0202 | 1 CZ0203 | 1 CZ0204 | 1 CZ0205 | 1 CZ0206 | 1 CZ0207 | 1 CZ0208 | 1 CZ0209 | 1 CZ020A | 1 CZ020B | 1 CZ020C | 1 CZ0311 | 1 CZ0312 | 1 CZ0313 | 1 (16 rows) QUERY PLAN ----------------------------------------------------------------------------------------------- GroupAggregate (cost=203.28..204.16 rows=16 width=40) Output: cz__t_nuts.label, count(*) Group Key: cz__t_nuts.label -> Merge Join (cost=203.28..203.92 rows=16 width=32) Output: cz__t_nuts.label Merge Cond: (cz__t_nuts.label = en__t_nuts.label) -> Foreign Scan on cz.t_nuts cz__t_nuts (cost=101.48..101.84 rows=16 width=7) Output: cz__t_nuts.id, cz__t_nuts.label Remote SQL: SELECT label FROM public.t_nuts ORDER BY label ASC NULLS LAST -> Sort (cost=101.80..101.84 rows=16 width=7) Output: en__t_nuts.label Sort Key: en__t_nuts.label -> Foreign Scan on en.t_nuts en__t_nuts (cost=100.00..101.48 rows=16 width=7) Output: en__t_nuts.label Remote SQL: SELECT label FROM public.t_nuts (15 rows) label | count --------+------- CZ0100 | 1 CZ0201 | 1 CZ0202 | 1 CZ0203 | 1 CZ0204 | 1 CZ0205 | 1 CZ0206 | 1 CZ0207 | 1 CZ0208 | 1 CZ0209 | 1 CZ020A | 1 CZ020B | 1 CZ020C | 1 CZ0311 | 1 CZ0312 | 1 CZ0313 | 1 (16 rows) version --------------------------------------------------------------------------------------- PostgreSQL 12.4 on x86_64-pc-linux-musl, compiled by gcc (Alpine 9.3.0) 9.3.0, 64-bit (1 row)
Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes: > Joining two identical tables placed on separate DBs with different collation > accessed through postgres_fdw failed when joined with merge join. Some > records are missing (7 vs. 16 rows in example) in output. See this snippet > https://gitlab.com/-/snippets/2004522 (or code pasted below) for psql script > reproducing error also with expected output (working fine on alpine linux). So I think what is happening here is that postgres_fdw's version of IMPORT FOREIGN SCHEMA translates "COLLATE default" on the remote server to "COLLATE default" on the local one, which of course is a big fail if the defaults don't match. That allows the local planner to believe that remote ORDER BYs on the two foreign tables will give compatible results, causing the merge join to not work very well at all. We probably need to figure out some way of substituting the remote database's actual lc_collate setting when we see "COLLATE default". I'm also thinking that the documentation is way too cavalier about dismissing non-matching collation names by just saying that you can turn off import_collate. The fact is that doing so is likely to be disastrous, the more so the more optimization intelligence we add to postgres_fdw. I wonder if we could do something like this: * Change postgresImportForeignSchema() as above, so that it will never apply "COLLATE default" to an imported column, except in the case where you turn off import_collate. * In postgres_fdw planning, treat "COLLATE default" on a foreign table column as meaning "we don't know the collation"; never believe that that column can be ordered in a way that matches any local collation. (It'd be better perhaps if there were an explicit way to say "COLLATE unknown", but I hesitate to invent such a concept in general.) * Document that in manual creation of a postgres_fdw foreign table with a text column, you need to explicitly write the correct collation if you want the best query plans to be generated. This seems like too big a behavioral change to consider back-patching, unfortunately. regards, tom lane
Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
From
Tom Lane
Date:
I wrote: > So I think what is happening here is that postgres_fdw's version of > IMPORT FOREIGN SCHEMA translates "COLLATE default" on the remote > server to "COLLATE default" on the local one, which of course is > a big fail if the defaults don't match. That allows the local > planner to believe that remote ORDER BYs on the two foreign tables > will give compatible results, causing the merge join to not work > very well at all. > We probably need to figure out some way of substituting the remote > database's actual lc_collate setting when we see "COLLATE default". Here's a draft patch for that part. There's a few things to quibble about: * It tests for "COLLATE default" by checking whether pg_collation.oid is DEFAULT_COLLATION_OID, thus assuming that that OID will never change. I think this is safer than checking the collation name, but maybe somebody else would have a different opinion? Another idea is to check whether collprovider is 'd', but that only works with v10 and up. * It might not be able to find a remote collation matching the database's datcollate/datctype. As coded, we'll end up creating the local column with "COLLATE default", putting us back in the same hurt we're in now. I think this is okay given the other planned change to interpret "COLLATE default" as "we don't know what collation this is". In any case it's hard to see what else we could do, other than fail entirely. * Alternatively, it might find more than one such remote collation; indeed that's the norm, eg we'd typically find both "en_US" and "en_US.utf8", or the like. I made it choose the shortest collation name in such cases, but maybe there is a case for the longest? I don't much want it to pick "ucs_basic" over "C", though. * The whole thing is certain to fall over whenever we find a way to allow ICU collations as database defaults. While we can presumably fix the query when we make that change, existing postgres_fdw releases would not work against a newer server. Probably there's little to be done about this, either. * As shown by the expected-output changes, there are some test cases that expose that we're not picking the default collation anymore. That creates a testing problem: this can't be committed as-is because it'll fail with any other locale environment than what the expected file was made with. We could lobotomize the test cases to not print the column collation, but then we're not really verifying that this code does what it's supposed to. Not sure what the best compromise is. Comments? regards, tom lane diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 90db550b92..f34178c5d3 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -8232,7 +8232,7 @@ IMPORT FOREIGN SCHEMA import_source FROM SERVER loopback INTO import_dest1; Column | Type | Collation | Nullable | Default | FDW options --------+-------------------+-----------+----------+---------+-------------------- c1 | integer | | | | (column_name 'c1') - c2 | character varying | | not null | | (column_name 'c2') + c2 | character varying | C | not null | | (column_name 'c2') Server: loopback FDW options: (schema_name 'import_source', table_name 't1') @@ -8240,7 +8240,7 @@ FDW options: (schema_name 'import_source', table_name 't1') Column | Type | Collation | Nullable | Default | FDW options --------+-------------------+-----------+----------+---------+-------------------- c1 | integer | | | | (column_name 'c1') - c2 | character varying | | | | (column_name 'c2') + c2 | character varying | C | | | (column_name 'c2') c3 | text | POSIX | | | (column_name 'c3') Server: loopback FDW options: (schema_name 'import_source', table_name 't2') @@ -8264,8 +8264,8 @@ FDW options: (schema_name 'import_source', table_name 't4') Column | Type | Collation | Nullable | Default | FDW options --------+-----------------------+-----------+----------+---------+--------------------- c1 | double precision | | | | (column_name 'c1') - C 2 | text | | | | (column_name 'C 2') - c3 | character varying(42) | | | | (column_name 'c3') + C 2 | text | C | | | (column_name 'C 2') + c3 | character varying(42) | C | | | (column_name 'c3') Server: loopback FDW options: (schema_name 'import_source', table_name 'x 4') @@ -8296,7 +8296,7 @@ IMPORT FOREIGN SCHEMA import_source FROM SERVER loopback INTO import_dest2 Column | Type | Collation | Nullable | Default | FDW options --------+-------------------+-----------+----------+---------+-------------------- c1 | integer | | | | (column_name 'c1') - c2 | character varying | | not null | | (column_name 'c2') + c2 | character varying | C | not null | | (column_name 'c2') Server: loopback FDW options: (schema_name 'import_source', table_name 't1') @@ -8304,7 +8304,7 @@ FDW options: (schema_name 'import_source', table_name 't1') Column | Type | Collation | Nullable | Default | FDW options --------+-------------------+-----------+----------+---------+-------------------- c1 | integer | | | 42 | (column_name 'c1') - c2 | character varying | | | | (column_name 'c2') + c2 | character varying | C | | | (column_name 'c2') c3 | text | POSIX | | | (column_name 'c3') Server: loopback FDW options: (schema_name 'import_source', table_name 't2') @@ -8328,8 +8328,8 @@ FDW options: (schema_name 'import_source', table_name 't4') Column | Type | Collation | Nullable | Default | FDW options --------+-----------------------+-----------+----------+---------+--------------------- c1 | double precision | | | | (column_name 'c1') - C 2 | text | | | | (column_name 'C 2') - c3 | character varying(42) | | | | (column_name 'c3') + C 2 | text | C | | | (column_name 'C 2') + c3 | character varying(42) | C | | | (column_name 'c3') Server: loopback FDW options: (schema_name 'import_source', table_name 'x 4') diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index 9fc53cad68..296bedd5de 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -18,6 +18,7 @@ #include "access/sysattr.h" #include "access/table.h" #include "catalog/pg_class.h" +#include "catalog/pg_collation.h" #include "commands/defrem.h" #include "commands/explain.h" #include "commands/vacuum.h" @@ -4802,44 +4803,54 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid) * include a schema name for types/functions in other schemas, which * is what we want. */ + appendStringInfoString(&buf, + "SELECT relname, " + " attname, " + " format_type(atttypid, atttypmod), " + " attnotnull, " + " pg_get_expr(adbin, adrelid), "); + if (import_collate) + appendStringInfo(&buf, + " CASE WHEN coll.oid = %u THEN" + " defcoll.collname ELSE coll.collname END, " + " CASE WHEN coll.oid = %u THEN" + " defcoll.nspname ELSE collnsp.nspname END ", + DEFAULT_COLLATION_OID, + DEFAULT_COLLATION_OID); + else + appendStringInfoString(&buf, " NULL, NULL "); + appendStringInfoString(&buf, + "FROM pg_class c " + " JOIN pg_namespace n ON " + " relnamespace = n.oid " + " LEFT JOIN pg_attribute a ON " + " attrelid = c.oid AND attnum > 0 " + " AND NOT attisdropped " + " LEFT JOIN pg_attrdef ad ON " + " adrelid = c.oid AND adnum = attnum "); if (import_collate) + { appendStringInfoString(&buf, - "SELECT relname, " - " attname, " - " format_type(atttypid, atttypmod), " - " attnotnull, " - " pg_get_expr(adbin, adrelid), " - " collname, " - " collnsp.nspname " - "FROM pg_class c " - " JOIN pg_namespace n ON " - " relnamespace = n.oid " - " LEFT JOIN pg_attribute a ON " - " attrelid = c.oid AND attnum > 0 " - " AND NOT attisdropped " - " LEFT JOIN pg_attrdef ad ON " - " adrelid = c.oid AND adnum = attnum " " LEFT JOIN pg_collation coll ON " " coll.oid = attcollation " " LEFT JOIN pg_namespace collnsp ON " - " collnsp.oid = collnamespace "); - else + " collnsp.oid = collnamespace " + " LEFT JOIN (" + " SELECT cd.collname, nd.nspname FROM" + " pg_collation cd, pg_namespace nd, pg_database d" + " WHERE nd.oid = cd.collnamespace AND" + " d.datname = current_database() AND"); + /* collprovider is new as of v10 */ + if (PQserverVersion(conn) >= 100000) + appendStringInfoString(&buf, + " cd.collprovider = 'c' AND"); appendStringInfoString(&buf, - "SELECT relname, " - " attname, " - " format_type(atttypid, atttypmod), " - " attnotnull, " - " pg_get_expr(adbin, adrelid), " - " NULL, NULL " - "FROM pg_class c " - " JOIN pg_namespace n ON " - " relnamespace = n.oid " - " LEFT JOIN pg_attribute a ON " - " attrelid = c.oid AND attnum > 0 " - " AND NOT attisdropped " - " LEFT JOIN pg_attrdef ad ON " - " adrelid = c.oid AND adnum = attnum "); - + " cd.collcollate = d.datcollate AND" + " cd.collctype = d.datctype AND" + " cd.collencoding IN (d.encoding, -1)" + " ORDER BY length(cd.collname) LIMIT 1 )" + " defcoll ON TRUE "); + } appendStringInfoString(&buf, "WHERE c.relkind IN (" CppAsString2(RELKIND_RELATION) ","
Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
From
Jiří Fejfar
Date:
On 17.08.2020 17:26, Tom Lane wrote: > PG Bug reporting form <noreply@postgresql.org> writes: >> Joining two identical tables placed on separate DBs with different collation >> accessed through postgres_fdw failed when joined with merge join. Some >> records are missing (7 vs. 16 rows in example) in output. See this snippet >> https://gitlab.com/-/snippets/2004522 (or code pasted below) for psql script >> reproducing error also with expected output (working fine on alpine linux). > So I think what is happening here is that postgres_fdw's version of > IMPORT FOREIGN SCHEMA translates "COLLATE default" on the remote > server to "COLLATE default" on the local one, which of course is > a big fail if the defaults don't match. That allows the local > planner to believe that remote ORDER BYs on the two foreign tables > will give compatible results, causing the merge join to not work > very well at all. I am just wondering: if it is bug in IMPORT FOREIGN SCHEMA, how it is possible the bug is not present [1] when provided psql script [2] is run on Alpine Linux? I suppose, both Debian and Alpine has the same IMPORT FOREIGN SCHEMA behavior (both has PG12.4). But differs in glibc vs. musl libc. Is it possible, there is also something differing in those libraries with respect to cs.CZ-UTF8? Best regards, Jiří. [1] https://gitlab.com/-/snippets/2004522#note_396751634 [2] https://gitlab.com/-/snippets/2004522
Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
From
Tom Lane
Date:
=?UTF-8?B?SmnFmcOtIEZlamZhcg==?= <jurafejfar@gmail.com> writes: > I am just wondering: if it is bug in IMPORT FOREIGN SCHEMA, how it is > possible the bug is not present [1] when provided psql script [2] is run > on Alpine Linux? [ shrug ] Could easy be that Alpine distributes dumbed-down locale definitions in which the sort order isn't actually any different between those two locales. Did you check what the sort order of your test data looks like in each case? regards, tom lane
Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
From
Jiří Fejfar
Date:
On Wed, 19 Aug 2020 at 07:53, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > =?UTF-8?B?SmnFmcOtIEZlamZhcg==?= <jurafejfar@gmail.com> writes: > > I am just wondering: if it is bug in IMPORT FOREIGN SCHEMA, how it is > > possible the bug is not present [1] when provided psql script [2] is run > > on Alpine Linux? > > [ shrug ] Could easy be that Alpine distributes dumbed-down locale > definitions in which the sort order isn't actually any different > between those two locales. Did you check what the sort order of > your test data looks like in each case? > > regards, tom lane Oh, I can see on Alpine that even local tables are ordered like with en.US-UTF8 even if DB has default cs.CZ-UTF8. postgres=# \l List of databases Name | Owner | Encoding | Collate | Ctype | Access privileges -----------+----------+----------+-------------+-------------+----------------------- db_cz | postgres | UTF8 | cs_CZ.UTF-8 | cs_CZ.UTF-8 | db_en | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | ... postgres=# \c db_cz ; You are now connected to database "db_cz" as user "postgres". db_cz=# select * from t_nuts order by label; id | label ----+-------- 1 | CZ0100 2 | CZ0201 ... 11 | CZ020A 12 | CZ020B 13 | CZ020C ... It is mentioned in Alpine docker docs [1] that "Alpine-based variants do not support locales;". Thanks, J. [1] https://hub.docker.com/_/postgres