Thread: Wrong results with postgres_fdw and merge anti join from RHEL 7.9 to RHEL 8.7
Wrong results with postgres_fdw and merge anti join from RHEL 7.9 to RHEL 8.7
From
"Daniel Westermann (DWE)"
Date:
Hi, I am not sure if this qualifies as bug, but anyway: Source instance: PostgreSQL 13.7 on RHEL 7.9 Target instance PostgreSQL 13.7 on RHEL 8.7 This is the statement: SELECT * FROM "rsu_adm"."data_2d_clb_global_product" f2 LEFT OUTER JOIN "rsu_adm"."clb_global_product" f1 on f1.cprd=f2.cprd WHERE f1.cprd is null; Per default we see a merge anti join, and this gives results, which is wrong: rsup1=# SELECT * FROM "rsu_adm"."data_2d_clb_global_product" f2 LEFT OUTER JOIN "rsu_adm"."clb_global_product" f1 on f1.cprd=f2.cprd WHERE f1.cprd is null; cprd | xtc_id | rprd | prdgalsts_id | dlz_last_transaction_ts | cprd | xtc_id | rprd | prdgalsts_id | dlz_last_tran ---------------+--------+---------+--------------+------------------------------+------+--------+------+--------------+-------------- 0027033 | | 179722 | 1 | 2023-04-03 06:15:09.45135+02 | | | | | 0112113 | | 3199208 | 1 | 2023-04-03 06:15:09.45135+02 | | | | | 0116713 | | 2071012 | 1 | 2023-04-03 06:15:09.45135+02 | | | | | 0116953 | | 2070136 | 1 | 2023-04-03 06:15:09.45135+02 | | | | | ... QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Merge Anti Join (cost=100.84..67203.45 rows=50713 width=122) Output: f2.cprd, f2.xtc_id, f2.rprd, f2.prdgalsts_id, f2.dlz_last_transaction_ts, clb_global_product.cprd, clb_global_product.xtc_id,clb_global_product.rprd, clb_global_product.prdgalsts_id, clb_global_product.dlz_last_transaction_ts Merge Cond: ((f2.cprd)::text = (clb_global_product.cprd)::text) -> Index Scan using data_2d_clb_global_product_pkey on rsu_adm.data_2d_clb_global_product f2 (cost=0.42..2898.56 rows=101426width=34) Output: f2.cprd, f2.xtc_id, f2.rprd, f2.prdgalsts_id, f2.dlz_last_transaction_ts -> Foreign Scan on ro_dlz.clb_global_product (cost=100.42..52506.16 rows=923613 width=34) Output: clb_global_product.cprd, clb_global_product.xtc_id, clb_global_product.rprd, clb_global_product.prdgalsts_id,clb_global_product.dlz_last_transaction_ts Remote SQL: SELECT cprd, xtc_id, rprd, prdgalsts_id, dlz_last_transaction_ts FROM ro_rsu.clb_global_product ORDERBY cprd ASC NULLS LAST (8 rows) Disabling merge join gives the correct result: rsup1=# set enable_mergejoin = off; SET rsup1=# explain verbose SELECT * FROM "rsu_adm"."data_2d_clb_global_product" f2 LEFT OUTER JOIN "rsu_adm"."clb_global_product" f1 on f1.cprd=f2.cprd WHERE f1.cprd is null; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Hash Anti Join (cost=60274.55..681118.72 rows=50713 width=122) Output: f2.cprd, f2.xtc_id, f2.rprd, f2.prdgalsts_id, f2.dlz_last_transaction_ts, clb_global_product.cprd, clb_global_product.xtc_id,clb_global_product.rprd, clb_global_product.prdgalsts_id, clb_global_product.dlz_last_transaction_ts Hash Cond: ((f2.cprd)::text = (clb_global_product.cprd)::text) -> Seq Scan on rsu_adm.data_2d_clb_global_product f2 (cost=0.00..1768.26 rows=101426 width=34) Output: f2.cprd, f2.xtc_id, f2.rprd, f2.prdgalsts_id, f2.dlz_last_transaction_ts -> Hash (cost=41513.39..41513.39 rows=923613 width=34) Output: clb_global_product.cprd, clb_global_product.xtc_id, clb_global_product.rprd, clb_global_product.prdgalsts_id,clb_global_product.dlz_last_transaction_ts -> Foreign Scan on ro_dlz.clb_global_product (cost=100.00..41513.39 rows=923613 width=34) Output: clb_global_product.cprd, clb_global_product.xtc_id, clb_global_product.rprd, clb_global_product.prdgalsts_id,clb_global_product.dlz_last_transaction_ts Remote SQL: SELECT cprd, xtc_id, rprd, prdgalsts_id, dlz_last_transaction_ts FROM ro_rsu.clb_global_product (10 rows) rsup1=# SELECT * FROM "rsu_adm"."data_2d_clb_global_product" f2 LEFT OUTER JOIN "rsu_adm"."clb_global_product" f1 on f1.cprd=f2.cprd WHERE f1.cprd is null; cprd | xtc_id | rprd | prdgalsts_id | dlz_last_transaction_ts | cprd | xtc_id | rprd | prdgalsts_id | dlz_last_transaction_ts ------+--------+------+--------------+-------------------------+------+--------+------+--------------+------------------------- (0 rows) This is the server definition: rsup1=# \des+ List of foreign servers Name | Owner | Foreign-data wrapper | Access privileges | Type | Version | FDW options | Description ---------+----------+----------------------+-------------------+------+---------+------------------------------------------------------------------------------------------------------+------------- tgt_srv | postgres | postgres_fdw | | | | (host '192.168.100.245', dbname 'dlzp1',port '5432', use_remote_estimate 'true', fetch_size '5000') | (1 row) I am aware that the version of glibc is not the same between those red hats. Is this expected? Thanks in advance Daniel
Re: Wrong results with postgres_fdw and merge anti join from RHEL 7.9 to RHEL 8.7
From
Tom Lane
Date:
"Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes: > I am not sure if this qualifies as bug, but anyway: > Source instance: PostgreSQL 13.7 on RHEL 7.9 > Target instance PostgreSQL 13.7 on RHEL 8.7 > This is the statement: > SELECT * FROM "rsu_adm"."data_2d_clb_global_product" f2 > LEFT OUTER JOIN "rsu_adm"."clb_global_product" f1 on f1.cprd=f2.cprd > WHERE f1.cprd is null; > Per default we see a merge anti join, and this gives results, which is wrong: You didn't provide anything useful like the table schemas, but correctness of a merge join depends on the servers having the same ideas about sort ordering, and if "cprd" is a text-type column then inconsistent collations could break that. The given plan is at hazard for that because it intends to do one sort locally and the other remotely: > Merge Cond: ((f2.cprd)::text = (clb_global_product.cprd)::text) > -> Index Scan using data_2d_clb_global_product_pkey on rsu_adm.data_2d_clb_global_product f2 (cost=0.42..2898.56 rows=101426width=34) > Output: f2.cprd, f2.xtc_id, f2.rprd, f2.prdgalsts_id, f2.dlz_last_transaction_ts > -> Foreign Scan on ro_dlz.clb_global_product (cost=100.42..52506.16 rows=923613 width=34) > Output: clb_global_product.cprd, clb_global_product.xtc_id, clb_global_product.rprd, clb_global_product.prdgalsts_id,clb_global_product.dlz_last_transaction_ts > Remote SQL: SELECT cprd, xtc_id, rprd, prdgalsts_id, dlz_last_transaction_ts FROM ro_rsu.clb_global_product ORDERBY cprd ASC NULLS LAST > I am aware that the version of glibc is not the same between those red hats. Is this expected? That's certainly a hazard, but do the servers even have the same collation settings for these columns? regards, tom lane
Re: Wrong results with postgres_fdw and merge anti join from RHEL 7.9 to RHEL 8.7
From
"Daniel Westermann (DWE)"
Date:
>You didn't provide anything useful like the table schemas, but
>correctness of a merge join depends on the servers having the same
>deas about sort ordering, and if "cprd" is a text-type column then
>inconsistent collations could break that.
>deas about sort ordering, and if "cprd" is a text-type column then
>inconsistent collations could break that.
rsup1=# \d "rsu_adm"."data_2d_clb_global_product"
Table "rsu_adm.data_2d_clb_global_product"
Column | Type | Collation | Nullable | Default
-------------------------+--------------------------+-----------+----------+---------
cprd | character varying | | not null |
xtc_id | numeric | | |
rprd | numeric | | |
prdgalsts_id | integer | | |
dlz_last_transaction_ts | timestamp with time zone | | |
Indexes:
"data_2d_clb_global_product_pkey" PRIMARY KEY, btree (cprd)
"data_2d_clb_global_product_idx4" btree (dlz_last_transaction_ts)
Table "rsu_adm.data_2d_clb_global_product"
Column | Type | Collation | Nullable | Default
-------------------------+--------------------------+-----------+----------+---------
cprd | character varying | | not null |
xtc_id | numeric | | |
rprd | numeric | | |
prdgalsts_id | integer | | |
dlz_last_transaction_ts | timestamp with time zone | | |
Indexes:
"data_2d_clb_global_product_pkey" PRIMARY KEY, btree (cprd)
"data_2d_clb_global_product_idx4" btree (dlz_last_transaction_ts)
rsup1=# \d "rsu_adm"."clb_global_product"
View "rsu_adm.clb_global_product"
Column | Type | Collation | Nullable | Default
-------------------------+-----------------------------+-----------+----------+---------
cprd | character varying | | |
xtc_id | numeric(10,0) | | |
rprd | numeric(10,0) | | |
prdgalsts_id | numeric(10,0) | | |
dlz_last_transaction_ts | timestamp(5) with time zone | | |
Both instances use the same collation;
View "rsu_adm.clb_global_product"
Column | Type | Collation | Nullable | Default
-------------------------+-----------------------------+-----------+----------+---------
cprd | character varying | | |
xtc_id | numeric(10,0) | | |
rprd | numeric(10,0) | | |
prdgalsts_id | numeric(10,0) | | |
dlz_last_transaction_ts | timestamp(5) with time zone | | |
Both instances use the same collation;
rsup1=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
rsup1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
(4 rows)
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
rsup1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
(4 rows)
dlzp1=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
dlzp1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
(4 rows)
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
dlzp1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
(4 rows)
>The given plan is at hazard for that because it intends to do
>one sort locally and the other remotely:
Remote is a view:
dlzp1=# \d ro_rsu.clb_global_product
View "ro_rsu.clb_global_product"
Column | Type | Collation | Nullable | Default
-------------------------+-----------------------------+-----------+----------+---------
cprd | character varying | | |
xtc_id | numeric(10,0) | | |
rprd | numeric(10,0) | | |
prdgalsts_id | numeric(10,0) | | |
dlz_last_transaction_ts | timestamp(5) with time zone | | |
View "ro_rsu.clb_global_product"
Column | Type | Collation | Nullable | Default
-------------------------+-----------------------------+-----------+----------+---------
cprd | character varying | | |
xtc_id | numeric(10,0) | | |
rprd | numeric(10,0) | | |
prdgalsts_id | numeric(10,0) | | |
dlz_last_transaction_ts | timestamp(5) with time zone | | |
dlzp1=# select definition from pg_views where viewname = 'clb_global_product';
definition
----------------------------------------
SELECT clb_flbgpr.cprd, +
clb_flbgpr.xtc_id, +
clb_flbgpr.rprd, +
clb_flbgpr.prdgalsts_id, +
clb_flbgpr.dlz_last_transaction_ts+
FROM dlz_clb.clb_flbgpr;
(1 row)
dlzp1=# \d dlz_clb.clb_flbgpr
Table "dlz_clb.clb_flbgpr"
Column | Type | Collation | Nullable | Default
-------------------------+-----------------------------+-----------+----------+---------
cprd | character varying | | not null |
fp_lwr_celsius | numeric(3,0) | | |
fp_lwr_farenheit | numeric(3,0) | | |
fp_last_upd | date | | |
win_indicator | numeric(6,0) | | |
gpr_theme | character varying(6) | | |
comparison_symbol | character(1) | | |
cprd_as_previous | character varying | | |
emp_user_id_as_pcfm | character varying | | |
xtc_id | numeric(10,0) | | |
gpr_oral_care | character(1) | | |
bunit_id | numeric(10,0) | | |
remark_id | numeric(10,0) | | |
win_them_indicator | numeric(6,0) | | |
plr_id | numeric(10,0) | | |
stability_status | character(1) | | |
cmr_frm_tested | character(1) | | |
rprd | numeric(10,0) | | |
like_level_id | numeric(10,0) | | |
lhpr_id | numeric(10,0) | | |
like_perf_lvl_id | numeric(10,0) | | |
lhdncpr_id | numeric(10,0) | | |
like_hdnc_lvl_id | numeric(10,0) | | |
prdgalsts_id | numeric(10,0) | | |
oil_stock_sts | character(1) | | |
archive_status | character(2) | | |
dlz_last_transaction_ts | timestamp(5) with time zone | | |
Indexes:
"clb_flbgpr_pk" PRIMARY KEY, btree (cprd)
"clb_flbgpr_lst_trn" btree (dlz_last_transaction_ts)
Triggers:
clb_flbgpr_ins BEFORE INSERT ON dlz_clb.clb_flbgpr FOR EACH ROW EXECUTE FUNCTION dlz_adm.fct_trg_setup_transaction_ts()
clb_flbgpr_upd BEFORE UPDATE ON dlz_clb.clb_flbgpr FOR EACH ROW WHEN (old.* IS DISTINCT FROM new.*) EXECUTE FUNCTION dlz_adm.fct
_trg_setup_transaction_ts()
Regards
definition
----------------------------------------
SELECT clb_flbgpr.cprd, +
clb_flbgpr.xtc_id, +
clb_flbgpr.rprd, +
clb_flbgpr.prdgalsts_id, +
clb_flbgpr.dlz_last_transaction_ts+
FROM dlz_clb.clb_flbgpr;
(1 row)
dlzp1=# \d dlz_clb.clb_flbgpr
Table "dlz_clb.clb_flbgpr"
Column | Type | Collation | Nullable | Default
-------------------------+-----------------------------+-----------+----------+---------
cprd | character varying | | not null |
fp_lwr_celsius | numeric(3,0) | | |
fp_lwr_farenheit | numeric(3,0) | | |
fp_last_upd | date | | |
win_indicator | numeric(6,0) | | |
gpr_theme | character varying(6) | | |
comparison_symbol | character(1) | | |
cprd_as_previous | character varying | | |
emp_user_id_as_pcfm | character varying | | |
xtc_id | numeric(10,0) | | |
gpr_oral_care | character(1) | | |
bunit_id | numeric(10,0) | | |
remark_id | numeric(10,0) | | |
win_them_indicator | numeric(6,0) | | |
plr_id | numeric(10,0) | | |
stability_status | character(1) | | |
cmr_frm_tested | character(1) | | |
rprd | numeric(10,0) | | |
like_level_id | numeric(10,0) | | |
lhpr_id | numeric(10,0) | | |
like_perf_lvl_id | numeric(10,0) | | |
lhdncpr_id | numeric(10,0) | | |
like_hdnc_lvl_id | numeric(10,0) | | |
prdgalsts_id | numeric(10,0) | | |
oil_stock_sts | character(1) | | |
archive_status | character(2) | | |
dlz_last_transaction_ts | timestamp(5) with time zone | | |
Indexes:
"clb_flbgpr_pk" PRIMARY KEY, btree (cprd)
"clb_flbgpr_lst_trn" btree (dlz_last_transaction_ts)
Triggers:
clb_flbgpr_ins BEFORE INSERT ON dlz_clb.clb_flbgpr FOR EACH ROW EXECUTE FUNCTION dlz_adm.fct_trg_setup_transaction_ts()
clb_flbgpr_upd BEFORE UPDATE ON dlz_clb.clb_flbgpr FOR EACH ROW WHEN (old.* IS DISTINCT FROM new.*) EXECUTE FUNCTION dlz_adm.fct
_trg_setup_transaction_ts()
Regards
Daniel
Re: Wrong results with postgres_fdw and merge anti join from RHEL 7.9 to RHEL 8.7
From
Tom Lane
Date:
"Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes: > Both instances use the same collation; Yeah, doesn't look like you've made any configuration mistakes. So either the two OSes sort differently, or there's index corruption causing the indexscan to give bogus output. The sample data you showed seemed to only involve numeric-ish strings, which would be highly unlikely to change sort order across locale updates. But maybe there are weirder entries elsewhere in the column? Anyway, the first thing I'd try is reindexing both tables --- doesn't look like they're large enough to make that painful. If that doesn't fix it you must have a collation difference. (Asking both systems for a sorted dump of their cprd columns could help confirm that.) You could probably hack around that, if an OS update isn't feasible, by labelling the foreign table's column with some collation you aren't using anywhere else in the local database. regards, tom lane
Re: Wrong results with postgres_fdw and merge anti join from RHEL 7.9 to RHEL 8.7
From
"Daniel Westermann (DWE)"
Date:
>Yeah, doesn't look like you've made any configuration mistakes.
>So either the two OSes sort differently, or there's index corruption
>causing the indexscan to give bogus output.
>causing the indexscan to give bogus output.
We did a rebuild of the indexes and also vacuum full, just to be sure. Did not change anything.
>The sample data you showed seemed to only involve numeric-ish strings,
>which would be highly unlikely to change sort order across locale
>updates. But maybe there are weirder entries elsewhere in the column?
I can probably provide a dump, but I've to ask. Would that help?
>Anyway, the first thing I'd try is reindexing both tables --- doesn't
>look like they're large enough to make that painful. If that doesn't
>fix it you must have a collation difference. (Asking both systems
>for a sorted dump of their cprd columns could help confirm that.)
>You could probably hack around that, if an OS update isn't feasible,
>by labelling the foreign table's column with some collation you aren't
>using anywhere else in the local database.
I'll try do that tomorrow.
Thanks
Daniel
Re: Wrong results with postgres_fdw and merge anti join from RHEL 7.9 to RHEL 8.7
From
Jim Mlodgenski
Date:
On Wed, Apr 5, 2023 at 2:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes:
> I am not sure if this qualifies as bug, but anyway:
> Source instance: PostgreSQL 13.7 on RHEL 7.9
> Target instance PostgreSQL 13.7 on RHEL 8.7
CREATE TABLE t1 (c1 varchar PRIMARY KEY);
INSERT INTO t1 VALUES ('1-a'), ('1a'), ('1-aa');
SELECT * FROM t1 ORDER BY c1;
Re: Wrong results with postgres_fdw and merge anti join from RHEL 7.9 to RHEL 8.7
From
Tom Lane
Date:
"Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes: >> The sample data you showed seemed to only involve numeric-ish strings, >> which would be highly unlikely to change sort order across locale >> updates. But maybe there are weirder entries elsewhere in the column? > I can probably provide a dump, but I've to ask. Would that help? I wasn't volunteering to check the data for you ;-), just suggesting that you might find it interesting to do so yourself. regards, tom lane
Re: Wrong results with postgres_fdw and merge anti join from RHEL 7.9 to RHEL 8.7
From
Joe Conway
Date:
On 4/5/23 15:46, Jim Mlodgenski wrote: > > > On Wed, Apr 5, 2023 at 2:41 PM Tom Lane <tgl@sss.pgh.pa.us > <mailto:tgl@sss.pgh.pa.us>> wrote: > > "Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com > <mailto:daniel.westermann@dbi-services.com>> writes: > > I am not sure if this qualifies as bug, but anyway: > > > Source instance: PostgreSQL 13.7 on RHEL 7.9 > > Target instance PostgreSQL 13.7 on RHEL 8.7 > > > glibc on those 2 versions of RHEL have very different ideas of what the > sort order should be. Try running the following and you'll likely see > different results on RHEL 7.9 vs 8.7 > > CREATE TABLE t1 (c1 varchar PRIMARY KEY); > INSERT INTO t1 VALUES ('1-a'), ('1a'), ('1-aa'); > SELECT * FROM t1 ORDER BY c1; Yep -- RHEL 7.9 is glibc 2.17 (likely -326) and 8.7 is glibc 2.28 (-211 seems to be latest), and they are well known to sort differently even for "common" characters (e.g. "-") -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Wrong results with postgres_fdw and merge anti join from RHEL 7.9 to RHEL 8.7
From
"Daniel Westermann (DWE)"
Date:
>>
>>
>> glibc on those 2 versions of RHEL have very different ideas of what the
>> sort order should be. Try running the following and you'll likely see
>> different results on RHEL 7.9 vs 8.7
>>
>> CREATE TABLE t1 (c1 varchar PRIMARY KEY);
>> INSERT INTO t1 VALUES ('1-a'), ('1a'), ('1-aa');
>> SELECT * FROM t1 ORDER BY c1;
>Yep -- RHEL 7.9 is glibc 2.17 (likely -326) and 8.7 is glibc 2.28 (-211
>seems to be latest), and they are well known to sort differently even
>for "common" characters (e.g. "-")
>> glibc on those 2 versions of RHEL have very different ideas of what the
>> sort order should be. Try running the following and you'll likely see
>> different results on RHEL 7.9 vs 8.7
>>
>> CREATE TABLE t1 (c1 varchar PRIMARY KEY);
>> INSERT INTO t1 VALUES ('1-a'), ('1a'), ('1-aa');
>> SELECT * FROM t1 ORDER BY c1;
>Yep -- RHEL 7.9 is glibc 2.17 (likely -326) and 8.7 is glibc 2.28 (-211
>seems to be latest), and they are well known to sort differently even
>for "common" characters (e.g. "-")
Got it, thank you all for your help.
Regards
Daniel