Thread: Query execution failure
I am encountering a strange error when trying to execute a complex SELECT query in Postgres. All of the details of this error are documented in the attachment.
I hope you can help me to resolve this error.
Thanks very much.
Pete Storer
Sr Data Architect
Enterprise Performance and Data Architecture - EPD
Tel: + 1 919 531 5745
SAS | 100 Campus Drive | Cary, NC 27513
> On Jan 25, 2023, at 11:58, Pete Storer <Pete.Storer@sas.com> wrote:
>
> I am encountering a strange error when trying to execute a complex SELECT query in Postgres. All of the details of this error are documented in the attachment.

Without opining on the bug itself, you will likely get a much better response if you put the details in plain text, rather than as a .docx attachment. Many systems strip attachments, and many more people have a (justifiable) allergy to opening attachments from an unknown source.
Resending this with all the text in the body of the email rather than as a Word attachment.
PostgreSQL Bug Report
Overview
When trying to run a complex join of eight tables, including six foreign tables, the query runs for a few seconds and then terminates with an error message.
This occurs ONLY in the Dev environment; the query executes successfully in our Production environment, although it returns zero rows.
The query
Here is the query that is causing the problem:
SELECT
dk.deployment_id,
dk.deployment_key_txt,
dp.name as deployment_nm,
dp.tenant as deployment_tenant_nm,
infc.name as deployment_cluster_nm,
icv.description as cluster_kubernetes_version_id,
infc.nodes as cluster_node_no,
infc.ram as cluster_ram_no,
infc.cpu as cluster_cpu_no,
infc.disk as cluster_disk_no,
dp.deploy_description as infrastructure_desc,
CASE
WHEN infrq.multi_master = 1 THEN 'Y'
else 'N'
END
AS HA_multi_master_flg,
infrq.cluster_label as cluster_environment_desc,
dl.deploy_time as deployment_deploy_tm,
dl.order_number as order_no_id,
ob.order_id as order_id,
dp.server as deployment_host_nm,
to_timestamp(CAST(dl."timestamp" AS double precision)) as deployment_created_dttm,
dl.promotion_stage as promotion_stage_cd,
se.ship_event_nm as ship_event_nm,
'UDANEXT' as source_system_cd
FROM foreign_udanext.deployments dp
LEFT OUTER JOIN foreign_udanext.deployment_log dl ON (dl.deploy_id = dp.id)
LEFT OUTER JOIN foreign_udanext.infrastructure_tenant inft ON (inft.name=dp.tenant)
JOIN cqm_meta.deployment_key dk ON (concat_ws('|',dl.deploy_id, dl.task_id,'UDANEXT') = dk.deployment_key_txt)
-- JOIN sel_meta_key dk ON concat_ws('|',dl.deploy_id, dl.task_id,'UDANEXT') = dk.deployment_key_txt
LEFT OUTER JOIN foreign_udanext.infrastructure_cluster infc ON (infc.tenant_id=inft.id)
LEFT OUTER JOIN foreign_udanext.infrastructure_cluster_request infrq ON (infrq.id=infc.request_id)
LEFT OUTER JOIN foreign_udanext.infrastructure_cluster_version icv ON (infc.k8s_version_id=icv.id)
LEFT OUTER JOIN main.ship_event se ON (se.ship_event_nm = substring(dp.shipevent from 6 for 5))
inner join main.order_base ob on ob.order_no = dl.order_number
WHERE dk.deployment_key_txt NOT IN
(SELECT source_system_deployment_id
FROM main.deployment_base db)
The intent is to use this SELECT within an INSERT statement that will add the result data to a table, but first the SELECT itself has to run correctly.
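For context, the eventual load would be shaped like the following (a sketch only; the target column list is abbreviated here, not our full DDL):

```sql
-- Hypothetical sketch: once the SELECT runs cleanly, it gets wrapped
-- in an INSERT ... SELECT against the target table.
INSERT INTO main.deployment_base (deployment_id, source_system_deployment_id, deployment_nm)
SELECT
    dk.deployment_id,
    dk.deployment_key_txt,
    dp.name
FROM foreign_udanext.deployments dp
JOIN cqm_meta.deployment_key dk
  ON concat_ws('|', dp.id, 'UDANEXT') = dk.deployment_key_txt;
```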
The error
ERROR: mergejoin input data is out of order
SQL state: XX000
This error message is not documented anywhere that I can find, and it only occurs in Dev, not Prod.
Suspected issue
We believe that this error is caused by one of two conditions:
- Postgres is assuming an incorrect sort order on data returned from a substring function; and/or
- The number of joins is causing intermediate data to be cached, and available cache memory is being exhausted.
Reasons for error suspicions
For (1), the error is suspected because if the line containing the substring is commented out, along with its associated join, the query runs successfully.
Substring:
LEFT OUTER JOIN main.ship_event se ON (se.ship_event_nm = substring(dp.shipevent from 6 for 5))
Query parameter (part of select):
se.ship_event_nm as ship_event_nm,
We suspect this first because the query runs if this join and the selected column are commented out. Second, the error message, "mergejoin input data is out of order", suggests that Postgres expects the shipevent data to be in a particular order, and it is not, because the join key is a substring that begins at the 6th character of the string in the column.
What argues against this being the issue is the fact that the query executes properly in Prod, which is running the same version of Postgres as Dev.
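One way to separate the two hypotheses is to steer the planner away from merge joins for a session (a diagnostic sketch; enable_mergejoin is a standard planner GUC, nothing specific to our environment):

```sql
-- If the query succeeds when planned as a hash join but fails as a
-- merge join, the problem is an ordering assumption, not memory.
SET enable_mergejoin = off;   -- session-local planner setting
-- ... re-run the failing SELECT here ...
RESET enable_mergejoin;
```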
For error (2), the suspicion is based on the log output. The full log will be attached to this bug report, but here’s the relevant subset of the log:
[statement text repeated in the log; identical to the query shown above]
2023-01-25 09:08:46 EST [354873]: user=pestor,db=cqm_dev,app=pgAdmin 4 - CONN:5168871,client=172.16.47.57 LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp354873.0", size 24272896
2023-01-25 09:08:46 EST [354873]: user=pestor,db=cqm_dev,app=pgAdmin 4 - CONN:5168871,client=172.16.47.57 LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp354873.5", size 0
2023-01-25 09:08:46 EST [354873]: user=pestor,db=cqm_dev,app=pgAdmin 4 - CONN:5168871,client=172.16.47.57 LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp354873.4", size 0
2023-01-25 09:08:46 EST [354873]: user=pestor,db=cqm_dev,app=pgAdmin 4 - CONN:5168871,client=172.16.47.57 LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp354873.3", size 0
2023-01-25 09:08:46 EST [354873]: user=pestor,db=cqm_dev,app=pgAdmin 4 - CONN:5168871,client=172.16.47.57 LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp354873.2", size 24231936
2023-01-25 09:08:46 EST [354873]: user=pestor,db=cqm_dev,app=pgAdmin 4 - CONN:5168871,client=172.16.47.57 LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp354873.1", size 24305664
2023-01-25 09:08:55 EST [354911]: user=,db=,app=,client= LOG: automatic vacuum of table "cqm_metabase.public.qrtz_scheduler_state": index scans: 0
pages: 0 removed, 1 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 56 removed, 1 remain, 0 are dead but not yet removable, oldest xmin: 53130667
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 18.382 MB/s
buffer usage: 52 hits, 0 misses, 1 dirtied
WAL usage: 3 records, 1 full page images, 8671 bytes
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
What is significant here is that a number of temp files are created with intermediate results. Three temp files of non-zero size are noted, each slightly over 24 MB in size. It could be that, as a result of all this caching, we are running out of cache/swap space on the server, which is causing the failure (albeit with an EXTREMELY misleading error message). As you can see below, the two environments are very similar in architecture except for the number of CPUs.
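If hypothesis (2) were right, raising work_mem for the session should change the behavior, since each of the ~24 MB sort runs would then fit in memory without spilling to temp files (a sketch; the value below is an assumption for the experiment, not a recommendation):

```sql
SHOW work_mem;           -- the default is commonly 4MB
SET work_mem = '256MB';  -- session-local; comfortably larger than the 24 MB spills
-- ... re-run the failing SELECT; if it still errors, memory is not the cause ...
RESET work_mem;
```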
Below are the stats for the two machines, dev and prod.
Dev environment
$ df
Filesystem 1K-blocks Used Available Use% Mounted on
devtmpfs 32810860 0 32810860 0% /dev
tmpfs 32829716 1308 32828408 1% /dev/shm
tmpfs 32829716 1016 32828700 1% /run
tmpfs 32829716 0 32829716 0% /sys/fs/cgroup
/dev/mapper/vg_default-lv_root 69094696 18072504 47830792 28% /
/dev/sda1 499656 347104 115856 75% /boot
/dev/mapper/vg_default-lv_var 16382844 1684248 13843352 11% /var
/dev/mapper/vg_default-lv_tmp 5095040 23452 4793060 1% /tmp
/dev/mapper/vg_data-lv_data 2604006408 1295117116 1308872908 50% /var/lib/pgsql
regxbc02.unx.sas.com:/opt/sysadm 206288896 91794432 103992320 47% /opt/sysadm
isilon03nfs.unx.sas.com:/ifs/nosnaps/cqm_backups 1468006400 1182180864 285825536 81% /var/cqm_backups
isilon03.unx.sas.com:/ifs/data/cqmtest_backups 1073741824 1073741824 0 100% /var/cqmtest_backups
tmpfs 6565940 0 6565940 0% /run/user/3595
tmpfs 6565940 0 6565940 0% /run/user/10639
$ cat /proc/meminfo
MemTotal: 65659436 kB
MemFree: 17569556 kB
MemAvailable: 47897876 kB
Buffers: 248268 kB
Cached: 45506792 kB
SwapCached: 14000 kB
Active: 28747392 kB
Inactive: 17297648 kB
Active(anon): 16033288 kB
Inactive(anon): 266548 kB
Active(file): 12714104 kB
Inactive(file): 17031100 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 4194300 kB
SwapFree: 3939324 kB
Dirty: 52 kB
Writeback: 0 kB
AnonPages: 280560 kB
Mapped: 16046604 kB
Shmem: 16009848 kB
KReclaimable: 1309076 kB
Slab: 1543072 kB
SReclaimable: 1309076 kB
SUnreclaim: 233996 kB
KernelStack: 6432 kB
PageTables: 225972 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 37024016 kB
Committed_AS: 17046424 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
Percpu: 107008 kB
HardwareCorrupted: 0 kB
AnonHugePages: 159744 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
FileHugePages: 0 kB
FilePmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 542528 kB
DirectMap2M: 50837504 kB
DirectMap1G: 17825792 kB
$ lscpu | egrep 'Model name|Socket|Thread|NUMA|CPU\(s\)'
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Socket(s): 4
NUMA node(s): 1
Model name: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
NUMA node0 CPU(s): 0-3
SELECT version();
PostgreSQL 14.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit
Prod environment
$ df
Filesystem 1K-blocks Used Available Use% Mounted on
devtmpfs 65840984 0 65840984 0% /dev
tmpfs 65859840 58564 65801276 1% /dev/shm
tmpfs 65859840 1132 65858708 1% /run
tmpfs 65859840 0 65859840 0% /sys/fs/cgroup
/dev/mapper/vg_default-lv_root 69094696 4753588 61149708 8% /
/dev/sda1 499656 347064 115896 75% /boot
/dev/mapper/vg_default-lv_var 16382844 3419088 12108512 23% /var
/dev/mapper/vg_default-lv_tmp 5095040 20492 4796020 1% /tmp
regxbc02.unx.sas.com:/opt/sysadm 206288896 91794432 103992320 47% /opt/sysadm
tmpfs 13171968 0 13171968 0% /run/user/0
isilon03nfs.unx.sas.com:/ifs/nosnaps/cqm_backups 1468006400 1182180864 285825536 81% /var/cqm_backups
/dev/drbd0 1878122868 901901912 976204572 49% /var/lib/pgsql
tmpfs 13171968 0 13171968 0% /run/user/3595
tmpfs 13171968 0 13171968 0% /run/user/10639
$ cat /proc/meminfo
MemTotal: 131719684 kB
MemFree: 1893884 kB
MemAvailable: 96026960 kB
Buffers: 55028 kB
Cached: 125785868 kB
SwapCached: 20956 kB
Active: 62095452 kB
Inactive: 64324096 kB
Active(anon): 32042800 kB
Inactive(anon): 565988 kB
Active(file): 30052652 kB
Inactive(file): 63758108 kB
Unevictable: 165980 kB
Mlocked: 165980 kB
SwapTotal: 4194300 kB
SwapFree: 3481264 kB
Dirty: 2056 kB
Writeback: 0 kB
AnonPages: 734624 kB
Mapped: 31501228 kB
Shmem: 32014476 kB
KReclaimable: 1574668 kB
Slab: 1860200 kB
SReclaimable: 1574668 kB
SUnreclaim: 285532 kB
KernelStack: 9168 kB
PageTables: 738928 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 70054140 kB
Committed_AS: 34301304 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
Percpu: 158208 kB
HardwareCorrupted: 0 kB
AnonHugePages: 376832 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
FileHugePages: 0 kB
FilePmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 2967360 kB
DirectMap2M: 92452864 kB
DirectMap1G: 40894464 kB
$ lscpu | egrep 'Model name|Socket|Thread|NUMA|CPU\(s\)'
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 1
Socket(s): 16
NUMA node(s): 1
Model name: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
NUMA node0 CPU(s): 0-15
SELECT version();
PostgreSQL 14.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit
From: Pete Storer
Sent: Wednesday, January 25, 2023 2:59 PM
To: 'pgsql-bugs@lists.postgresql.org' <pgsql-bugs@lists.postgresql.org>
Subject: Query execution failure
I am encountering a strange error when trying to execute a complex SELECT query in Postgres. All of the details of this error are documented in the attachment.
I hope you can help me to resolve this error.
Thanks very much.
Pete Storer
Pete Storer <Pete.Storer@sas.com> writes:
> When trying to run a complex join of eight tables - including 6 foreign tables - the query runs for a few seconds and then terminates with an error message
> ERROR: mergejoin input data is out of order

You have not shown us the table definitions, nor an EXPLAIN for the query, but this'd indicate that the sort ordering of one or more tables is not like that of the others. Likely causes of this include

* corrupt index, if an indexscan is being used instead of an explicit sort
* foreign server has a different idea of the locale's sort details than the local server does (or than some other foreign server, if the query touches more than one)

If the join key columns are all numeric then it's almost surely a corrupt index somewhere. But if you are joining on string keys then you are subject to the whims of the locale's sort order, and that is a mess, particularly for non-ASCII data. Not only might it be different on different platforms, but even on the same platform it can change over time (with the effect of making indexes on string columns corrupt). There's some details and advice here:

https://wiki.postgresql.org/wiki/Locale_data_changes

regards, tom lane
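For the corrupt-index possibility, the amcheck extension can verify btree ordering directly; a sketch against the indexes named in the DDL below (assumes amcheck is installable on the server):

```sql
CREATE EXTENSION IF NOT EXISTS amcheck;

-- Verifies that items on each btree page are in the expected sort order.
SELECT bt_index_check('main.ship_event_pk'::regclass);
SELECT bt_index_check('main.deployment_base_ux01'::regclass);

-- bt_index_parent_check() does a stronger parent/child consistency check,
-- but takes heavier locks, so run it in a maintenance window.
```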
Thanks for the quick response, Tom. To clarify, the DDL for the two local tables and the primary foreign table are below. I am also including the EXPLAIN VERBOSE output for this query.

I'm not sure how to identify which index may be corrupt, if that is the cause. Regarding a different sort order between tables, if that were the case, wouldn't this query fail in Production as it does in our Development environment? They are identical with respect to indices and sort order, and they both point to the same foreign table, which resides on a MySQL database.

Let me know if this additional information helps. Thanks.

__________________________________________

CREATE TABLE IF NOT EXISTS main.ship_event
(
    ship_event_nm character varying(20) COLLATE pg_catalog."default" NOT NULL,
    available_for_ship_dt date,
    last_build_dt date,
    sign_off_dt date,
    commit_to_ship_dt date,
    l10n_file_freeze_dt date,
    dev_test_infra_ready_dt date,
    builds_stop_dt date,
    localization_ready_dt date,
    source_system_cd character varying(20) COLLATE pg_catalog."default" NOT NULL,
    updated_dttm timestamp with time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_by_id character varying(50) COLLATE pg_catalog."default" NOT NULL DEFAULT CURRENT_USER,
    CONSTRAINT ship_event_pk PRIMARY KEY (ship_event_nm),
    CONSTRAINT ship_event_fk01 FOREIGN KEY (source_system_cd)
        REFERENCES cqm_meta.source_system (source_system_cd) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
) TABLESPACE pg_default;

ALTER TABLE IF EXISTS main.ship_event OWNER to cqmadmin;

CREATE TABLE IF NOT EXISTS main.deployment_base
(
    deployment_id integer NOT NULL DEFAULT nextval('main.deployment_base_deployment_id_seq'::regclass),
    source_system_deployment_id text COLLATE pg_catalog."default" NOT NULL,
    deployment_nm character varying(100) COLLATE pg_catalog."default",
    deployment_tenant_nm character varying(100) COLLATE pg_catalog."default",
    deployment_cluster_nm character varying(100) COLLATE pg_catalog."default",
    cluster_kubernetes_version_id character varying(100) COLLATE pg_catalog."default",
    cluster_node_no smallint,
    cluster_ram_no smallint,
    cluster_cpu_no smallint,
    cluster_disk_no smallint,
    infrastructure_desc text COLLATE pg_catalog."default",
    ha_multi_master_flg character(1) COLLATE pg_catalog."default",
    cluster_environment_desc character varying(100) COLLATE pg_catalog."default",
    build_level_cd character varying(50) COLLATE pg_catalog."default",
    deployment_deploy_tm integer,
    deployment_ulid_cd character varying(120) COLLATE pg_catalog."default",
    order_id integer,
    k8s_namespace_nm character varying(100) COLLATE pg_catalog."default",
    k8s_kubeconfig_url_txt character varying(200) COLLATE pg_catalog."default",
    k8s_kubeconfig_data jsonb,
    k8s_failed_pods_cnt integer,
    k8s_client_version_no character varying(20) COLLATE pg_catalog."default",
    k8s_server_version_no character varying(20) COLLATE pg_catalog."default",
    k8s_cluster_type_cd character varying(20) COLLATE pg_catalog."default",
    k8s_ingress_controller_type_cd character varying(20) COLLATE pg_catalog."default",
    k8s_os_type_nm character varying(20) COLLATE pg_catalog."default",
    k8s_os_version_no character varying(20) COLLATE pg_catalog."default",
    deployment_host_nm character varying(120) COLLATE pg_catalog."default",
    promotion_stage_cd character varying(120) COLLATE pg_catalog."default",
    ship_event_nm character varying(20) COLLATE pg_catalog."default",
    deployment_uuid_cd character varying(120) COLLATE pg_catalog."default",
    manifest_commit_id character varying(120) COLLATE pg_catalog."default",
    orderable_ary_nm character varying[] COLLATE pg_catalog."default",
    deployment_deploy_dttm timestamp with time zone,
    source_system_cd character varying(20) COLLATE pg_catalog."default",
    cruise_worker_id character varying(50) COLLATE pg_catalog."default",
    cluster_info_json_doc jsonb,
    cas_ready_cd character varying(10) COLLATE pg_catalog."default",
    all_pods_alive_flg boolean,
    oauth_ready_flg boolean,
    deployment_label character varying(50) COLLATE pg_catalog."default",
    sas_conn_ready_flg boolean,
    sas_login_ready_flg boolean,
    postgres_version_no character varying(20) COLLATE pg_catalog."default",
    postgres_type_cd character varying(10) COLLATE pg_catalog."default",
    cadence_id integer,
    deployment_test_flg boolean,
    deployment_type_nm character varying(20) COLLATE pg_catalog."default",
    cloud_provider_nm character varying(50) COLLATE pg_catalog."default",
    storage_type_nm character varying(50) COLLATE pg_catalog."default",
    sas_base_url_txt character varying(200) COLLATE pg_catalog."default",
    embedded_ldap_flg boolean,
    cas_host_nm character varying(100) COLLATE pg_catalog."default",
    cas_mode_cd character varying(10) COLLATE pg_catalog."default",
    cas_port_no character varying(10) COLLATE pg_catalog."default",
    cas_battery_results_cd character varying(10) COLLATE pg_catalog."default",
    cas_server_type_cd character varying(10) COLLATE pg_catalog."default",
    cas_backup_flg boolean,
    node_instance_type character varying(100) COLLATE pg_catalog."default",
    viya4_ark_json_doc jsonb,
    performance_json_doc jsonb,
    astrolabe_json_doc jsonb,
    viya4_env_valid_json_doc jsonb,
    updated_dttm timestamp with time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_by_id character varying(50) COLLATE pg_catalog."default" NOT NULL DEFAULT CURRENT_USER,
    CONSTRAINT deployment_base_pk PRIMARY KEY (deployment_id),
    CONSTRAINT deployment_base_ux01 UNIQUE (source_system_deployment_id, source_system_cd),
    CONSTRAINT deployment_base_fk01 FOREIGN KEY (ship_event_nm)
        REFERENCES main.ship_event (ship_event_nm) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION,
    CONSTRAINT deployment_base_fk02 FOREIGN KEY (source_system_cd)
        REFERENCES cqm_meta.source_system (source_system_cd) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION,
    CONSTRAINT deployment_base_fk03 FOREIGN KEY (order_id)
        REFERENCES main.order_base (order_id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
) TABLESPACE pg_default;

Query explain

QUERY PLAN
Hash Join  (cost=182208.88..4862218.44 rows=10656250 width=1555)
  Output: dk.deployment_id, dk.deployment_key_txt, dp.name, dp.tenant, infc.name, icv.description, infc.nodes, infc.ram, infc.cpu, infc.disk, dp.deploy_description, CASE WHEN (infrq.multi_master = 1) THEN 'Y'::text ELSE 'N'::text END, infrq.cluster_label, dl.deploy_time, dl.order_number, ob.order_id, dp.server, to_timestamp((dl."timestamp")::double precision), dl.promotion_stage, se.ship_event_nm, 'UDANEXT'::text
  Hash Cond: ((dl.order_number)::text = (ob.order_no)::text)
  ->  Merge Left Join  (cost=14682.01..174631.01 rows=10656250 width=1599)
        Output: dp.name, dp.tenant, dp.deploy_description, dp.server, dl.deploy_time, dl.order_number, dl."timestamp", dl.promotion_stage, dk.deployment_id, dk.deployment_key_txt, infc.name, infc.nodes, infc.ram, infc.cpu, infc.disk, infrq.multi_master, infrq.cluster_label, icv.description, se.ship_event_nm
        Merge Cond: ((dp.tenant)::text = (inft.name)::text)
        ->  Sort  (cost=14667.01..14709.63 rows=17050 width=927)
              Output: dp.name, dp.tenant, dp.deploy_description, dp.server, dl.deploy_time, dl.order_number, dl."timestamp", dl.promotion_stage, dk.deployment_id, dk.deployment_key_txt, se.ship_event_nm
              Sort Key: dp.tenant
              ->  Merge Join  (cost=13165.12..13468.61 rows=17050 width=927)
                    Output: dp.name, dp.tenant, dp.deploy_description, dp.server, dl.deploy_time, dl.order_number, dl."timestamp", dl.promotion_stage, dk.deployment_id, dk.deployment_key_txt, se.ship_event_nm
                    Merge Cond: (dk.deployment_key_txt = (concat_ws('|'::text, dl.deploy_id, dl.task_id, 'UDANEXT')))
                    ->  Sort  (cost=12778.41..12780.12 rows=682 width=36)
                          Output: dk.deployment_id, dk.deployment_key_txt
                          Sort Key: dk.deployment_key_txt
                          ->  Foreign Scan on cqm_meta.deployment_key dk  (cost=12691.97..12746.31 rows=682 width=36)
                                Output: dk.deployment_id, dk.deployment_key_txt
                                Filter: (NOT (hashed SubPlan 1))
                                Remote SQL: SELECT deployment_id, deployment_key_txt FROM cqm_meta.deployment_key
                                SubPlan 1
                                  ->  Index Only Scan using deployment_base_ux01 on main.deployment_base db  (cost=0.42..11840.60 rows=300546 width=20)
                                        Output: db.source_system_deployment_id
                    ->  Sort  (cost=386.70..399.20 rows=5000 width=899)
                          Output: dp.name, dp.tenant, dp.deploy_description, dp.server, dl.deploy_time, dl.order_number, dl."timestamp", dl.promotion_stage, dl.deploy_id, dl.task_id, se.ship_event_nm, (concat_ws('|'::text, dl.deploy_id, dl.task_id, 'UDANEXT'))
                          Sort Key: (concat_ws('|'::text, dl.deploy_id, dl.task_id, 'UDANEXT'))
                          ->  Hash Left Join  (cost=45.69..79.51 rows=5000 width=899)
                                Output: dp.name, dp.tenant, dp.deploy_description, dp.server, dl.deploy_time, dl.order_number, dl."timestamp", dl.promotion_stage, dl.deploy_id, dl.task_id, se.ship_event_nm, concat_ws('|'::text, dl.deploy_id, dl.task_id, 'UDANEXT')
                                Inner Unique: true
                                Hash Cond: (SUBSTRING(dp.shipevent FROM 6 FOR 5) = (se.ship_event_nm)::text)
                                ->  Foreign Scan  (cost=15.00..35.00 rows=5000 width=960)
                                      Output: dp.name, dp.tenant, dp.deploy_description, dp.server, dp.shipevent, dl.deploy_time, dl.order_number, dl."timestamp", dl.promotion_stage, dl.deploy_id, dl.task_id
                                      Relations: (udanext.deployments dp) INNER JOIN (udanext.deployment_log dl)
                                      Remote server startup cost: 25
                                      Remote query: SELECT r1.`name`, r1.`tenant`, r1.`deploy_description`, r1.`server`, r1.`shipevent`, r2.`deploy_time`, r2.`order_number`, r2.`timestamp`, r2.`promotion_stage`, r2.`deploy_id`, r2.`task_id` FROM (`udanext`.`deployments` r1 INNER JOIN `udanext`.`deployment_log` r2 ON (((r1.`id` = r2.`deploy_id`)))) ORDER BY r2.`order_number` IS NULL, r2.`order_number` ASC
                                ->  Hash  (cost=19.75..19.75 rows=875 width=7)
                                      Output: se.ship_event_nm
                                      ->  Seq Scan on main.ship_event se  (cost=0.00..19.75 rows=875 width=7)
                                            Output: se.ship_event_nm
        ->  Materialize  (cost=15.00..347.50 rows=125000 width=890)
              Output: inft.name, infc.name, infc.nodes, infc.ram, infc.cpu, infc.disk, infrq.multi_master, infrq.cluster_label, icv.description
              ->  Foreign Scan  (cost=15.00..35.00 rows=125000 width=890)
                    Output: inft.name, infc.name, infc.nodes, infc.ram, infc.cpu, infc.disk, infrq.multi_master, infrq.cluster_label, icv.description
                    Relations: (((udanext.infrastructure_tenant inft) LEFT JOIN (udanext.infrastructure_cluster infc)) LEFT JOIN (udanext.infrastructure_cluster_request infrq)) LEFT JOIN (udanext.infrastructure_cluster_version icv)
                    Remote server startup cost: 25
                    Remote query: SELECT r4.`name`, r8.`name`, r8.`nodes`, r8.`ram`, r8.`cpu`, r8.`disk`, r10.`multi_master`, r10.`cluster_label`, r12.`description` FROM (((`udanext`.`infrastructure_tenant` r4 LEFT JOIN `udanext`.`infrastructure_cluster` r8 ON (((r8.`tenant_id` = r4.`id`)))) LEFT JOIN `udanext`.`infrastructure_cluster_request` r10 ON (((r10.`id` = r8.`request_id`)))) LEFT JOIN `udanext`.`infrastructure_cluster_version` r12 ON (((r8.`k8s_version_id` = r12.`id`)))) ORDER BY r4.`name` IS NULL, r4.`name` ASC
  ->  Hash  (cost=113073.72..113073.72 rows=3132572 width=11)
        Output: ob.order_id, ob.order_no
        ->  Seq Scan on main.order_base ob  (cost=0.00..113073.72 rows=3132572 width=11)
              Output: ob.order_id, ob.order_no
Query Identifier: -8672584825060546873

CREATE FOREIGN TABLE IF NOT EXISTS foreign_udanext.deployments
(
    id integer NOT NULL,
    "timestamp" character varying(50) NULL COLLATE pg_catalog."default",
    status character varying(15) NULL COLLATE pg_catalog."default",
    prod_code character varying(10) NULL COLLATE pg_catalog."default",
    shipevent character varying(25) NULL COLLATE pg_catalog."default",
    user_id character varying(15) NULL COLLATE pg_catalog."default",
    name character varying(100) NULL COLLATE pg_catalog."default",
    env_type character varying(25) NULL COLLATE pg_catalog."default",
    tenant character varying(100) NULL COLLATE pg_catalog."default",
    deploy_type character varying(25) NULL COLLATE pg_catalog."default",
    host_code character varying(5) NULL COLLATE pg_catalog."default",
    cas_controller character varying(50) NULL COLLATE pg_catalog."default",
    contacts character varying(400) NULL COLLATE pg_catalog."default",
    deploy_description text NULL COLLATE pg_catalog."default",
    server character varying(100) NULL COLLATE pg_catalog."default",
    vm_instance_name character varying(50) NULL COLLATE pg_catalog."default",
    vm_instance_size character varying(25) NULL COLLATE pg_catalog."default",
    race_base_image character varying(25) NULL COLLATE pg_catalog."default",
    race_gold_image character varying(25) NULL COLLATE pg_catalog."default",
    race_label character varying(50) NULL COLLATE pg_catalog."default",
    load_image smallint NULL,
    vm_instance_os character varying(200) NULL COLLATE pg_catalog."default",
    topology character varying(25) NULL COLLATE pg_catalog."default",
    tenant_region character varying(25) NULL COLLATE pg_catalog."default",
    vm_volume_size integer NULL,
    multi_nic_enabled smallint NULL,
    service_mesh_type integer NOT NULL,
    orchestration_format_id integer NULL,
    database_type integer NOT NULL,
    cas_id integer NULL
)
SERVER udanext_prod_server
OPTIONS (dbname 'udanext', table_name 'deployments');
Pete Storer <Pete.Storer@sas.com> writes:
> I'm not sure how to identify which index may be corrupt, if that is the cause. Regarding a different sort order between tables, if that were the case, wouldn't this query fail in Production as it does in our Development environment? They are identical with respect to indices and sort order, and they both point to the same foreign table, which resides on a MySQL database.

Meh. You've just got two merge joins in that plan, and three of the four inputs to them are being explicitly sorted, so one would certainly hope that those sorts are consistent. But the fourth input is being sourced directly from the foreign server, and so this query is fundamentally assuming that the foreign server has the same idea of sort ordering as the local one. Evidently that assumption is wrong.

I'd check locale settings on your two databases to understand why it doesn't fail on prod. But the long-term solution if you want this to work reliably is probably to force use of C locale on all three databases; that's about the only locale that you can fully trust to be portable.

I don't know anything about the mysql FDW, but another possible route to fixing things is to get it to not believe that the remote's sort ordering matches the local one. If the plan were relying on a local sort instead of a remote sort then everything would be fine.

regards, tom lane
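Checking the locale settings as suggested can be done on each Postgres server with a couple of catalog queries (a sketch; the MySQL comparison is shown as a comment since it runs on the foreign side):

```sql
-- On each PostgreSQL instance (Dev and Prod): database-level collation.
SELECT datname, datcollate, datctype
FROM pg_database
WHERE datname = current_database();

-- Server-level locale settings.
SELECT name, setting
FROM pg_settings
WHERE name IN ('lc_collate', 'lc_ctype', 'server_encoding');

-- On the MySQL server, for comparison:
--   SHOW VARIABLES LIKE 'collation%';
```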
On Thu, Jan 26, 2023 at 11:06:41AM -0500, Tom Lane wrote:
> I don't know anything about the mysql FDW, but another possible route
> to fixing things is to get it to not believe that the remote's sort
> ordering matches the local one. If the plan were relying on a local
> sort instead of a remote sort then everything would be fine.

When we implemented sort pushdown to FDWs I had not considered how hard it might be to match sort ordering, especially considering the problem we have in matching orderings from different operating system versions. Is this documented somewhere?

--
Bruce Momjian <bruce@momjian.us>  https://momjian.us
EDB  https://enterprisedb.com

Embrace your flaws. They make you human, rather than perfect, which you will never be.
On 1/30/23 10:34, Bruce Momjian wrote:
> On Thu, Jan 26, 2023 at 11:06:41AM -0500, Tom Lane wrote:
>> I don't know anything about the mysql FDW, but another possible route
>> to fixing things is to get it to not believe that the remote's sort
>> ordering matches the local one.  If the plan were relying on a local
>> sort instead of a remote sort then everything would be fine.
>
> When we implemented sort pushdown to FDWs I had not considered how hard
> it might be to match sort ordering, especially considering the problem
> we have in matching orderings from different operating system versions.

Different sort ordering could be due to all kinds of things, including
for example a RHEL 8 instance pulling from a RHEL 7 one (glibc mismatch).

Also I have been told (not verified by me) that MySQL at least supports
using its own built-in collation. If so, how can we ever know that
whatever Postgres is using will match that?

I wonder if we should have an attribute of a foreign server that
indicates whether sort pushdown is enabled or not.

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Joe Conway <mail@joeconway.com> writes:
> Also I have been told (not verified by me) that MySQL at least supports
> using its own built-in collation. If so, how can we ever know that
> whatever Postgres is using will match that?

Ugh.  mysql_fdw probably ought to default to no-sort-pushdown.

> I wonder if we should have an attribute of a foreign server that
> indicates whether sort pushdown is enabled or not.

This is something that would have to be implemented individually by
each FDW for which it's relevant.  We could set a good precedent, and
perhaps standardize the option name, by doing that in postgres_fdw.

			regards, tom lane
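No such option exists in either FDW as of this thread; purely as a hypothetical sketch (option and server names invented here) of what a standardized per-server switch might look like:

8<-------------
-- Hypothetical option; not implemented in postgres_fdw or mysql_fdw
ALTER SERVER mysql_svr
    OPTIONS (ADD enable_sort_pushdown 'false');
8<-------------

With such a switch set to 'false', the planner would have to put a local Sort node above the foreign scan, sidestepping any collation mismatch with the remote.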
This is very interesting, guys. And it leads me to a question: Is there any way to FORCE Postgres to do a local sort/collation of any data that is accessed through a FDW?
> On Jan 30, 2023, at 10:17, Pete Storer <Pete.Storer@sas.com> wrote:
>
> This is very interesting, guys. And it leads me to a question: Is there any way to FORCE Postgres to do a local sort/collation of any data that is accessed through a FDW?

It depends on the FDW.  One thing to try is to have the SORT be an expression, especially one that depends on functions that can't otherwise be pushed down.  (You may have to do some experimenting to find one that returns identity, i.e. the same sort order as just the bare column, but isn't pushed down.)
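A minimal sketch of that idea, assuming a foreign table named mysql_dp (a stand-in name) with the shipevent column from the thread's query. Whether any given expression actually defeats pushdown depends on the FDW, so this needs testing:

8<-------------
-- Appending an empty string is an identity for text values, but the
-- resulting expression may no longer be considered safe to evaluate
-- remotely, forcing a local sort:
SELECT dp.shipevent
FROM mysql_dp dp
ORDER BY dp.shipevent || '';

-- Verify with EXPLAIN VERBOSE that the Sort node is local, i.e. the
-- ORDER BY does not appear in the remote SQL.
8<-------------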
Makes sense - but I'm using the PG substring function here.

LEFT OUTER JOIN main.ship_event se ON (se.ship_event_nm = substring(dp.shipevent from 6 for 5))

In this case, the dp.shipevent is in the fdw-accessed MySQL table. Shouldn't that force the sort to be local?
On 1/30/23 13:24, Pete Storer wrote:
> Makes sense - but I'm using the PG substring function here.
>
> LEFT OUTER JOIN main.ship_event se ON (se.ship_event_nm = substring(dp.shipevent from 6 for 5))
>
> In this case, the dp.shipevent is in the fdw-accessed MySQL table. Shouldn't that force the sort to be local?

Looking at the mysql-fdw source (which looks pretty much the same as
postgres-fdw at first glance):

8<-------------
  /* We don't support cases where there are any SRFs in the targetlist */
  if (parse->hasTargetSRFs)
    return;
8<-------------

Based on that, perhaps a kluge (and completely untested) workaround is to:
1/ create an SRF that returns one column, one row constant
2/ add the SRF to your targetlist

The SRF likely needs to be plpgsql to avoid inlining, maybe something like:

8<-------------
CREATE OR REPLACE FUNCTION theanswer()
RETURNS setof int AS
$$
BEGIN
  RETURN NEXT 42;
END;
$$ LANGUAGE plpgsql;
8<-------------

HTH,

-- 
Joe Conway
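Joe's workaround would then be applied to the problem join roughly like this (untested; mysql_dp is a stand-in name for the fdw-accessed table):

8<-------------
SELECT dp.shipevent,
       theanswer()   -- constant SRF from above; makes hasTargetSRFs true
FROM mysql_dp dp
LEFT OUTER JOIN main.ship_event se
  ON (se.ship_event_nm = substring(dp.shipevent from 6 for 5));
8<-------------

The extra column is just noise in the output; its only purpose is to make the planner skip the FDW's pushdown path.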
On Mon, Jan 30, 2023 at 10:34:20AM -0500, Bruce Momjian wrote:
> On Thu, Jan 26, 2023 at 11:06:41AM -0500, Tom Lane wrote:
> > I don't know anything about the mysql FDW, but another possible route
> > to fixing things is to get it to not believe that the remote's sort
> > ordering matches the local one.  If the plan were relying on a local
> > sort instead of a remote sort then everything would be fine.
>
> When we implemented sort pushdown to FDWs I had not considered how hard
> it might be to match sort ordering, especially considering the problem
> we have in matching orderings from different operating system versions.
>
> Is this documented somewhere?

The attached patch documents this remote sort order requirement.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.
Attachment
Bruce Momjian <bruce@momjian.us> writes:
> The attached patch documents this remote sort order requirement.

That seems like quite a random place to insert the documentation.
I doubt that the requirement applies only to upper paths -- FDWs
are allowed to return sorted paths for base relations too,
comparable to indexscan paths.

			regards, tom lane
On Wed, Sep 27, 2023 at 09:06:13PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > The attached patch documents this remote sort order requirement.
>
> That seems like quite a random place to insert the documentation.
> I doubt that the requirement applies only to upper paths -- FDWs
> are allowed to return sorted paths for base relations too,
> comparable to indexscan paths.

That is the only section of the docs that mentions sort pushdown.
I don't see another reasonable location.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
On Wed, Sep 27, 2023 at 10:27:34PM -0400, Bruce Momjian wrote:
> On Wed, Sep 27, 2023 at 09:06:13PM -0400, Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> > > The attached patch documents this remote sort order requirement.
> >
> > That seems like quite a random place to insert the documentation.
> > I doubt that the requirement applies only to upper paths -- FDWs
> > are allowed to return sorted paths for base relations too,
> > comparable to indexscan paths.
>
> That is the only section of the docs that mentions sort pushdown.
> I don't see another reasonable location.

I now think the fdw docs are the wrong place for this since it is a
user requirement, not a developer one.  I put it in CREATE SERVER,
patch attached.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
Attachment
On Thu, Sep 28, 2023 at 09:48:02AM -0400, Bruce Momjian wrote:
> On Wed, Sep 27, 2023 at 10:27:34PM -0400, Bruce Momjian wrote:
> > On Wed, Sep 27, 2023 at 09:06:13PM -0400, Tom Lane wrote:
> > > Bruce Momjian <bruce@momjian.us> writes:
> > > > The attached patch documents this remote sort order requirement.
> > >
> > > That seems like quite a random place to insert the documentation.
> > > I doubt that the requirement applies only to upper paths -- FDWs
> > > are allowed to return sorted paths for base relations too,
> > > comparable to indexscan paths.
> >
> > That is the only section of the docs that mentions sort pushdown.
> > I don't see another reasonable location.
>
> I now think the fdw docs are the wrong place for this since it is a
> user requirement, not a developer one.  I put it in CREATE SERVER,
> patch attached.

Patch applied back to PG 11.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us