Thread: Query execution failure

Query execution failure

From
Pete Storer
Date:

I am encountering a strange error when trying to execute a complex SELECT query in Postgres. All of the details of this error are documented in the attachment.

 

I hope you can help me to resolve this error.

 

Thanks very much.

 

Pete Storer

Sr Data Architect

Enterprise Performance and Data Architecture - EPD

Tel: + 1 919 531 5745

SAS | 100 Campus Drive | Cary, NC 27513

 


Attachment

Re: Query execution failure

From
Christophe Pettus
Date:

> On Jan 25, 2023, at 11:58, Pete Storer <Pete.Storer@sas.com> wrote:
>
> I am encountering a strange error when trying to execute a complex SELECT query in Postgres. All of the details of this error are documented in the attachment.

Without opining on the bug itself, you will likely get a much better response if you put the details in plain text, rather than as a .docx attachment.  Many systems strip attachments, and many more people have a (justifiable) allergy to opening attachments from an unknown source.


FW: Query execution failure

From
Pete Storer
Date:

Resending this with all the text in the body of the email rather than as a Word attachment.

 

 

PostgreSQL Bug Report

Overview

When trying to run a complex join of eight tables, six of which are foreign tables, the query runs for a few seconds and then terminates with an error message.

This occurs ONLY in the Dev environment; the query executes successfully in our Production environment, although it returns zero rows.

The query

Here is the query that is causing the problem:

SELECT
dk.deployment_id,
dk.deployment_key_txt,
dp.name as deployment_nm,
dp.tenant as deployment_tenant_nm,
infc.name as deployment_cluster_nm,
icv.description as cluster_kubernetes_version_id,
infc.nodes as cluster_node_no,
infc.ram as cluster_ram_no,
infc.cpu as cluster_cpu_no,
infc.disk as cluster_disk_no,
dp.deploy_description as infrastructure_desc,
CASE
    WHEN infrq.multi_master = 1 THEN 'Y'
    else 'N'
END
AS HA_multi_master_flg,
infrq.cluster_label as cluster_environment_desc,
dl.deploy_time as deployment_deploy_tm,
dl.order_number as order_no_id,
ob.order_id as order_id,
dp.server as deployment_host_nm,
to_timestamp(CAST(dl."timestamp" AS double precision)) as deployment_created_dttm,
dl.promotion_stage as promotion_stage_cd,
se.ship_event_nm as ship_event_nm,
'UDANEXT' as source_system_cd
FROM foreign_udanext.deployments dp
LEFT OUTER JOIN foreign_udanext.deployment_log dl ON (dl.deploy_id = dp.id)
LEFT OUTER JOIN foreign_udanext.infrastructure_tenant inft ON (inft.name=dp.tenant)
JOIN cqm_meta.deployment_key dk ON (concat_ws('|',dl.deploy_id, dl.task_id,'UDANEXT') = dk.deployment_key_txt)
-- JOIN sel_meta_key dk ON concat_ws('|',dl.deploy_id, dl.task_id,'UDANEXT') = dk.deployment_key_txt
LEFT OUTER JOIN foreign_udanext.infrastructure_cluster infc ON (infc.tenant_id=inft.id)
LEFT OUTER JOIN foreign_udanext.infrastructure_cluster_request infrq ON (infrq.id=infc.request_id)
LEFT OUTER JOIN foreign_udanext.infrastructure_cluster_version icv ON (infc.k8s_version_id=icv.id)
LEFT OUTER JOIN main.ship_event se ON (se.ship_event_nm = substring(dp.shipevent from 6 for 5))
inner join main.order_base ob on ob.order_no = dl.order_number
WHERE dk.deployment_key_txt NOT IN
     (SELECT source_system_deployment_id
     FROM main.deployment_base db)

 

The intent is to use this SELECT within an INSERT statement that will add the result data into a table, but first we need the SELECT to work correctly.
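
As a minimal sketch of that eventual shape (the column list is abbreviated to two columns for illustration; the real statement would map every column of the SELECT onto main.deployment_base):

INSERT INTO main.deployment_base (source_system_deployment_id, source_system_cd)
SELECT dk.deployment_key_txt, 'UDANEXT'
FROM cqm_meta.deployment_key dk
WHERE dk.deployment_key_txt NOT IN
     (SELECT source_system_deployment_id FROM main.deployment_base);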

 

The error

ERROR: mergejoin input data is out of order

SQL state: XX000

This error message is not documented anywhere that I can find, and it only occurs in Dev, not Prod.

Suspected issue

We believe that this error is caused by one of two conditions:

  1. Postgres is assuming an incorrect sort order on data returned from a substring function; and/or
  2. The number of joins is causing intermediate data to be cached, and available cache memory is being exhausted.

Reasons for error suspicions

For (1), the error is suspected because if the line containing the substring is commented out, along with its associated join, the query runs successfully.

Substring:

LEFT OUTER JOIN main.ship_event se ON (se.ship_event_nm = substring(dp.shipevent from 6 for 5))

Query parameter (part of select):

se.ship_event_nm as ship_event_nm,

The reason we suspect this is, first, that the query runs if this join and the selected column are commented out. Second, the error message, “mergejoin input data is out of order”, leads us to suspect that Postgres expects the shipevent data to be in a particular order, and it is not, because the join key is a substring that begins at the 6th character of the string in the column.

What argues against this being the issue is the fact that the query executes properly in Prod, which runs the same version of Postgres as Dev.
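
One untested way to isolate hypothesis (1), sketched here with a made-up temp table name, is to materialize the foreign rows locally so that the substring join and its sort happen entirely inside Postgres:

-- Materialize the foreign rows locally (hypothetical isolation test):
CREATE TEMP TABLE tmp_deployments AS
SELECT id, name, tenant, deploy_description, server, shipevent
FROM foreign_udanext.deployments;

-- The ship_event join would then reference the local copy td, e.g.:
-- LEFT OUTER JOIN main.ship_event se
--     ON (se.ship_event_nm = substring(td.shipevent from 6 for 5))

If the error disappears against the local copy, the problem lies in how the foreign rows are ordered rather than in substring() itself.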

For error (2), the suspicion is based on the log output. The full log will be attached to this bug report, but here’s the relevant subset of the log:

 

 

       SELECT
       dk.deployment_id,
       dk.deployment_key_txt,
       dp.name as deployment_nm,
       dp.tenant as deployment_tenant_nm,
       infc.name as deployment_cluster_nm,
       icv.description as cluster_kubernetes_version_id,
       infc.nodes as cluster_node_no,
       infc.ram as cluster_ram_no,
       infc.cpu as cluster_cpu_no,
       infc.disk as cluster_disk_no,
       dp.deploy_description as infrastructure_desc,
       CASE
           WHEN infrq.multi_master = 1 THEN 'Y'
           else 'N'
       END
       AS HA_multi_master_flg,
       infrq.cluster_label as cluster_environment_desc,
       dl.deploy_time as deployment_deploy_tm,
       dl.order_number as order_no_id,
       ob.order_id as order_id,
       dp.server as deployment_host_nm,
       to_timestamp(CAST(dl."timestamp" AS double precision)) as deployment_created_dttm,
       dl.promotion_stage as promotion_stage_cd,
       se.ship_event_nm as ship_event_nm,
       'UDANEXT' as source_system_cd
       FROM foreign_udanext.deployments dp
       LEFT OUTER JOIN foreign_udanext.deployment_log dl ON (dl.deploy_id = dp.id)
       LEFT OUTER JOIN foreign_udanext.infrastructure_tenant inft ON (inft.name=dp.tenant)
       JOIN cqm_meta.deployment_key dk ON (concat_ws('|',dl.deploy_id, dl.task_id,'UDANEXT') = dk.deployment_key_txt)
       -- JOIN sel_meta_key dk ON concat_ws('|',dl.deploy_id, dl.task_id,'UDANEXT') = dk.deployment_key_txt
       LEFT OUTER JOIN foreign_udanext.infrastructure_cluster infc ON (infc.tenant_id=inft.id)
       LEFT OUTER JOIN foreign_udanext.infrastructure_cluster_request infrq ON (infrq.id=infc.request_id)
       LEFT OUTER JOIN foreign_udanext.infrastructure_cluster_version icv ON (infc.k8s_version_id=icv.id)
       LEFT OUTER JOIN main.ship_event se ON (se.ship_event_nm = substring(dp.shipevent from 6 for 5))
       inner join main.order_base ob on ob.order_no = dl.order_number
       WHERE dk.deployment_key_txt NOT IN
            (SELECT source_system_deployment_id
            FROM main.deployment_base db)

2023-01-25 09:08:46 EST [354873]: user=pestor,db=cqm_dev,app=pgAdmin 4 - CONN:5168871,client=172.16.47.57 LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp354873.0", size 24272896
2023-01-25 09:08:46 EST [354873]: user=pestor,db=cqm_dev,app=pgAdmin 4 - CONN:5168871,client=172.16.47.57 LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp354873.5", size 0
2023-01-25 09:08:46 EST [354873]: user=pestor,db=cqm_dev,app=pgAdmin 4 - CONN:5168871,client=172.16.47.57 LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp354873.4", size 0
2023-01-25 09:08:46 EST [354873]: user=pestor,db=cqm_dev,app=pgAdmin 4 - CONN:5168871,client=172.16.47.57 LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp354873.3", size 0
2023-01-25 09:08:46 EST [354873]: user=pestor,db=cqm_dev,app=pgAdmin 4 - CONN:5168871,client=172.16.47.57 LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp354873.2", size 24231936
2023-01-25 09:08:46 EST [354873]: user=pestor,db=cqm_dev,app=pgAdmin 4 - CONN:5168871,client=172.16.47.57 LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp354873.1", size 24305664
2023-01-25 09:08:55 EST [354911]: user=,db=,app=,client= LOG:  automatic vacuum of table "cqm_metabase.public.qrtz_scheduler_state": index scans: 0
       pages: 0 removed, 1 remain, 0 skipped due to pins, 0 skipped frozen
       tuples: 56 removed, 1 remain, 0 are dead but not yet removable, oldest xmin: 53130667
       index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
       avg read rate: 0.000 MB/s, avg write rate: 18.382 MB/s
       buffer usage: 52 hits, 0 misses, 1 dirtied
       WAL usage: 3 records, 1 full page images, 8671 bytes
       system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s

 

What is significant here is that a number of temp files are created with intermediate results. Three temp files of non-zero size are noted, each slightly over 24 MB in size. It could be that, as a result of all this caching, we are running out of cache/swap space on the server, which is causing the failure (albeit with an EXTREMELY misleading error message). As you can see below, the two environments are similar in architecture, except for the number of CPUs and total memory.
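
A simple session-level experiment for hypothesis (2), assuming default settings, is to raise the per-operation sort/hash memory so that these roughly 24 MB intermediates stay in RAM instead of spilling to pgsql_tmp, then re-run the query:

SHOW work_mem;            -- stock default is only 4MB
SET work_mem = '256MB';   -- session-local; comfortably above the observed spills
-- re-run the failing SELECT here, then:
RESET work_mem;

If the error persists even when no temporary files are logged, the spills are a red herring.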

Below are the stats for the two machines, dev and prod.

Dev environment

$ df

Filesystem                                        1K-blocks       Used  Available Use% Mounted on
devtmpfs                                           32810860          0   32810860   0% /dev
tmpfs                                              32829716       1308   32828408   1% /dev/shm
tmpfs                                              32829716       1016   32828700   1% /run
tmpfs                                              32829716          0   32829716   0% /sys/fs/cgroup
/dev/mapper/vg_default-lv_root                     69094696   18072504   47830792  28% /
/dev/sda1                                            499656     347104     115856  75% /boot
/dev/mapper/vg_default-lv_var                      16382844    1684248   13843352  11% /var
/dev/mapper/vg_default-lv_tmp                       5095040      23452    4793060   1% /tmp
/dev/mapper/vg_data-lv_data                      2604006408 1295117116 1308872908  50% /var/lib/pgsql
regxbc02.unx.sas.com:/opt/sysadm                  206288896   91794432  103992320  47% /opt/sysadm
isilon03nfs.unx.sas.com:/ifs/nosnaps/cqm_backups 1468006400 1182180864  285825536  81% /var/cqm_backups
isilon03.unx.sas.com:/ifs/data/cqmtest_backups   1073741824 1073741824          0 100% /var/cqmtest_backups
tmpfs                                               6565940          0    6565940   0% /run/user/3595
tmpfs                                               6565940          0    6565940   0% /run/user/10639

 

$ cat /proc/meminfo

MemTotal:       65659436 kB
MemFree:        17569556 kB
MemAvailable:   47897876 kB
Buffers:          248268 kB
Cached:         45506792 kB
SwapCached:        14000 kB
Active:         28747392 kB
Inactive:       17297648 kB
Active(anon):   16033288 kB
Inactive(anon):   266548 kB
Active(file):   12714104 kB
Inactive(file): 17031100 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       4194300 kB
SwapFree:        3939324 kB
Dirty:                52 kB
Writeback:             0 kB
AnonPages:        280560 kB
Mapped:         16046604 kB
Shmem:          16009848 kB
KReclaimable:    1309076 kB
Slab:            1543072 kB
SReclaimable:    1309076 kB
SUnreclaim:       233996 kB
KernelStack:        6432 kB
PageTables:       225972 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    37024016 kB
Committed_AS:   17046424 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
Percpu:           107008 kB
HardwareCorrupted:     0 kB
AnonHugePages:    159744 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      542528 kB
DirectMap2M:    50837504 kB
DirectMap1G:    17825792 kB

 

$ lscpu | egrep 'Model name|Socket|Thread|NUMA|CPU\(s\)'

CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Socket(s):           4
NUMA node(s):        1
Model name:          Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
NUMA node0 CPU(s):   0-3

select version()

PostgreSQL 14.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit

 

Prod environment

$ df

Filesystem                                        1K-blocks       Used Available Use% Mounted on
devtmpfs                                           65840984          0  65840984   0% /dev
tmpfs                                              65859840      58564  65801276   1% /dev/shm
tmpfs                                              65859840       1132  65858708   1% /run
tmpfs                                              65859840          0  65859840   0% /sys/fs/cgroup
/dev/mapper/vg_default-lv_root                     69094696    4753588  61149708   8% /
/dev/sda1                                            499656     347064    115896  75% /boot
/dev/mapper/vg_default-lv_var                      16382844    3419088  12108512  23% /var
/dev/mapper/vg_default-lv_tmp                       5095040      20492   4796020   1% /tmp
regxbc02.unx.sas.com:/opt/sysadm                  206288896   91794432 103992320  47% /opt/sysadm
tmpfs                                              13171968          0  13171968   0% /run/user/0
isilon03nfs.unx.sas.com:/ifs/nosnaps/cqm_backups 1468006400 1182180864 285825536  81% /var/cqm_backups
/dev/drbd0                                       1878122868  901901912 976204572  49% /var/lib/pgsql
tmpfs                                              13171968          0  13171968   0% /run/user/3595
tmpfs                                              13171968          0  13171968   0% /run/user/10639

 

$ cat /proc/meminfo

MemTotal:       131719684 kB
MemFree:         1893884 kB
MemAvailable:   96026960 kB
Buffers:           55028 kB
Cached:         125785868 kB
SwapCached:        20956 kB
Active:         62095452 kB
Inactive:       64324096 kB
Active(anon):   32042800 kB
Inactive(anon):   565988 kB
Active(file):   30052652 kB
Inactive(file): 63758108 kB
Unevictable:      165980 kB
Mlocked:          165980 kB
SwapTotal:       4194300 kB
SwapFree:        3481264 kB
Dirty:              2056 kB
Writeback:             0 kB
AnonPages:        734624 kB
Mapped:         31501228 kB
Shmem:          32014476 kB
KReclaimable:    1574668 kB
Slab:            1860200 kB
SReclaimable:    1574668 kB
SUnreclaim:       285532 kB
KernelStack:        9168 kB
PageTables:       738928 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    70054140 kB
Committed_AS:   34301304 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
Percpu:           158208 kB
HardwareCorrupted:     0 kB
AnonHugePages:    376832 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:     2967360 kB
DirectMap2M:    92452864 kB
DirectMap1G:    40894464 kB

 

$ lscpu | egrep 'Model name|Socket|Thread|NUMA|CPU\(s\)'

CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  1
Socket(s):           16
NUMA node(s):        1
Model name:          Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
NUMA node0 CPU(s):   0-15

select version()

PostgreSQL 14.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit

 

 

 

 


Attachment

Re: FW: Query execution failure

From
Tom Lane
Date:
Pete Storer <Pete.Storer@sas.com> writes:
> When trying to run a complex join of eight tables - including 6 foreign tables - the query runs for a few seconds and then terminates with an error message
> ERROR: mergejoin input data is out of order

You have not shown us the table definitions, nor an EXPLAIN for the
query, but this'd indicate that the sort ordering of one or more
tables is not like that of the others.  Likely causes of this include

* corrupt index, if an indexscan is being used instead of an explicit
  sort

* foreign server has a different idea of the locale's sort details than
  the local server does (or than some other foreign server, if the
  query touches more than one)

If the join key columns are all numeric then it's almost surely a corrupt
index somewhere.  But if you are joining on string keys then you are
subject to the whims of the locale's sort order, and that is a mess,
particularly for non-ASCII data.  Not only might it be different on
different platforms, but even on the same platform it can change over
time (with the effect of making indexes on string columns corrupt).
There's some details and advice here:

https://wiki.postgresql.org/wiki/Locale_data_changes

            regards, tom lane
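
A sketch of how one might check for both causes (amcheck ships in contrib; the index name used here is the unique index that appears later in this thread's plan):

-- Verify b-tree ordering invariants on a suspect index:
CREATE EXTENSION IF NOT EXISTS amcheck;
SELECT bt_index_check('main.deployment_base_ux01'::regclass);

-- Compare locale settings between the Dev and Prod clusters:
SELECT datname, datcollate, datctype FROM pg_database;

bt_index_check raises an error if the index ordering no longer matches the current collation rules.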



RE: FW: Query execution failure

From
Pete Storer
Date:
Thanks for the quick response, Tom.

To clarify, the DDL for the two local tables and the primary foreign table are below. I am also including the EXPLAIN VERBOSE output for this query.

I'm not sure how to identify which index may be corrupt, if that is the cause. Regarding a different sort order between tables, if that were the case, wouldn't this query fail in Production as it does in our Development environment? They are identical with respect to indices and sort order, and they both point to the same foreign table, which resides on a MySQL database.
Let me know if this additional information helps. Thanks.

__________________________________________

CREATE TABLE IF NOT EXISTS main.ship_event
(
    ship_event_nm character varying(20) COLLATE pg_catalog."default" NOT NULL,
    available_for_ship_dt date,
    last_build_dt date,
    sign_off_dt date,
    commit_to_ship_dt date,
    l10n_file_freeze_dt date,
    dev_test_infra_ready_dt date,
    builds_stop_dt date,
    localization_ready_dt date,
    source_system_cd character varying(20) COLLATE pg_catalog."default" NOT NULL,
    updated_dttm timestamp with time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_by_id character varying(50) COLLATE pg_catalog."default" NOT NULL DEFAULT CURRENT_USER,
    CONSTRAINT ship_event_pk PRIMARY KEY (ship_event_nm),
    CONSTRAINT ship_event_fk01 FOREIGN KEY (source_system_cd)
        REFERENCES cqm_meta.source_system (source_system_cd) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
)

TABLESPACE pg_default;

ALTER TABLE IF EXISTS main.ship_event
    OWNER to cqmadmin;

CREATE TABLE IF NOT EXISTS main.deployment_base
(
    deployment_id integer NOT NULL DEFAULT nextval('main.deployment_base_deployment_id_seq'::regclass),
    source_system_deployment_id text COLLATE pg_catalog."default" NOT NULL,
    deployment_nm character varying(100) COLLATE pg_catalog."default",
    deployment_tenant_nm character varying(100) COLLATE pg_catalog."default",
    deployment_cluster_nm character varying(100) COLLATE pg_catalog."default",
    cluster_kubernetes_version_id character varying(100) COLLATE pg_catalog."default",
    cluster_node_no smallint,
    cluster_ram_no smallint,
    cluster_cpu_no smallint,
    cluster_disk_no smallint,
    infrastructure_desc text COLLATE pg_catalog."default",
    ha_multi_master_flg character(1) COLLATE pg_catalog."default",
    cluster_environment_desc character varying(100) COLLATE pg_catalog."default",
    build_level_cd character varying(50) COLLATE pg_catalog."default",
    deployment_deploy_tm integer,
    deployment_ulid_cd character varying(120) COLLATE pg_catalog."default",
    order_id integer,
    k8s_namespace_nm character varying(100) COLLATE pg_catalog."default",
    k8s_kubeconfig_url_txt character varying(200) COLLATE pg_catalog."default",
    k8s_kubeconfig_data jsonb,
    k8s_failed_pods_cnt integer,
    k8s_client_version_no character varying(20) COLLATE pg_catalog."default",
    k8s_server_version_no character varying(20) COLLATE pg_catalog."default",
    k8s_cluster_type_cd character varying(20) COLLATE pg_catalog."default",
    k8s_ingress_controller_type_cd character varying(20) COLLATE pg_catalog."default",
    k8s_os_type_nm character varying(20) COLLATE pg_catalog."default",
    k8s_os_version_no character varying(20) COLLATE pg_catalog."default",
    deployment_host_nm character varying(120) COLLATE pg_catalog."default",
    promotion_stage_cd character varying(120) COLLATE pg_catalog."default",
    ship_event_nm character varying(20) COLLATE pg_catalog."default",
    deployment_uuid_cd character varying(120) COLLATE pg_catalog."default",
    manifest_commit_id character varying(120) COLLATE pg_catalog."default",
    orderable_ary_nm character varying[] COLLATE pg_catalog."default",
    deployment_deploy_dttm timestamp with time zone,
    source_system_cd character varying(20) COLLATE pg_catalog."default",
    cruise_worker_id character varying(50) COLLATE pg_catalog."default",
    cluster_info_json_doc jsonb,
    cas_ready_cd character varying(10) COLLATE pg_catalog."default",
    all_pods_alive_flg boolean,
    oauth_ready_flg boolean,
    deployment_label character varying(50) COLLATE pg_catalog."default",
    sas_conn_ready_flg boolean,
    sas_login_ready_flg boolean,
    postgres_version_no character varying(20) COLLATE pg_catalog."default",
    postgres_type_cd character varying(10) COLLATE pg_catalog."default",
    cadence_id integer,
    deployment_test_flg boolean,
    deployment_type_nm character varying(20) COLLATE pg_catalog."default",
    cloud_provider_nm character varying(50) COLLATE pg_catalog."default",
    storage_type_nm character varying(50) COLLATE pg_catalog."default",
    sas_base_url_txt character varying(200) COLLATE pg_catalog."default",
    embedded_ldap_flg boolean,
    cas_host_nm character varying(100) COLLATE pg_catalog."default",
    cas_mode_cd character varying(10) COLLATE pg_catalog."default",
    cas_port_no character varying(10) COLLATE pg_catalog."default",
    cas_battery_results_cd character varying(10) COLLATE pg_catalog."default",
    cas_server_type_cd character varying(10) COLLATE pg_catalog."default",
    cas_backup_flg boolean,
    node_instance_type character varying(100) COLLATE pg_catalog."default",
    viya4_ark_json_doc jsonb,
    performance_json_doc jsonb,
    astrolabe_json_doc jsonb,
    viya4_env_valid_json_doc jsonb,
    updated_dttm timestamp with time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_by_id character varying(50) COLLATE pg_catalog."default" NOT NULL DEFAULT CURRENT_USER,
    CONSTRAINT deployment_base_pk PRIMARY KEY (deployment_id),
    CONSTRAINT deployment_base_ux01 UNIQUE (source_system_deployment_id, source_system_cd),
    CONSTRAINT deployment_base_fk01 FOREIGN KEY (ship_event_nm)
        REFERENCES main.ship_event (ship_event_nm) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION,
    CONSTRAINT deployment_base_fk02 FOREIGN KEY (source_system_cd)
        REFERENCES cqm_meta.source_system (source_system_cd) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION,
    CONSTRAINT deployment_base_fk03 FOREIGN KEY (order_id)
        REFERENCES main.order_base (order_id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
)

TABLESPACE pg_default;

EXPLAIN VERBOSE output

QUERY PLAN
Hash Join  (cost=182208.88..4862218.44 rows=10656250 width=1555)
  Output: dk.deployment_id, dk.deployment_key_txt, dp.name, dp.tenant, infc.name, icv.description, infc.nodes, infc.ram, infc.cpu, infc.disk, dp.deploy_description, CASE WHEN (infrq.multi_master = 1) THEN 'Y'::text ELSE 'N'::text END, infrq.cluster_label, dl.deploy_time, dl.order_number, ob.order_id, dp.server, to_timestamp((dl."timestamp")::double precision), dl.promotion_stage, se.ship_event_nm, 'UDANEXT'::text
  Hash Cond: ((dl.order_number)::text = (ob.order_no)::text)
  ->  Merge Left Join  (cost=14682.01..174631.01 rows=10656250 width=1599)
        Output: dp.name, dp.tenant, dp.deploy_description, dp.server, dl.deploy_time, dl.order_number, dl."timestamp", dl.promotion_stage, dk.deployment_id, dk.deployment_key_txt, infc.name, infc.nodes, infc.ram, infc.cpu, infc.disk, infrq.multi_master, infrq.cluster_label, icv.description, se.ship_event_nm
        Merge Cond: ((dp.tenant)::text = (inft.name)::text)
        ->  Sort  (cost=14667.01..14709.63 rows=17050 width=927)
              Output: dp.name, dp.tenant, dp.deploy_description, dp.server, dl.deploy_time, dl.order_number, dl."timestamp", dl.promotion_stage, dk.deployment_id, dk.deployment_key_txt, se.ship_event_nm
              Sort Key: dp.tenant
              ->  Merge Join  (cost=13165.12..13468.61 rows=17050 width=927)
                    Output: dp.name, dp.tenant, dp.deploy_description, dp.server, dl.deploy_time, dl.order_number, dl."timestamp", dl.promotion_stage, dk.deployment_id, dk.deployment_key_txt, se.ship_event_nm
                    Merge Cond: (dk.deployment_key_txt = (concat_ws('|'::text, dl.deploy_id, dl.task_id, 'UDANEXT')))
                    ->  Sort  (cost=12778.41..12780.12 rows=682 width=36)
                          Output: dk.deployment_id, dk.deployment_key_txt
                          Sort Key: dk.deployment_key_txt
                          ->  Foreign Scan on cqm_meta.deployment_key dk  (cost=12691.97..12746.31 rows=682 width=36)
                                Output: dk.deployment_id, dk.deployment_key_txt
                                Filter: (NOT (hashed SubPlan 1))
                                Remote SQL: SELECT deployment_id, deployment_key_txt FROM cqm_meta.deployment_key
                                SubPlan 1
                                  ->  Index Only Scan using deployment_base_ux01 on main.deployment_base db  (cost=0.42..11840.60 rows=300546 width=20)
                                        Output: db.source_system_deployment_id
                    ->  Sort  (cost=386.70..399.20 rows=5000 width=899)
                          Output: dp.name, dp.tenant, dp.deploy_description, dp.server, dl.deploy_time, dl.order_number, dl."timestamp", dl.promotion_stage, dl.deploy_id, dl.task_id, se.ship_event_nm, (concat_ws('|'::text, dl.deploy_id, dl.task_id, 'UDANEXT'))
                          Sort Key: (concat_ws('|'::text, dl.deploy_id, dl.task_id, 'UDANEXT'))
                          ->  Hash Left Join  (cost=45.69..79.51 rows=5000 width=899)
                                Output: dp.name, dp.tenant, dp.deploy_description, dp.server, dl.deploy_time, dl.order_number, dl."timestamp", dl.promotion_stage, dl.deploy_id, dl.task_id, se.ship_event_nm, concat_ws('|'::text, dl.deploy_id, dl.task_id, 'UDANEXT')
                                Inner Unique: true
                                Hash Cond: (SUBSTRING(dp.shipevent FROM 6 FOR 5) = (se.ship_event_nm)::text)
                                ->  Foreign Scan  (cost=15.00..35.00 rows=5000 width=960)
                                      Output: dp.name, dp.tenant, dp.deploy_description, dp.server, dp.shipevent, dl.deploy_time, dl.order_number, dl."timestamp", dl.promotion_stage, dl.deploy_id, dl.task_id
                                      Relations: (udanext.deployments dp) INNER JOIN (udanext.deployment_log dl)
                                      Remote server startup cost: 25
                                      Remote query: SELECT r1.`name`, r1.`tenant`, r1.`deploy_description`, r1.`server`, r1.`shipevent`, r2.`deploy_time`, r2.`order_number`, r2.`timestamp`, r2.`promotion_stage`, r2.`deploy_id`, r2.`task_id` FROM (`udanext`.`deployments` r1 INNER JOIN `udanext`.`deployment_log` r2 ON (((r1.`id` = r2.`deploy_id`)))) ORDER BY r2.`order_number` IS NULL, r2.`order_number` ASC
                                ->  Hash  (cost=19.75..19.75 rows=875 width=7)
                                      Output: se.ship_event_nm
                                      ->  Seq Scan on main.ship_event se  (cost=0.00..19.75 rows=875 width=7)
                                            Output: se.ship_event_nm
        ->  Materialize  (cost=15.00..347.50 rows=125000 width=890)
              Output: inft.name, infc.name, infc.nodes, infc.ram, infc.cpu, infc.disk, infrq.multi_master, infrq.cluster_label, icv.description
              ->  Foreign Scan  (cost=15.00..35.00 rows=125000 width=890)
                    Output: inft.name, infc.name, infc.nodes, infc.ram, infc.cpu, infc.disk, infrq.multi_master, infrq.cluster_label, icv.description
                    Relations: (((udanext.infrastructure_tenant inft) LEFT JOIN (udanext.infrastructure_cluster infc)) LEFT JOIN (udanext.infrastructure_cluster_request infrq)) LEFT JOIN (udanext.infrastructure_cluster_version icv)
                    Remote server startup cost: 25
                    Remote query: SELECT r4.`name`, r8.`name`, r8.`nodes`, r8.`ram`, r8.`cpu`, r8.`disk`, r10.`multi_master`, r10.`cluster_label`, r12.`description` FROM (((`udanext`.`infrastructure_tenant` r4 LEFT JOIN `udanext`.`infrastructure_cluster` r8 ON (((r8.`tenant_id` = r4.`id`)))) LEFT JOIN `udanext`.`infrastructure_cluster_request` r10 ON (((r10.`id` = r8.`request_id`)))) LEFT JOIN `udanext`.`infrastructure_cluster_version` r12 ON (((r8.`k8s_version_id` = r12.`id`)))) ORDER BY r4.`name` IS NULL, r4.`name` ASC
  ->  Hash  (cost=113073.72..113073.72 rows=3132572 width=11)
        Output: ob.order_id, ob.order_no
        ->  Seq Scan on main.order_base ob  (cost=0.00..113073.72 rows=3132572 width=11)
              Output: ob.order_id, ob.order_no
Query Identifier: -8672584825060546873


CREATE FOREIGN TABLE IF NOT EXISTS foreign_udanext.deployments(
    id integer NOT NULL,
    "timestamp" character varying(50) NULL COLLATE pg_catalog."default",
    status character varying(15) NULL COLLATE pg_catalog."default",
    prod_code character varying(10) NULL COLLATE pg_catalog."default",
    shipevent character varying(25) NULL COLLATE pg_catalog."default",
    user_id character varying(15) NULL COLLATE pg_catalog."default",
    name character varying(100) NULL COLLATE pg_catalog."default",
    env_type character varying(25) NULL COLLATE pg_catalog."default",
    tenant character varying(100) NULL COLLATE pg_catalog."default",
    deploy_type character varying(25) NULL COLLATE pg_catalog."default",
    host_code character varying(5) NULL COLLATE pg_catalog."default",
    cas_controller character varying(50) NULL COLLATE pg_catalog."default",
    contacts character varying(400) NULL COLLATE pg_catalog."default",
    deploy_description text NULL COLLATE pg_catalog."default",
    server character varying(100) NULL COLLATE pg_catalog."default",
    vm_instance_name character varying(50) NULL COLLATE pg_catalog."default",
    vm_instance_size character varying(25) NULL COLLATE pg_catalog."default",
    race_base_image character varying(25) NULL COLLATE pg_catalog."default",
    race_gold_image character varying(25) NULL COLLATE pg_catalog."default",
    race_label character varying(50) NULL COLLATE pg_catalog."default",
    load_image smallint NULL,
    vm_instance_os character varying(200) NULL COLLATE pg_catalog."default",
    topology character varying(25) NULL COLLATE pg_catalog."default",
    tenant_region character varying(25) NULL COLLATE pg_catalog."default",
    vm_volume_size integer NULL,
    multi_nic_enabled smallint NULL,
    service_mesh_type integer NOT NULL,
    orchestration_format_id integer NULL,
    database_type integer NOT NULL,
    cas_id integer NULL
)
    SERVER udanext_prod_server
    OPTIONS (dbname 'udanext', table_name 'deployments');





Re: FW: Query execution failure

From
Tom Lane
Date:
Pete Storer <Pete.Storer@sas.com> writes:
> I'm not sure how to identify which index may be corrupt, if that is the cause. Regarding a different sort order between tables, if that were the case, wouldn't this query fail in Production as it does in our Development environment? They are identical with respect to indices and sort order, and they both point to the same foreign table, which resides on a MySQL database.

Meh.  You've just got two merge joins in that plan, and three of the
four inputs to them are being explicitly sorted, so one would certainly
hope that those sorts are consistent.  But the fourth input is being
sourced directly from the foreign server, and so this query is
fundamentally assuming that the foreign server has the same idea of
sort ordering as the local one.  Evidently that assumption is wrong.

I'd check locale settings on your two databases to understand why
it doesn't fail on prod.  But the long-term solution if you want this
to work reliably is probably to force use of C locale on all three
databases; that's about the only locale that you can fully trust
to be portable.

I don't know anything about the mysql FDW, but another possible route
to fixing things is to get it to not believe that the remote's sort
ordering matches the local one.  If the plan were relying on a local
sort instead of a remote sort then everything would be fine.

            regards, tom lane
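
A sketch of the C-locale route (the database name is illustrative; existing databases would need a dump/reload, or a column-by-column rewrite of the string join keys):

-- New database pinned to the C locale:
CREATE DATABASE cqm_dev_c TEMPLATE template0 LC_COLLATE 'C' LC_CTYPE 'C';

-- Or pin an individual join key column:
ALTER TABLE main.ship_event
    ALTER COLUMN ship_event_nm TYPE character varying(20) COLLATE "C";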



Re: FW: Query execution failure

From
Bruce Momjian
Date:
On Thu, Jan 26, 2023 at 11:06:41AM -0500, Tom Lane wrote:
> I don't know anything about the mysql FDW, but another possible route
> to fixing things is to get it to not believe that the remote's sort
> ordering matches the local one.  If the plan were relying on a local
> sort instead of a remote sort then everything would be fine.

When we implemented sort pushdown to FDWs I had not considered how hard
it might be to match sort ordering, especially considering the problem
we have in matching orderings from different operating system versions.

Is this documented somewhere?

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.



Re: FW: Query execution failure

From
Joe Conway
Date:
On 1/30/23 10:34, Bruce Momjian wrote:
> On Thu, Jan 26, 2023 at 11:06:41AM -0500, Tom Lane wrote:
>> I don't know anything about the mysql FDW, but another possible route
>> to fixing things is to get it to not believe that the remote's sort
>> ordering matches the local one.  If the plan were relying on a local
>> sort instead of a remote sort then everything would be fine.
> 
> When we implemented sort pushdown to FDWs I had not considered how hard
> it might be to match sort ordering, especially considering the problem
> we have in matching orderings from different operating system versions.

Different sort ordering could be due to all kinds of things, including 
for example a RHEL 8 instance pulling from a RHEL 7 one (glibc mismatch).

Also I have been told (not verified by me) that MySQL at least supports 
using its own built-in collation. If so, how can we ever know that 
whatever Postgres is using will match that?

I wonder if we should have an attribute of a foreign server that 
indicates whether sort pushdown is enabled or not.

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
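
Purely as a sketch of that idea (the option name is invented; no FDW implements it today):

ALTER SERVER udanext_prod_server OPTIONS (ADD enable_sort_pushdown 'false');

The FDW would consult the flag in its path-generation hooks and simply skip building remotely-sorted paths when it is off.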




Re: FW: Query execution failure

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> Also I have been told (not verified by me) that MySQL at least supports 
> using its own built-in collation. If so, how can we ever know that 
> whatever Postgres is using will match that?

Ugh.  mysql_fdw probably ought to default to no-sort-pushdown.

> I wonder if we should have an attribute of a foreign server that 
> indicates whether sort pushdown is enabled or not.

This is something that would have to be implemented individually
by each FDW for which it's relevant.  We could set a good precedent,
and perhaps standardize the option name, by doing that in postgres_fdw.

            regards, tom lane



RE: FW: Query execution failure

From
Pete Storer
Date:
This is very interesting, guys. And it leads me to a question: Is there any way to FORCE Postgres to do a local sort/collation of any data that is accessed through an FDW?




Re: Query execution failure

From
Christophe Pettus
Date:

> On Jan 30, 2023, at 10:17, Pete Storer <Pete.Storer@sas.com> wrote:
>
> This is very interesting, guys. And it leads me to a question: Is there any way to FORCE Postgres to do a local sort/collation of any data that is accessed through an FDW?

It depends on the FDW.  One thing to try is to have the SORT be an expression, especially one that depends on functions that can't otherwise be pushed down.  (You may have to do some experimenting to find one that returns identity, i.e. the same sort order as just the bare column, but isn't pushed down.)
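
A minimal sketch of that trick (whether any given expression is pushed down varies by FDW, so this needs testing):

SELECT inft.name
FROM foreign_udanext.infrastructure_tenant inft
ORDER BY (inft.name || '');   -- same values and same single-collation order as the bare column, but typically forces a local sort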


RE: Query execution failure

From
Pete Storer
Date:
Makes sense - but I'm using the PG substring function here.

LEFT OUTER JOIN main.ship_event se ON (se.ship_event_nm = substring(dp.shipevent from 6 for 5))

In this case, dp.shipevent is in the FDW-accessed MySQL table. Shouldn't that force the sort to be local?




Re: Query execution failure

From
Joe Conway
Date:
On 1/30/23 13:24, Pete Storer wrote:
> Makes sense - but I'm using the PG substring function here.
> 
> LEFT OUTER JOIN main.ship_event se ON (se.ship_event_nm = substring(dp.shipevent from 6 for 5))
> 
> In this case, the dp.shipevent is in the fdw-accessed MySql table. Shouldn't that force the sort to be local?

Looking at the mysql-fdw source (which looks pretty much the same as 
postgres-fdw at first glance):
8<-------------
/* We don't support cases where there are any SRFs in the targetlist */
if (parse->hasTargetSRFs)
    return;
8<-------------

Based on that, perhaps a kluge (and completely untested) workaround is to:
1/ create an SRF that returns one column, one row constant
2/ add the SRF to your targetlist

The SRF likely needs to be plpgsql to avoid inlining, maybe something like:
8<-------------
CREATE OR REPLACE FUNCTION theanswer() RETURNS setof int as $$
BEGIN
  RETURN NEXT 42;
END;
$$ LANGUAGE plpgsql;
8<-------------

HTH,
-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
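
A hypothetical use of that kluge against the tables from this thread (untested, per Joe's caveat):

SELECT theanswer() AS kluge, dp.name, dp.tenant
FROM foreign_udanext.deployments dp
LEFT OUTER JOIN main.ship_event se
    ON (se.ship_event_nm = substring(dp.shipevent from 6 for 5));

With the SRF in the targetlist, the quoted check makes the FDW bail out of pushdown, so the join and any sorting should be planned locally.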




Re: FW: Query execution failure

From
Bruce Momjian
Date:
On Mon, Jan 30, 2023 at 10:34:20AM -0500, Bruce Momjian wrote:
> On Thu, Jan 26, 2023 at 11:06:41AM -0500, Tom Lane wrote:
> > I don't know anything about the mysql FDW, but another possible route
> > to fixing things is to get it to not believe that the remote's sort
> > ordering matches the local one.  If the plan were relying on a local
> > sort instead of a remote sort then everything would be fine.
> 
> When we implemented sort pushdown to FDWs I had not considered how hard
> it might be to match sort ordering, especially considering the problem
> we have in matching orderings from different operating system versions.
> 
> Is this documented somewhere?

The attached patch documents this remote sort order requirement.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Attachment

Re: FW: Query execution failure

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> The attached patch documents this remote sort order requirement.

That seems like quite a random place to insert the documentation.
I doubt that the requirement applies only to upper paths -- FDWs
are allowed to return sorted paths for base relations too,
comparable to indexscan paths.

            regards, tom lane



Re: FW: Query execution failure

From
Bruce Momjian
Date:
On Wed, Sep 27, 2023 at 09:06:13PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > The attached patch documents this remote sort order requirement.
> 
> That seems like quite a random place to insert the documentation.
> I doubt that the requirement applies only to upper paths -- FDWs
> are allowed to return sorted paths for base relations too,
> comparable to indexscan paths.

That is the only section of the docs that mentions sort pushdown.
I don't see another reasonable location.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.



Re: FW: Query execution failure

From
Bruce Momjian
Date:
On Wed, Sep 27, 2023 at 10:27:34PM -0400, Bruce Momjian wrote:
> On Wed, Sep 27, 2023 at 09:06:13PM -0400, Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> > > The attached patch documents this remote sort order requirement.
> > 
> > That seems like quite a random place to insert the documentation.
> > I doubt that the requirement applies only to upper paths -- FDWs
> > are allowed to return sorted paths for base relations too,
> > comparable to indexscan paths.
> 
> That is the only section of the docs that mentions sort pushdown.
> I don't see another reasonable location.

I now think the fdw docs are the wrong place for this since it is a user
requirement, not a developer one.  I put it in CREATE SERVER, patch
attached.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Attachment

Re: FW: Query execution failure

From
Bruce Momjian
Date:
On Thu, Sep 28, 2023 at 09:48:02AM -0400, Bruce Momjian wrote:
> On Wed, Sep 27, 2023 at 10:27:34PM -0400, Bruce Momjian wrote:
> > On Wed, Sep 27, 2023 at 09:06:13PM -0400, Tom Lane wrote:
> > > Bruce Momjian <bruce@momjian.us> writes:
> > > > The attached patch documents this remote sort order requirement.
> > > 
> > > That seems like quite a random place to insert the documentation.
> > > I doubt that the requirement applies only to upper paths -- FDWs
> > > are allowed to return sorted paths for base relations too,
> > > comparable to indexscan paths.
> > 
> > That is the only section of the docs that mentions sort pushdown.
> > I don't see another reasonable location.
> 
> I now think the fdw docs are the wrong place for this since it is a user
> requirement, not a developer one.  I put it in CREATE SERVER, patch
> attached.

Patch applied back to PG 11.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.