Thread: Server crash on RHEL 9/s390x platform against PG16
Machine details:
[edb@9428da9d2137 postgres]$ cat /etc/redhat-release
AlmaLinux release 9.2 (Turquoise Kodkod)
[edb@9428da9d2137 postgres]$ lscpu
Architecture: s390x
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Big Endian
Configure command:
./configure --prefix=/home/edb/postgres/ --with-lz4 --with-zstd --with-llvm --with-perl --with-python --with-tcl --with-openssl --enable-nls --with-libxml --with-libxslt --with-systemd --with-libcurl --without-icu --enable-debug --enable-cassert --with-pgport=5414
Test case:
CREATE TABLE rm32044_t1
(
pkey integer,
val text
);
CREATE TABLE rm32044_t2
(
pkey integer,
label text,
hidden boolean
);
CREATE TABLE rm32044_t3
(
pkey integer,
val integer
);
CREATE TABLE rm32044_t4
(
pkey integer
);
insert into rm32044_t1 values ( 1 , 'row1');
insert into rm32044_t1 values ( 2 , 'row2');
insert into rm32044_t2 values ( 1 , 'hidden', true);
insert into rm32044_t2 values ( 2 , 'visible', false);
insert into rm32044_t3 values (1 , 1);
insert into rm32044_t3 values (2 , 1);
postgres=# SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
backtrace:
[edb@9428da9d2137 postgres]$ gdb bin/postgres data/qemu_postgres_20230911-140628_65620.core
Core was generated by `postgres: edb postgres [local] SELECT '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00000000010a8366 in heap_compute_data_size (tupleDesc=tupleDesc@entry=0x1ba3d10, values=values@entry=0x1ba4168, isnull=isnull@entry=0x1ba41a8) at heaptuple.c:227
227 VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
[Current thread is 1 (LWP 65597)]
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-60.el9.s390x libcap-2.48-8.el9.s390x libedit-3.1-37.20210216cvs.el9.s390x libffi-3.4.2-7.el9.s390x libgcc-11.3.1-4.3.el9.alma.s390x libgcrypt-1.10.0-10.el9_2.s390x libgpg-error-1.42-5.el9.s390x libstdc++-11.3.1-4.3.el9.alma.s390x libxml2-2.9.13-3.el9_2.1.s390x libzstd-1.5.1-2.el9.s390x llvm-libs-15.0.7-1.el9.s390x lz4-libs-1.9.3-5.el9.s390x ncurses-libs-6.2-8.20210508.el9.s390x openssl-libs-3.0.7-17.el9_2.s390x systemd-libs-252-14.el9_2.3.s390x xz-libs-5.2.5-8.el9_0.s390x
(gdb) bt
#0 0x00000000010a8366 in heap_compute_data_size (tupleDesc=tupleDesc@entry=0x1ba3d10, values=values@entry=0x1ba4168, isnull=isnull@entry=0x1ba41a8) at heaptuple.c:227
#1 0x00000000010a9bb0 in heap_form_minimal_tuple (tupleDescriptor=0x1ba3d10, values=0x1ba4168, isnull=0x1ba41a8) at heaptuple.c:1484
#2 0x00000000016553fa in ExecCopySlotMinimalTuple (slot=<optimized out>) at ../../../../src/include/executor/tuptable.h:472
#3 tuplesort_puttupleslot (state=state@entry=0x1be4d18, slot=slot@entry=0x1ba4120) at tuplesortvariants.c:610
#4 0x00000000012dc0e0 in ExecIncrementalSort (pstate=0x1acb4d8) at nodeIncrementalSort.c:716
#5 0x00000000012b32c6 in ExecProcNode (node=0x1acb4d8) at ../../../src/include/executor/executor.h:273
#6 ExecutePlan (execute_once=<optimized out>, dest=0x1ade698, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x1acb4d8, estate=0x1acb258) at execMain.c:1670
#7 standard_ExecutorRun (queryDesc=0x19ad338, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:365
#8 0x00000000014a6ae2 in PortalRunSelect (portal=portal@entry=0x1a63558, forward=forward@entry=true, count=0, count@entry=9223372036854775807, dest=dest@entry=0x1ade698) at pquery.c:924
#9 0x00000000014a84e0 in PortalRun (portal=portal@entry=0x1a63558, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x1ade698, altdest=0x1ade698, qc=0x40007ff7b0) at pquery.c:768
#10 0x00000000014a3c1c in exec_simple_query (
query_string=0x19ea0e8 "SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;") at postgres.c:1274
#11 0x00000000014a57aa in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4637
#12 0x00000000013fdaf6 in BackendRun (port=0x1a132c0, port=0x1a132c0) at postmaster.c:4464
#13 BackendStartup (port=0x1a132c0) at postmaster.c:4192
#14 ServerLoop () at postmaster.c:1782
#15 0x00000000013fec34 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x19a59a0) at postmaster.c:1466
#16 0x0000000001096faa in main (argc=<optimized out>, argv=0x19a59a0) at main.c:198
(gdb) p val
$1 = 0
(gdb) p i
$2 = 3
(gdb) f 3
#3 0x0000000001a1ef70 in ExecCopySlotMinimalTuple (slot=0x202e4f8) at ../../../../src/include/executor/tuptable.h:472
472 return slot->tts_ops->copy_minimal_tuple(slot);
(gdb) p *slot
$3 = {type = T_TupleTableSlot, tts_flags = 16, tts_nvalid = 8, tts_ops = 0x1b6dcc8 <TTSOpsVirtual>, tts_tupleDescriptor = 0x202e0e8, tts_values = 0x202e540, tts_isnull = 0x202e580, tts_mcxt = 0x1f54550, tts_tid = {ip_blkid = {bi_hi = 65535,
bi_lo = 65535}, ip_posid = 0}, tts_tableOid = 0}
(gdb) p *slot->tts_tupleDescriptor
$2 = {natts = 8, tdtypeid = 2249, tdtypmod = -1, tdrefcount = -1, constr = 0x0, attrs = 0x202cd28}
(gdb) p slot.tts_values[3]
$4 = 0
(gdb) p slot.tts_values[2]
$5 = 1
(gdb) p slot.tts_values[1]
$6 = 34027556
As per the result slot, the attribute at index 3 (the label column) has a value of 0.
I'm testing this in a Docker container and am facing some issues with gdb, so I could not debug it further.
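The gdb observation above can be restated as a small check. This is a minimal, self-contained sketch in C, not PostgreSQL code: Datum is modelled here as a pointer-sized integer, and first_bogus_byref and the byref array are hypothetical names introduced only for illustration. It reports the first column that is passed by reference and marked non-null but holds a zero datum. That appears to be the state frame #0 hit at i = 3: execution reached heaptuple.c:227, so isnull[3] was presumably false, and DatumGetPointer(0) would yield a null pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a stand-in for PostgreSQL's Datum, a pointer-sized integer. */
typedef uintptr_t Datum;

/*
 * Hypothetical helper (not part of PostgreSQL).
 *
 * Scan a values/isnull pair the way a tuple-forming routine would, and
 * return the index of the first column that is by-reference, marked
 * non-null, and still carries a zero Datum. Dereferencing such a Datum
 * as a pointer would fault. Returns -1 if every column is consistent.
 */
int
first_bogus_byref(const Datum *values, const bool *isnull,
                  const bool *byref, int natts)
{
    assert(values != NULL && isnull != NULL && byref != NULL);

    for (int i = 0; i < natts; i++)
    {
        /* NULL columns are skipped, so their Datum is never inspected. */
        if (isnull[i])
            continue;

        /* By-value columns (integer, boolean) may legitimately be 0. */
        if (!byref[i])
            continue;

        /* A non-null by-reference column must hold a usable pointer. */
        if (values[i] == 0)
            return i;
    }
    return -1;
}
```

For the slot printed above, one would pass values[3] == 0 with byref[3] == true (label is text) and isnull[3] == false, and the helper would return 3, matching the index gdb shows in frame #0.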
Here is the explain plan:
postgres=# explain (verbose, costs off) SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Incremental Sort
Output: rm32044_t1.pkey, rm32044_t1.val, rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden, rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
Sort Key: rm32044_t1.pkey, rm32044_t2.label, rm32044_t2.hidden
Presorted Key: rm32044_t1.pkey
-> Merge Left Join
Output: rm32044_t1.pkey, rm32044_t1.val, rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden, rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
Merge Cond: (rm32044_t1.pkey = rm32044_t2.pkey)
-> Sort
Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey, rm32044_t1.pkey, rm32044_t1.val
Sort Key: rm32044_t1.pkey
-> Nested Loop
Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey, rm32044_t1.pkey, rm32044_t1.val
-> Merge Left Join
Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
Merge Cond: (rm32044_t3.pkey = rm32044_t4.pkey)
-> Sort
Output: rm32044_t3.pkey, rm32044_t3.val
Sort Key: rm32044_t3.pkey
-> Seq Scan on public.rm32044_t3
Output: rm32044_t3.pkey, rm32044_t3.val
-> Sort
Output: rm32044_t4.pkey
Sort Key: rm32044_t4.pkey
-> Seq Scan on public.rm32044_t4
Output: rm32044_t4.pkey
-> Materialize
Output: rm32044_t1.pkey, rm32044_t1.val
-> Seq Scan on public.rm32044_t1
Output: rm32044_t1.pkey, rm32044_t1.val
-> Sort
Output: rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden
Sort Key: rm32044_t2.pkey
-> Seq Scan on public.rm32044_t2
Output: rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden
(34 rows)
It seems that while building the inner slot for the merge join, the value for attnum 1 is not fetched correctly.
Does anybody have any idea about this?
[edb@9428da9d2137]$ clang --version
clang version 15.0.7 (Red Hat 15.0.7-2.el9)
Target: s390x-ibm-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Let me know if any further information is needed.
It looks like an issue with JIT. If I disable JIT, the above query runs successfully:
postgres=# set jit to off;
SET
postgres=# SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;
pkey | val | pkey | label | hidden | pkey | val | pkey
------+------+------+---------+--------+------+-----+------
1 | row1 | 1 | hidden | t | 1 | 1 |
1 | row1 | 1 | hidden | t | 2 | 1 |
2 | row2 | 2 | visible | f | 1 | 1 |
2 | row2 | 2 | visible | f | 2 | 1 |
(4 rows)
Any idea on this?
Hi,
On 2023-09-12 15:27:21 +0530, Suraj Kharage wrote:
> *[edb@9428da9d2137 postgres]$ cat /etc/redhat-release AlmaLinux release 9.2
> (Turquoise Kodkod)[edb@9428da9d2137 postgres]$ lscpuArchitecture:
> s390x CPU op-mode(s): 32-bit, 64-bit Address sizes: 39 bits
Can you provide the rest of the lscpu output? There have been issues with Z14
vs Z15:
https://github.com/llvm/llvm-project/issues/53009
You're apparently not hitting that, but given that fact, you either are on a
slightly older CPU, or you have applied a patch to work around it. Because
otherwise your build instructions below would hit that problem, I think.
> physical, 48 bits virtual Byte Order: Big Endian*
> *Configure command:*
> ./configure --prefix=/home/edb/postgres/ --with-lz4 --with-zstd --with-llvm
> --with-perl --with-python --with-tcl --with-openssl --enable-nls
> --with-libxml --with-libxslt --with-systemd --with-libcurl --without-icu
> --enable-debug --enable-cassert --with-pgport=5414
Hm, based on "--with-libcurl" this isn't upstream postgres, correct? Have you
verified the issue reproduces on upstream postgres?
Here are details:
./configure --prefix=/home/edb/postgres/ --with-zstd --with-llvm --with-perl --with-python --with-tcl --with-openssl --enable-nls --with-libxml --with-libxslt --with-systemd --without-icu --enable-debug --enable-cassert --with-pgport=5414 CFLAGS="-g -O0"
[edb@9428da9d2137 postgres]$ cat /etc/redhat-release
AlmaLinux release 9.2 (Turquoise Kodkod)
[edb@9428da9d2137 edbas]$ lscpu
Architecture: s390x
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Big Endian
CPU(s): 9
On-line CPU(s) list: 0-8
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
CPU family: 6
Model: 158
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 9
Stepping: 10
BogoMIPS: 5200.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid pni pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase bmi1 avx2 bmi2 erms xsaveopt arat
Caches (sum of all):
L1d: 288 KiB (9 instances)
L1i: 288 KiB (9 instances)
L2: 2.3 MiB (9 instances)
L3: 108 MiB (9 instances)
Vulnerabilities:
Itlb multihit: KVM: Mitigation: VMX unsupported
L1tf: Mitigation; PTE Inversion
Mds: Vulnerable; SMT Host state unknown
Meltdown: Vulnerable
Mmio stale data: Vulnerable
Spec store bypass: Vulnerable
Spectre v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Spectre v2: Vulnerable, STIBP: disabled
Srbds: Unknown: Dependent on hypervisor status
Tsx async abort: Not affected
[edb@9428da9d2137 postgres]$ clang --version
clang version 15.0.7 (Red Hat 15.0.7-2.el9)
Target: s390x-ibm-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
[edb@9428da9d2137 postgres]$ rpm -qa | grep llvm
llvm-libs-15.0.7-1.el9.s390x
llvm-15.0.7-1.el9.s390x
llvm-test-15.0.7-1.el9.s390x
llvm-static-15.0.7-1.el9.s390x
llvm-devel-15.0.7-1.el9.s390x
>
> *Test case:*
> CREATE TABLE rm32044_t1
> (
> pkey integer,
> val text
> );
> CREATE TABLE rm32044_t2
> (
> pkey integer,
> label text,
> hidden boolean
> );
> CREATE TABLE rm32044_t3
> (
> pkey integer,
> val integer
> );
> CREATE TABLE rm32044_t4
> (
> pkey integer
> );
> insert into rm32044_t1 values ( 1 , 'row1');
> insert into rm32044_t1 values ( 2 , 'row2');
> insert into rm32044_t2 values ( 1 , 'hidden', true);
> insert into rm32044_t2 values ( 2 , 'visible', false);
> insert into rm32044_t3 values (1 , 1);
> insert into rm32044_t3 values (2 , 1);
>
> postgres=# SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON rm32044_t1.pkey
> = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON rm32044_t3.pkey =
> rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> The connection to the server was lost. Attempting reset: Failed.
I tried this on both master and 16, without hitting this issue.
If you can reproduce the issue on upstream postgres, can you share more about
your configuration?
Greetings,
Andres Freund
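The configuration details asked for above can be collected from a running server with a few catalog queries. This is a minimal sketch using the standard `pg_settings` view and the `version()` and `pg_jit_available()` functions. Which of these details matter for this crash has not been established in the thread.

```sql
-- Exact server version string, including compiler and build target.
SELECT version();

-- True only if a JIT provider can be loaded and jit is set to on.
SELECT pg_jit_available();

-- Every setting that differs from its built-in default, and where it was set.
SELECT name, setting, source, sourcefile
FROM pg_settings
WHERE source NOT IN ('default', 'override')
ORDER BY name;

-- All JIT-related settings, whether or not they were changed.
SELECT name, setting
FROM pg_settings
WHERE name LIKE 'jit%'
ORDER BY name;
```

Running these with `jit` at its default value, before issuing `set jit to off`, keeps the output representative of the session that crashed.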