BUG #16754: When using LLVM and parallel queries aborted all session by pg_cancel_backend. - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #16754: When using LLVM and parallel queries aborted all session by pg_cancel_backend.
Msg-id 16754-4a800d382f5cfae9@postgresql.org
Responses Re: BUG #16754: When using LLVM and parallel queries aborted all session by pg_cancel_backend.
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      16754
Logged by:          Noriyoshi Shinoda
Email address:      noriyoshi.shinoda@hpe.com
PostgreSQL version: 13.1
Operating system:   Red Hat Enterprise Linux 7.8 (x86-64)
Description:

OS: 
Red Hat Enterprise Linux 7.8 (x86-64)

Version:
PostgreSQL 13.1
cfe-7.1.0
llvm-7.1.0
cmake-3.18.4

Configure:
 configure --prefix=$HOME/pg131 --with-llvm --enable-debug --enable-cassert

$ pg_config
BINDIR = /home/postgres/pg131/bin
DOCDIR = /home/postgres/pg131/share/doc
HTMLDIR = /home/postgres/pg131/share/doc
INCLUDEDIR = /home/postgres/pg131/include
PKGINCLUDEDIR = /home/postgres/pg131/include
INCLUDEDIR-SERVER = /home/postgres/pg131/include/server
LIBDIR = /home/postgres/pg131/lib
PKGLIBDIR = /home/postgres/pg131/lib
LOCALEDIR = /home/postgres/pg131/share/locale
MANDIR = /home/postgres/pg131/share/man
SHAREDIR = /home/postgres/pg131/share
SYSCONFDIR = /home/postgres/pg131/etc
PGXS = /home/postgres/pg131/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE =  '--prefix=/home/postgres/pg131' '--with-llvm' '--enable-debug' '--enable-cassert'
CC = gcc -std=gnu99
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2
CFLAGS_SL = -fPIC
LDFLAGS = -L/usr/local/lib -Wl,--as-needed -Wl,-rpath,'/home/postgres/pg131/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lpthread -lz -lreadline -lrt -ldl -lm
VERSION = PostgreSQL 13.1

- Prepare
$ initdb --no-locale --encoding=utf8 data.131
$ pg_ctl -D data.131 start
$ psql
postgres=# SELECT version();
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 13.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
(1 row)
postgres=# CREATE USER demo PASSWORD 'demo';
CREATE ROLE
postgres=# \connect postgres demo
You are now connected to database "postgres" as user "demo".
postgres=> CREATE TABLE data1(c1 NUMERIC, c2 VARCHAR(10));
CREATE TABLE
postgres=> INSERT INTO data1 VALUES (generate_series(1, 1000000), 'data1');
INSERT 0 1000000
postgres=> ANALYZE data1;
ANALYZE
postgres=> \d+
                         List of relations
 Schema | Name  | Type  | Owner | Persistence | Size  | Description
--------+-------+-------+-------+-------------+-------+-------------
 public | data1 | table | demo  | permanent   | 42 MB |
(1 row)

- Operation
-- Normal case (No JIT/Cancel parallel worker)
--- Session#1
$ psql postgres demo
postgres=> SET force_parallel_mode = on;
SET
postgres=> SET jit = off;
SET
postgres=> EXPLAIN SELECT * FROM data1 d1, data1 d2;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Gather  (cost=1000.00..117383034312.00 rows=1000000000000 width=24)
   Workers Planned: 1
   Single Copy: true
   ->  Nested Loop  (cost=0.00..17383033312.00 rows=1000000000000 width=24)
         ->  Seq Scan on data1 d1  (cost=0.00..15406.00 rows=1000000 width=12)
         ->  Materialize  (cost=0.00..25289.00 rows=1000000 width=12)
               ->  Seq Scan on data1 d2  (cost=0.00..15406.00 rows=1000000 width=12)
(7 rows)
    
postgres=> EXPLAIN ANALYZE SELECT * FROM data1 d1, data1 d2;        <- While this is running, execute pg_cancel_backend(130457) in Session#2
ERROR:  canceling statement due to user request
CONTEXT:  parallel worker

--- Session#2
$ psql
postgres=# SELECT pid, leader_pid FROM pg_stat_activity WHERE usename='demo';
  pid   | leader_pid
--------+------------
 130413 |
 130457 |     130413
(2 rows)

postgres=# SELECT pg_cancel_backend(130457);
 pg_cancel_backend
-------------------
 t
(1 row)


-- Failing (NG) case (With JIT / Cancel parallel worker)
--- Session#1

$ psql postgres demo    
postgres=> SET force_parallel_mode = on;
SET
postgres=> SET jit = on;
SET
postgres=> EXPLAIN SELECT * FROM data1 d1, data1 d2;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Gather  (cost=1000.00..117383034312.00 rows=1000000000000 width=24)
   Workers Planned: 1
   Single Copy: true
   ->  Nested Loop  (cost=0.00..17383033312.00 rows=1000000000000 width=24)
         ->  Seq Scan on data1 d1  (cost=0.00..15406.00 rows=1000000 width=12)
         ->  Materialize  (cost=0.00..25289.00 rows=1000000 width=12)
               ->  Seq Scan on data1 d2  (cost=0.00..15406.00 rows=1000000 width=12)
 JIT:
   Functions: 3
   Options: Inlining true, Optimization true, Expressions true, Deforming true
(10 rows)

postgres=> EXPLAIN ANALYZE SELECT * FROM data1 d1, data1 d2;    <- While this is running, execute pg_cancel_backend(130342) in Session#2
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
ERROR:  canceling statement due to user request
CONTEXT:  parallel worker
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!?>

--- Session#2
postgres=# SELECT pid, leader_pid FROM pg_stat_activity WHERE usename='demo';
  pid   | leader_pid
--------+------------
 130322 |
 130342 |     130322
(2 rows)

postgres=# SELECT pg_cancel_backend(130342);
 pg_cancel_backend
-------------------
 t
(1 row)

- Log file
2020-12-01 14:26:02.337 JST [130270] LOG:  background worker "parallel worker" (PID 130342) was terminated by signal 11: Segmentation fault
2020-12-01 14:26:02.337 JST [130270] DETAIL:  Failed process was running: EXPLAIN ANALYZE SELECT * FROM data1 d1, data1 d2;
2020-12-01 14:26:02.337 JST [130270] LOG:  terminating any other active server processes
2020-12-01 14:26:02.337 JST [130322] WARNING:  terminating connection because of crash of another server process
2020-12-01 14:26:02.337 JST [130322] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-12-01 14:26:02.337 JST [130322] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-12-01 14:26:02.338 JST [130276] WARNING:  terminating connection because of crash of another server process
2020-12-01 14:26:02.338 JST [130276] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-12-01 14:26:02.338 JST [130276] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-12-01 14:26:02.340 JST [130345] FATAL:  the database system is in recovery mode
2020-12-01 14:26:02.343 JST [130319] WARNING:  terminating connection because of crash of another server process
2020-12-01 14:26:02.343 JST [130319] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-12-01 14:26:02.343 JST [130319] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-12-01 14:26:02.343 JST [130270] LOG:  all server processes terminated; reinitializing
2020-12-01 14:26:02.355 JST [130348] LOG:  database system was interrupted; last known up at 2020-12-01 14:21:07 JST
2020-12-01 14:26:02.360 JST [130348] LOG:  database system was not properly shut down; automatic recovery in progress
2020-12-01 14:26:02.360 JST [130348] LOG:  redo starts at 0/15C07D0
2020-12-01 14:26:02.767 JST [130348] LOG:  invalid record length at 0/5B10A90: wanted 24, got 0
2020-12-01 14:26:02.767 JST [130348] LOG:  redo done at 0/5B10A58
2020-12-01 14:26:02.821 JST [130270] LOG:  database system is ready to accept connections
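
- Log triage sketch
When checking whether a crash like this one recurs, the relevant postmaster message can be picked out of the server log mechanically. The following is only a minimal sketch: the function name crashed_workers is hypothetical, and it assumes an unwrapped server log (one message per line) in the default English message format shown in this report.

```shell
#!/bin/sh
# crashed_workers (hypothetical helper, not part of PostgreSQL):
# read a server log on stdin and print "<pid> <signal>" for every
# background worker that the postmaster reports as killed by a signal.
crashed_workers() {
    sed -n 's/.*background worker "[^"]*" (PID \([0-9][0-9]*\)) was terminated by signal \([0-9][0-9]*\).*/\1 \2/p'
}

# Demo using the first log line from this report.
printf '%s\n' '2020-12-01 14:26:02.337 JST [130270] LOG:  background worker "parallel worker" (PID 130342) was terminated by signal 11: Segmentation fault' | crashed_workers
# prints: 130342 11
```

The printed PID (130342) matches the parallel worker that was passed to pg_cancel_backend in Session#2, and signal 11 is the segmentation fault.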

