BUG #17512: Process running query fails with SIGSEV - nodeMemoize.c:349 - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #17512: Process running query fails with SIGSEV - nodeMemoize.c:349 |
Date | |
Msg-id | 17512-36fa5fab90f48af9@postgresql.org Whole thread Raw |
Responses |
Re: BUG #17512: Process running query fails with SIGSEV - nodeMemoize.c:349
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 17512 Logged by: Aleš Zelený Email address: zeleny.ales@gmail.com PostgreSQL version: 14.2 Operating system: CentOS 7 Description: Version information: OS: CentOS Linux release 7.9.2009 (Core) PostgresSQL (RPM installation): PostgreSQL 14.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit PostgreSQL Log: 2022-06-06 21:23:21.147 UTC [1647] 629e7049.66f, 1, 0, 0, , [local], [unknown], [unknown], [unknown], 00000, LOG: connection received: host=[local] 2022-06-06 21:23:21.148 UTC [1647] 629e7049.66f, 2, 0, 0, 5/7, [local], sdp_prod, zelenya, [unknown], 00000, authentication LOG: connection authenticated: identity="zelenya" method=peer (/pgsql/sdp/cluster/14 /pg_hba.conf:85) 2022-06-06 21:23:21.148 UTC [1647] 629e7049.66f, 3, 0, 0, 5/7, [local], sdp_prod, zelenya, [unknown], 00000, authentication LOG: connection authorized: user=zelenya database=sdp_prod application_name=psql 2022-06-06 21:24:00.695 UTC [1124] 629e7004.464, 22, 0, 0, , , , , , 00000, LOG: server process (PID 1647) was terminated by signal 11: Segmentation fault 2022-06-06 21:24:00.695 UTC [1124] 629e7004.464, 23, 0, 0, , , , , , 00000, DETAIL: Failed process was running: explain (analyze, buffers) 2022-06-06 21:24:00.695 UTC [1124] 629e7004.464, 24, 0, 0, , , , , , 00000, LOG: terminating any other active server processes 2022-06-06 21:24:00.806 UTC [1124] 629e7004.464, 25, 0, 0, , , , , , 00000, LOG: all server processes terminated; reinitializing 2022-06-06 21:24:01.912 UTC [1655] 629e7071.677, 1, 0, 0, , , , , , 00000, LOG: database system was interrupted; last known up at 2022-06-06 21:22:12 UTC 2022-06-06 21:24:02.003 UTC [1655] 629e7071.677, 2, 0, 0, , , , , , 42501, LOG: could not open directory "pg_tblspc/140428/lost+found": Permission denied 2022-06-06 21:24:02.021 UTC [1655] 629e7071.677, 3, 0, 0, , , , , , 00000, LOG: database system was not properly shut down; automatic recovery in progress 2022-06-06 21:24:02.028 UTC [1655] 629e7071.677, 4, 0, 0, , , , , , 00000, LOG: redo starts at 121C/190000A0 2022-06-06 21:24:02.028 UTC [1655] 629e7071.677, 5, 0, 0, , , , , , 00000, LOG: invalid record length at 121C/190000D8: wanted 24, got 0 2022-06-06 21:24:02.028 UTC [1655] 629e7071.677, 6, 0, 0, , , , , , 00000, LOG: redo done at 121C/190000A0 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s 2022-06-06 21:24:02.039 UTC [1655] 629e7071.677, 7, 0, 0, , , , , , 00000, LOG: checkpoint starting: end-of-recovery immediate 2022-06-06 21:24:02.208 UTC [1655] 629e7071.677, 8, 0, 0, , , , , , 00000, LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 1 removed, 0 recycled; write=0.153 s, sync=0.001 s, total=0.1 72 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB 2022-06-06 21:24:02.227 UTC [1124] 629e7004.464, 26, 0, 0, , , , , , 00000, LOG: database system is ready to accept connections The process crash with SIGSEV is reproducible on production, core dump was enabled to get more details: -bash-4.2$ gdb /usr/pgsql-14/bin/postgres core.postmaster.19645.ip-172-31-1-64.us-east-2.compute.internal.1654522585 GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7 Copyright (C) 2013 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /usr/pgsql-14/bin/postgres...Reading symbols from /usr/lib/debug/usr/pgsql-14/bin/postgres.debug...done. done. warning: core file may not match specified executable file. [New LWP 19645] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Core was generated by `postgres: sdp01: zelenya sdp_prod [local] EXPLAIN '. Program terminated with signal 11, Segmentation fault. #0 remove_cache_entry (entry=<optimized out>, mstate=<optimized out>) at nodeMemoize.c:349 349 nodeMemoize.c: No such file or directory. Missing separate debuginfos, use: debuginfo-install audit-libs-2.8.5-4.el7.x86_64 bzip2-libs-1.0.6-13.el7.x86_64 cyrus-sasl-lib-2.1.26-24.el7_9.x86_64 elfutils-libelf-0.176-5.el7.x86_64 elfutils-libs-0.176-5.el7.x86_64 glibc-2.17-325.el7_9.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-51.el7_9.x86_64 libattr-2.4.46-13.el7.x86_64 libcap-2.22-11.el7.x86_64 libcap-ng-0.7.5-4.el7.x86_64 libcom_err-1.42.9-19.el7.x86_64 libedit-3.0-12.20121213cvs.el7.x86_64 libffi-3.0.13-19.el7.x86_64 libgcc-4.8.5-44.el7.x86_64 libgcrypt-1.5.3-14.el7.x86_64 libgpg-error-1.12-3.el7.x86_64 libicu-50.2-4.el7_7.x86_64 libselinux-2.5-15.el7.x86_64 libstdc++-4.8.5-44.el7.x86_64 libxml2-2.9.1-6.el7_9.6.x86_64 llvm5.0-libs-5.0.1-7.el7.x86_64 lz4-1.8.3-1.el7.x86_64 ncurses-libs-5.9-14.20130511.el7_4.x86_64 nspr-4.32.0-1.el7_9.x86_64 nss-3.67.0-4.el7_9.x86_64 nss-softokn-freebl-3.67.0-3.el7_9.x86_64 nss-util-3.67.0-1.el7_9.x86_64 openldap-2.4.44-25.el7_9.x86_64 openssl-libs-1.0.2k-24.el7_9.x86_64 pam-1.1.8-23.el7.x86_64 pcre-8.32-17.el7.x86_64 pg_qualstats_14-2.0.4-1.rhel7.x86_64 pg_stat_kcache_14-2.2.0-2.rhel7.x86_64 pg_wait_sampling_14-1.1.3-1.rhel7.x86_64 powa_14-4.1.4-1.rhel7.x86_64 systemd-libs-219-78.el7_9.5.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-19.el7_9.x86_64 (gdb) bt #0 remove_cache_entry (entry=<optimized out>, mstate=<optimized out>) at nodeMemoize.c:349 #1 cache_reduce_memory (mstate=mstate@entry=0x2d0a2a0, specialkey=specialkey@entry=0x122614e8) at nodeMemoize.c:468 #2 0x0000000000669cce in cache_store_tuple (mstate=mstate@entry=0x2d0a2a0, slot=slot@entry=0x2d0a7e0) at nodeMemoize.c:635 #3 0x000000000066a0ab in ExecMemoize (pstate=0x2d0a2a0) at nodeMemoize.c:850 #4 0x0000000000649b28 in ExecProcNodeInstr (node=0x2d0a2a0) at execProcnode.c:479 #5 0x00000000006710f5 in ExecProcNode (node=0x2d0a2a0) at ../../../src/include/executor/executor.h:257 #6 ExecNestLoop (pstate=0x2c70a38) at nodeNestloop.c:160 #7 0x0000000000649b28 in ExecProcNodeInstr (node=0x2c70a38) at execProcnode.c:479 #8 0x00000000006735a6 in ExecProcNode (node=0x2c70a38) at ../../../src/include/executor/executor.h:257 #9 ExecSort (pstate=0x2c70828) at nodeSort.c:108 #10 0x0000000000649b28 in ExecProcNodeInstr (node=0x2c70828) at execProcnode.c:479 #11 0x0000000000653b19 in ExecProcNode (node=0x2c70828) at ../../../src/include/executor/executor.h:257 #12 fetch_input_tuple (aggstate=aggstate@entry=0x2c701c0) at nodeAgg.c:581 #13 0x00000000006568f2 in agg_retrieve_direct (aggstate=0x2c701c0) at nodeAgg.c:2364 #14 ExecAgg (pstate=0x2c701c0) at nodeAgg.c:2179 #15 0x0000000000649b28 in ExecProcNodeInstr (node=0x2c701c0) at execProcnode.c:479 #16 0x0000000000670fe9 in ExecProcNode (node=0x2c701c0) at ../../../src/include/executor/executor.h:257 #17 ExecNestLoop (pstate=0x2c70020) at nodeNestloop.c:109 #18 0x0000000000649b28 in ExecProcNodeInstr (node=0x2c70020) at execProcnode.c:479 #19 0x00000000006735a6 in ExecProcNode (node=0x2c70020) at ../../../src/include/executor/executor.h:257 #20 ExecSort (pstate=0x2c6fe10) at nodeSort.c:108 #21 0x0000000000649b28 in ExecProcNodeInstr (node=0x2c6fe10) at execProcnode.c:479 #22 0x0000000000653b19 in ExecProcNode (node=0x2c6fe10) at ../../../src/include/executor/executor.h:257 #23 fetch_input_tuple (aggstate=aggstate@entry=0x2c6f838) at nodeAgg.c:581 #24 0x00000000006568f2 in agg_retrieve_direct (aggstate=0x2c6f838) at nodeAgg.c:2364 #25 ExecAgg (pstate=0x2c6f838) at nodeAgg.c:2179 #26 0x0000000000649b28 in ExecProcNodeInstr (node=0x2c6f838) at execProcnode.c:479 #27 0x00000000006683b3 in ExecProcNode (node=0x2c6f838) at ../../../src/include/executor/executor.h:257 #28 ExecLimit (pstate=0x2c6f548) at nodeLimit.c:96 #29 0x0000000000649b28 in ExecProcNodeInstr (node=0x2c6f548) at execProcnode.c:479 #30 0x0000000000643152 in ExecProcNode (node=0x2c6f548) at ../../../src/include/executor/executor.h:257 #31 ExecutePlan (execute_once=<optimized out>, dest=0xa40d00 <donothingDR>, direction=<optimized out>, numberTuples=0, sendTuples=true, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x2c6f548, estate=0x2c6f018) at execMain.c:1551 #32 standard_ExecutorRun (queryDesc=0x2ea0188, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:361 #33 0x00007f06ba3e461d in pgss_ExecutorRun (queryDesc=0x2ea0188, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at pg_stat_statements.c:1003 #34 0x00007f06b9fd44c1 in pgsk_ExecutorRun () from /usr/pgsql-14/lib/pg_stat_kcache.so #35 0x00007f06b9dcbfd1 in pgqs_ExecutorRun () from /usr/pgsql-14/lib/pg_qualstats.so #36 0x00000000005d8842 in ExplainOnePlan (plannedstmt=plannedstmt@entry=0x2e9cbe8, into=into@entry=0x0, es=es@entry=0x342d228, queryString=queryString@entry=0x2c72f58 "explain (analyze, buffers)\n", ' ' <repeats 24 times>, "WITH daily_tech AS (\n", ' ' <repeats 28 times>, "SELECT\n", ' ' <repeats 33 times>, "\n", ' ' <repeats 32 times>, "MAX(dmd_current0.split_adj_"..., params=params@entry=0x0, queryEnv=<optimized out>, planduration=planduration@entry=0x7ffc3d75ca60, bufusage=0x7ffc3d75cae0) at explain.c:593 #37 0x00000000005d8bdd in ExplainOneQuery (query=<optimized out>, cursorOptions=<optimized out>, into=0x0, es=0x342d228, queryString=0x2c72f58 "explain (analyze, buffers)\n", ' ' <repeats 24 times>, "WITH daily_tech AS (\n", ' ' <repeats 28 times>, "SELECT\n", ' ' <repeats 33 times>, "\n", ' ' <repeats 32 times>, "MAX(dmd_current0.split_adj_"..., params=0x0, queryEnv=0x0) at explain.c:410 #38 0x00000000005d9285 in ExplainQuery (pstate=pstate@entry=0x2c77230, stmt=stmt@entry=0x319b468, params=params@entry=0x0, dest=dest@entry=0x2c771a0) at explain.c:281 #39 0x00000000007bc575 in standard_ProcessUtility (pstmt=pstmt@entry=0x2e296b0, queryString=queryString@entry=0x2c72f58 "explain (analyze, buffers)\n", ' ' <repeats 24 times>, "WITH daily_tech AS (\n", ' ' <repeats 28 times>, "SELECT\n", ' ' <repeats 33 times>, "\n", ' ' <repeats 32 times>, "MAX(dmd_current0.split_adj_"..., readOnlyTree=<optimized out>, context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, dest=dest@entry=0x2c771a0, qc=qc@entry=0x7ffc3d75cfb0) at utility.c:862 #40 0x00007f06ba3e60a0 in pgss_ProcessUtility (pstmt=0x2e296b0, queryString=0x2c72f58 "explain (analyze, buffers)\n", ' ' <repeats 24 times>, "WITH daily_tech AS (\n", ' ' <repeats 28 times>, "SELECT\n", ' ' <repeats 33 times>, "\n", ' ' <repeats 32 times>, "MAX(dmd_current0.split_adj_"..., readOnlyTree=<optimized out>, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x2c771a0, qc=0x7ffc3d75cfb0) at pg_stat_statements.c:1135 #41 0x00000000007ba7aa in PortalRunUtility (portal=portal@entry=0x2bef9d8, pstmt=0x2e296b0, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=true, dest=dest@entry=0x2c771a0, qc=qc@entry=0x7ffc3d75cfb0) at pquery.c:1155 #42 0x00000000007bab11 in FillPortalStore (portal=portal@entry=0x2bef9d8, isTopLevel=isTopLevel@entry=true) at pquery.c:1028 #43 0x00000000007bae8b in PortalRun (portal=<optimized out>, count=9223372036854775807, isTopLevel=<optimized out>, run_once=<optimized out>, dest=0x2e29790, altdest=0x2e29790, qc=0x7ffc3d75d190) at pquery.c:760 #44 0x00000000007b6be7 in exec_simple_query ( query_string=0x2c72f58 "explain (analyze, buffers)\n", ' ' <repeats 24 times>, "WITH daily_tech AS (\n", ' ' <repeats 28 times>, "SELECT\n", ' ' <repeats 33 times>, "\n", ' ' <repeats 32 times>, "MAX(dmd_current0.split_adj_"...) at postgres.c:1214 #45 0x00000000007b8063 in PostgresMain (argc=argc@entry=1, argv=argv@entry=0x7ffc3d75d560, dbname=0x2bb78e0 "sdp_prod", username=<optimized out>) at postgres.c:4486 #46 0x000000000048d7ba in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4530 #47 BackendStartup (port=0x2ba6060) at postmaster.c:4252 #48 ServerLoop () at postmaster.c:1745 #49 0x0000000000735692 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x2b75280) at postmaster.c:1417 #50 0x000000000048e8d2 in main (argc=3, argv=0x2b75280) at main.c:209 A small isolated test case is not available so far, the issue occurs on select involving 450GiB table. Currently, production DB is being restored to a separated machine to allow future testing/investigation without affecting other database users. Thanks for any advice on working around this issue or how to collect better diagnostic data / additional helpful information. Kind regards Ales Zeleny
pgsql-bugs by date: