Thread: Server crash in pg_replication_slot_advance function
Hi, While playing around with newly implemented function 'pg_replication_slot_advance' , found a server crash . Please refer the testcase scenario - centos@centos-cpula bin]$ ./psql postgres psql (11devel) Type "help" for help. postgres=# SELECT * FROM pg_create_logical_replication_slot('regression_slot1', 'test_decoding', true); slot_name | lsn ------------------+----------- regression_slot1 | 0/1636AE8 (1 row) postgres=# select * from pg_replication_slots; slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn ------------------+---------------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+--------------------- regression_slot1 | test_decoding | logical | 13220 | postgres | t | t | 25910 | | 557 | 0/1636AB0 | 0/1636AE8 (1 row) postgres=# select pg_switch_wal(); pg_switch_wal --------------- 0/1636B38 (1 row) postgres=# select pg_switch_wal(); pg_switch_wal --------------- 0/2000078 (1 row) postgres=# SELECT end_lsn FROM pg_replication_slot_advance('regression_slot1', '0/2000078'); end_lsn ----------- 0/2000078 (1 row) postgres=# select * from pg_replication_slots; 2018-02-16 08:09:57.989 GMT [25910] LOG: statement: select * from pg_replication_slots; slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn ------------------+---------------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+--------------------- regression_slot1 | test_decoding | logical | 13220 | postgres | t | t | 25910 | | 557 | 0/2000028 | 0/2000078 (1 row) postgres=# SELECT end_lsn FROM pg_replication_slot_advance('regression_slot1', '0/2000071'); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. !> Stack trace - Core was generated by `postgres: centos postgres [local] SELECT '. Program terminated with signal 11, Segmentation fault. #0 0x00000000008519a8 in pg_replication_slot_advance (fcinfo=0x7fff5c9c4630) at slotfuncs.c:479 479 ereport(ERROR, Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.192.el6.x86_64 keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-57.el6.x86_64 libcom_err-1.41.12-22.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-48.el6_8.4.x86_64 zlib-1.2.3-29.el6.x86_64 (gdb) bt #0 0x00000000008519a8 in pg_replication_slot_advance (fcinfo=0x7fff5c9c4630) at slotfuncs.c:479 #1 0x00000000006defdc in ExecMakeTableFunctionResult (setexpr=0x20f7db0, econtext=0x20f7aa8, argContext=0x20fd690, expectedDesc=0x20f92d0, randomAccess=0 '\000') at execSRF.c:231 #2 0x00000000006f1782 in FunctionNext (node=0x20f7990) at nodeFunctionscan.c:95 #3 0x00000000006de80d in ExecScanFetch (node=0x20f7990, accessMtd=0x6f16cb <FunctionNext>, recheckMtd=0x6f1abd <FunctionRecheck>) at execScan.c:95 #4 0x00000000006de8ad in ExecScan (node=0x20f7990, accessMtd=0x6f16cb <FunctionNext>, recheckMtd=0x6f1abd <FunctionRecheck>) at execScan.c:162 #5 0x00000000006f1b0a in ExecFunctionScan (pstate=0x20f7990) at nodeFunctionscan.c:270 #6 0x00000000006dcd66 in ExecProcNodeFirst (node=0x20f7990) at execProcnode.c:446 #7 0x00000000006d3c05 in ExecProcNode (node=0x20f7990) at ../../../src/include/executor/executor.h:246 #8 0x00000000006d6559 in ExecutePlan (estate=0x20f7778, planstate=0x20f7990, use_parallel_mode=0 '\000', operation=CMD_SELECT, sendTuples=1 '\001', numberTuples=0, direction=ForwardScanDirection, dest=0x20fd288, execute_once=1 '\001') at execMain.c:1721 #9 0x00000000006d41d7 in standard_ExecutorRun (queryDesc=0x2065878, direction=ForwardScanDirection, count=0, execute_once=1 '\001') at execMain.c:361 #10 0x00000000006d3ff3 in ExecutorRun (queryDesc=0x2065878, direction=ForwardScanDirection, count=0, execute_once=1 '\001') at execMain.c:304 #11 0x00000000008afeeb in PortalRunSelect (portal=0x20a5838, forward=1 '\001', count=0, dest=0x20fd288) at pquery.c:932 #12 0x00000000008afb79 in PortalRun (portal=0x20a5838, count=9223372036854775807, isTopLevel=1 '\001', run_once=1 '\001', dest=0x20fd288, altdest=0x20fd288, completionTag=0x7fff5c9c4f00 "") at pquery.c:773 #13 0x00000000008a9ba0 in exec_simple_query (query_string=0x2040438 "SELECT end_lsn FROM pg_replication_slot_advance('regression_slot1', '0/2000071');") at postgres.c:1120 #14 0x00000000008ade34 in PostgresMain (argc=1, argv=0x206bec0, dbname=0x206bd20 "postgres", username=0x203cf58 "centos") at postgres.c:4144 #15 0x000000000080d11a in BackendRun (port=0x2063d00) at postmaster.c:4412 #16 0x000000000080c88e in BackendStartup (port=0x2063d00) at postmaster.c:4084 #17 0x0000000000808cbb in ServerLoop () at postmaster.c:1757 #18 0x0000000000808379 in PostmasterMain (argc=3, argv=0x203aeb0) at postmaster.c:1365 #19 0x000000000073d3d0 in main (argc=3, argv=0x203aeb0) at main.c:228 (gdb) -- regards,tushar EnterpriseDB https://www.enterprisedb.com/ The Enterprise PostgreSQL Company
On Fri, Feb 16, 2018 at 1:44 PM, tushar <tushar.ahuja@enterprisedb.com> wrote: > Hi, [....] > postgres=# SELECT end_lsn FROM > pg_replication_slot_advance('regression_slot1', '0/2000071'); > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > !> > I am able to reproduce this on the latest master head, the problem is in the following hunk of pg_replication_slot_advance() where oldest LSN value is accessed after releasing replication slot: 476 if (moveto < startlsn) 477 { 478 ReplicationSlotRelease(); 479 ereport(ERROR, 480 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), 481 errmsg("cannot move slot to %X/%X, minimum is %X/%X", 482 (uint32) (moveto >> 32), (uint32) moveto, 483 (uint32) (MyReplicationSlot->data.confirmed_flush >> 32), 484 (uint32) (MyReplicationSlot->data.confirmed_flush)))); 485 } 486 Regards, Amul
On Fri, Feb 16, 2018 at 3:06 PM, amul sul <sulamul@gmail.com> wrote: > On Fri, Feb 16, 2018 at 1:44 PM, tushar <tushar.ahuja@enterprisedb.com> wrote: >> Hi, > [....] >> postgres=# SELECT end_lsn FROM >> pg_replication_slot_advance('regression_slot1', '0/2000071'); >> server closed the connection unexpectedly >> This probably means the server terminated abnormally >> before or while processing the request. >> !> >> > > I am able to reproduce this on the latest master head, the problem is in the > following hunk of pg_replication_slot_advance() where oldest LSN value > is accessed after releasing replication slot: > > > 476 if (moveto < startlsn) > 477 { > 478 ReplicationSlotRelease(); > 479 ereport(ERROR, > 480 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > 481 errmsg("cannot move slot to %X/%X, minimum is %X/%X", > 482 (uint32) (moveto >> 32), (uint32) moveto, > 483 (uint32) > (MyReplicationSlot->data.confirmed_flush >> 32), > 484 (uint32) > (MyReplicationSlot->data.confirmed_flush)))); > 485 } > 486 > Attached patch proposes a required fix. Regards, Amul
Attachment
On 02/16/2018 03:25 PM, amul sul wrote: > Attached patch proposes a required fix. Thanks, Issue seems to be fixed with this patch , now getting an expected error -ERROR: cannot move slot to 0/2000090, minimum is 0/2000098 -- regards,tushar EnterpriseDB https://www.enterprisedb.com/ The Enterprise PostgreSQL Company
On 16/02/18 10:55, amul sul wrote: > On Fri, Feb 16, 2018 at 3:06 PM, amul sul <sulamul@gmail.com> wrote: >> On Fri, Feb 16, 2018 at 1:44 PM, tushar <tushar.ahuja@enterprisedb.com> wrote: >>> Hi, >> [....] >>> postgres=# SELECT end_lsn FROM >>> pg_replication_slot_advance('regression_slot1', '0/2000071'); >>> server closed the connection unexpectedly >>> This probably means the server terminated abnormally >>> before or while processing the request. >>> !> >>> >> >> I am able to reproduce this on the latest master head, the problem is in the >> following hunk of pg_replication_slot_advance() where oldest LSN value >> is accessed after releasing replication slot: >> >> >> 476 if (moveto < startlsn) >> 477 { >> 478 ReplicationSlotRelease(); >> 479 ereport(ERROR, >> 480 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), >> 481 errmsg("cannot move slot to %X/%X, minimum is %X/%X", >> 482 (uint32) (moveto >> 32), (uint32) moveto, >> 483 (uint32) >> (MyReplicationSlot->data.confirmed_flush >> 32), >> 484 (uint32) >> (MyReplicationSlot->data.confirmed_flush)))); >> 485 } >> 486 >> Yeah, of course we can't use MyReplicationSlot after calling ReplicationSlotRelease(). > > Attached patch proposes a required fix. > Looks correct to me. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Feb 16, 2018 at 6:55 PM, amul sul <sulamul@gmail.com> wrote: > On Fri, Feb 16, 2018 at 3:06 PM, amul sul <sulamul@gmail.com> wrote: >> On Fri, Feb 16, 2018 at 1:44 PM, tushar <tushar.ahuja@enterprisedb.com> wrote: >>> Hi, >> [....] >>> postgres=# SELECT end_lsn FROM >>> pg_replication_slot_advance('regression_slot1', '0/2000071'); >>> server closed the connection unexpectedly >>> This probably means the server terminated abnormally >>> before or while processing the request. >>> !> >>> >> >> I am able to reproduce this on the latest master head, the problem is in the >> following hunk of pg_replication_slot_advance() where oldest LSN value >> is accessed after releasing replication slot: >> >> >> 476 if (moveto < startlsn) >> 477 { >> 478 ReplicationSlotRelease(); >> 479 ereport(ERROR, >> 480 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), >> 481 errmsg("cannot move slot to %X/%X, minimum is %X/%X", >> 482 (uint32) (moveto >> 32), (uint32) moveto, >> 483 (uint32) >> (MyReplicationSlot->data.confirmed_flush >> 32), >> 484 (uint32) >> (MyReplicationSlot->data.confirmed_flush)))); >> 485 } >> 486 >> > > Attached patch proposes a required fix. > The change looks good to me. Thank you. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Petr Jelinek wrote: > > Attached patch proposes a required fix. > > > > Looks correct to me. Masahiko Sawada wrote: > The change looks good to me. Thank you. Thanks, pushed. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services