Thread: BUG #16764: wal sender exits abnormally which cause wal receiever exits twice
BUG #16764: wal sender exits abnormally which cause wal receiever exits twice
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 16764 Logged by: Bo Chen Email address: bchen90@163.com PostgreSQL version: 11.8 Operating system: Suse linux 11.3 x86_64 Description: Hi,hackers For hot standby mode, I constructed the following operations,kills the walsender process while inserting some data to the test table. I found the walreceiver process exits twice abnormally , obviously first exiting is because the exit of the sender,but when the wal replication stream reestablished,wal walreceiver killed by startup because invaild xlog record. I analysis the problem, it is found that the xlog record of the problem location(across page record) changes before and after the sender exits, and the startup process of the standby has already loaded the previous page of the changing record into memory before the exit of the receiver. As a result, invalid xlog record is founded and the receiver process killed by strartup. ps: I test 11.5 and the problem is still exists, I believe this problem has been around for a long time. Is tihis problem normally? I'm worried about this problem could cause standby data damage in extreme scenarios(for the same contrecord length and crc, but not the same xlog record) insert data procedure: create or replace procedure test_procedure(in_count in integer) as $$ declare i_count integer; begin FOR i_count IN 1..in_count LOOP INSERT INTO test_table VALUES(i_count,'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'); end loop; delete from test_table; end; $$ language plpgsql; standby database runing log: 2020-11-27 12:14:13.288 CGP [38116] DEBUG: sendtime 2020-11-27 12:14:13.287768+00 receipttime 2020-11-27 12:14:13.288081+00 replication apply delay 26585 ms transfer latency 0 ms 2020-11-27 12:14:13.288 CGP [38116] DEBUG: sending write 1/B1FB8E0 flush 1/B1F4FA0 apply 1/B1F4EF8 2020-11-27 12:14:13.288 CGP [38116] DEBUG: sending write 1/B1FB8E0 flush 1/B1FB8E0 apply 1/B1F4EF8 2020-11-27 12:14:13.289 CGP [7804] LOG: invalid contrecord length 152 at 1/B1F4EF8 2020-11-27 12:14:13.289 CGP [38116] FATAL: terminating walreceiver process due to administrator command 2020-11-27 12:14:13.289 CGP [38116] DEBUG: shmem_exit(1): 1 before_shmem_exit callbacks to make 2020-11-27 12:14:13.289 CGP [38116] DEBUG: shmem_exit(1): 5 on_shmem_exit callbacks to make 2020-11-27 12:14:13.289 CGP [38116] DEBUG: proc_exit(1): 2 callbacks to make 2020-11-27 12:14:13.289 CGP [38116] DEBUG: exit(1) 2020-11-27 12:14:13.289 CGP [38116] DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make 2020-11-27 12:14:13.289 CGP [38116] DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make 2020-11-27 12:14:13.289 CGP [38116] DEBUG: proc_exit(-1): 0 callbacks to make 2020-11-27 12:14:13.289 CGP [7804] DEBUG: switched WAL source from stream to archive after failure 2020-11-27 12:14:13.289 CGP [7804] DEBUG: transaction ID wrap limit is 2147484208, limited by database with OID 1 2020-11-27 12:14:13.289 CGP [7804] CONTEXT: WAL redo at 1/B1F4EF8 for XLOG/CHECKPOINT_SHUTDOWN: redo 1/B1F4EF8; tli 1; prev tli 1; fpw true; xid 0:99176; oid 24585; multi 1; offset 0; oldest xid 561 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown