
From egashira.yusuke@fujitsu.com
Subject RE: BUG #16922: In cascading replication, a standby server aborted when an upstream standby server promoted
Date
Msg-id OSBPR01MB502909B028B39670554CD9A2FF629@OSBPR01MB5029.jpnprd01.prod.outlook.com
List pgsql-bugs
> The replication between "NODE-A" and "NODE-B" is synchronous replication,
> and between "NODE-B" and "NODE-C" is asynchronous.
> 
> "NODE-A" <-[synchronous]-> "NODE-B" <-[non-synchronous]-> "NODE-C"
> 
> When the primary server "NODE-A" crashed due to full WAL storage and
> "NODE-B" promoted, the downstream standby server "NODE-C" aborted with
> the following messages.
> 
> 2021-03-11 11:26:28.470 JST [85228] LOG:  invalid contrecord length 26 at
> 0/5FFFFF0
> 2021-03-11 11:26:28.470 JST [85232] FATAL:  terminating walreceiver process
> due to administrator command
> 2021-03-11 11:26:28.470 JST [85228] PANIC:  could not open file
> "pg_wal/000000020000000000000005": No such file or directory
> 2021-03-11 11:26:28.492 JST [85260] LOG:  started streaming WAL from primary
> at 0/5000000 on timeline 2
> 2021-03-11 11:26:29.260 JST [85227] LOG:  startup process (PID 85228) was
> terminated by signal 6: Aborted

I would like to clarify the conditions under which this "abort" occurs, so that I can explain them to the customer.

Based on the pg_waldump results, I think the conditions are as follows:

1) A WAL record that is only partially written (it continues into the following segment file) is recorded at the end of a WAL segment file, and
2) the WAL segment file of 1) is the last WAL segment file the standby server received, and
3) the standby server is promoted.

I think the above conditions are met only when the primary server crashes because its WAL storage becomes full.
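
For reference, condition 1) can be checked with pg_waldump on the last segment
the standby received. A minimal sketch, using the paths from the pg_waldump
output attached to my original report below (adjust to the real environment):

# Last record pg_waldump can decode in the last segment NODE-C received.
# In the reproduced case this is the INSERT at 0/05FFFFA8; the next record
# starts at 0/05FFFFF0, only 16 bytes before the segment boundary, so it has
# to continue into the following segment, which NODE-C never received; this
# matches the "invalid contrecord length 26 at 0/5FFFFF0" message.
pg_waldump nodec/pg_wal/000000010000000000000005 | tail -n 1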

Is my idea correct?

Regards,
Yusuke Egashira


-----Original Message-----
From: PG Bug reporting form <noreply@postgresql.org> 
Sent: Thursday, March 11, 2021 5:06 PM
To: pgsql-bugs@lists.postgresql.org
Cc: Egashira, Yusuke/江頭 勇佑 <egashira.yusuke@fujitsu.com>
Subject: BUG #16922: In cascading replication, a standby server aborted when an upstream standby server promoted

The following bug has been logged on the website:

Bug reference:      16922
Logged by:          yusuke egashira
Email address:      egashira.yusuke@fujitsu.com
PostgreSQL version: 13.2
Operating system:   Red Hat Enterprise Linux 8
Description:        

Hi.

My customer uses PostgreSQL with a cascading replication.
The replication between "NODE-A" and "NODE-B" is synchronous replication,
and between "NODE-B" and "NODE-C" is asynchronous.

"NODE-A" <-[synchronous]-> "NODE-B" <-[non-synchronous]-> "NODE-C"

When the primary server "NODE-A" crashed due to full WAL storage and
"NODE-B" promoted, the downstream standby server "NODE-C" aborted with
the following messages.

2021-03-11 11:26:28.470 JST [85228] LOG:  invalid contrecord length 26 at
0/5FFFFF0
2021-03-11 11:26:28.470 JST [85232] FATAL:  terminating walreceiver process
due to administrator command
2021-03-11 11:26:28.470 JST [85228] PANIC:  could not open file
"pg_wal/000000020000000000000005": No such file or directory
2021-03-11 11:26:28.492 JST [85260] LOG:  started streaming WAL from primary
at 0/5000000 on timeline 2
2021-03-11 11:26:29.260 JST [85227] LOG:  startup process (PID 85228) was
terminated by signal 6: Aborted
2021-03-11 11:26:29.260 JST [85227] LOG:  terminating any other active
server processes
2021-03-11 11:26:29.290 JST [85227] LOG:  database system is shut down


I found this behavior in PostgreSQL 13.2, 12.6, 11.11, 10.16, 9.6.21 and
9.5.24.

The reproduction procedure with PostgreSQL 13.2 on RHEL 8 is:

===========================
# Create a temporary WAL data device.
mkdir -p /tmp/wal
dd if=/dev/zero of=/tmp/waldata bs=100MB count=1
sudo mkfs -t xfs /tmp/waldata
sudo mount -t xfs -o loop /tmp/waldata /tmp/wal
sudo chown postgres:postgres /tmp/wal
# mkdir -p /tmp/wal/pg_wal

# Create primary NODE-A.
initdb -D /tmp/nodea -X /tmp/wal/pg_wal

cat >>/tmp/nodea/postgresql.conf <<_EOF_
port=5432
synchronous_standby_names = 'nodeb'
restart_after_crash = off
wal_keep_size = 10GB
_EOF_

pg_ctl -D /tmp/nodea start -w -l /tmp/nodea_logfile

# Create standby NODE-B.
pg_basebackup -D /tmp/nodeb -R -X fetch -d "postgresql:///?port=5432&application_name=nodeb"
echo "port = 5433" >> /tmp/nodeb/postgresql.conf
pg_ctl -D /tmp/nodeb start -w -l /tmp/nodeb_logfile

# Create standby NODE-C.
pg_basebackup -D /tmp/nodec -p 5433 -R -X fetch -d "postgresql:///?application_name=nodec"
echo "port = 5434" >> /tmp/nodec/postgresql.conf 
pg_ctl -D /tmp/nodec start -w

# Run SQL to fill up the WAL storage.
createdb -p 5432
cat <<_psql | psql -p 5432
  create table test (a int, b text);
  insert into test values (generate_series(1,10000000), 'abcdefghijk');
_psql

# ** After the primary NODE-A has gone down (one way to wait is noted below) **
pg_ctl -D /tmp/nodeb promote 

# At this point, NODE-C goes down.
===========================
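
As a note on timing, the promote has to happen only after NODE-A has actually
gone down. One possible way to wait for that (a sketch, not part of the
procedure above; it assumes pg_isready is in PATH) is:

# Wait until NODE-A (port 5432) stops accepting connections, then promote NODE-B.
while pg_isready -p 5432 -q; do
    sleep 1
done
pg_ctl -D /tmp/nodeb promote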

The stack trace of the core dump taken with a debug build is as follows
(note: this stack trace was taken in a different environment from the other
attached logs):

#0  0x00007f73e3f1f8df in raise () from /lib64/libc.so.6
#1  0x00007f73e3f09cf5 in abort () from /lib64/libc.so.6
#2  0x0000000000878c83 in errfinish (filename=<optimized out>,
filename@entry=0x8cc67e "xlog.c", lineno=lineno@entry=3741,
funcname=funcname@entry=0x8d45c8 <__func__.28536> "XLogFileRead") at
elog.c:592
#3  0x0000000000507730 in XLogFileRead (segno=<optimized out>, tli=2,
source=source@entry=XLOG_FROM_STREAM, notfoundOk=notfoundOk@entry=false,
emode=22) at xlog.c:3739
#4  0x0000000000509c62 in WaitForWALToBecomeAvailable (tliRecPtr=134217712,
fetching_ckpt=<optimized out>, randAccess=false, RecPtr=117448704) at
xlog.c:12411
#5  XLogPageRead (xlogreader=0x282b478, targetPagePtr=<optimized out>,
reqLen=8192, targetRecPtr=134217712, readBuf=0x2858af8 "\006\321\a") at
xlog.c:11908
#6  0x0000000000517e25 in ReadPageInternal (state=state@entry=0x282b478,
pageptr=pageptr@entry=134209536, reqLen=reqLen@entry=8192) at
xlogreader.c:608
#7  0x000000000051886a in XLogReadRecord (state=state@entry=0x282b478,
errormsg=errormsg@entry=0x7ffd148db3f8) at xlogreader.c:329
#8  0x000000000050797e in ReadRecord (xlogreader=xlogreader@entry=0x282b478,
emode=emode@entry=15, fetching_ckpt=fetching_ckpt@entry=false) at
xlog.c:4346
#9  0x000000000050e205 in ReadRecord (fetching_ckpt=false, emode=15,
xlogreader=0x282b478) at xlog.c:4340
#10 StartupXLOG () at xlog.c:7385
#11 0x00000000006f72d1 in StartupProcessMain () at startup.c:204
#12 0x000000000051cf5b in AuxiliaryProcessMain (argc=argc@entry=2,
argv=argv@entry=0x7ffd148df240) at bootstrap.c:443
#13 0x00000000006f343f in StartChildProcess (type=StartupProcess) at
postmaster.c:5492
#14 0x00000000006f6938 in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x282acc0) at postmaster.c:1404
#15 0x000000000047ecd9 in main (argc=3, argv=0x282acc0) at main.c:210


I also attach the debug5 level logs of NODE-C.

2021-03-11 11:06:39.136 JST [84342] LOG:  replication terminated by primary
server
2021-03-11 11:06:39.136 JST [84342] DETAIL:  End of WAL reached on timeline
1 at 0/6000000.
2021-03-11 11:06:39.140 JST [84342] LOG:  fetching timeline history file for
timeline 2 from primary server
2021-03-11 11:06:39.148 JST [84342] DEBUG:  walreceiver ended streaming and
awaits new instructions
2021-03-11 11:06:39.148 JST [84338] LOG:  new target timeline is 2
2021-03-11 11:06:39.148 JST [84338] DEBUG:  switched WAL source from stream
to archive after failure
2021-03-11 11:06:39.148 JST [84338] DEBUG:  could not open file
"pg_wal/000000020000000000000006": No such file or directory
2021-03-11 11:06:39.148 JST [84338] DEBUG:  switched WAL source from archive
to stream after failure
2021-03-11 11:06:39.148 JST [84342] LOG:  restarted WAL streaming at
0/6000000 on timeline 2
2021-03-11 11:06:39.148 JST [84342] DEBUG:  sendtime 2021-03-11
11:06:39.148674+09 receipttime 2021-03-11 11:06:39.148723+09 replication
apply delay 17910 ms transfer latency 1 ms
2021-03-11 11:06:39.148 JST [84342] DEBUG:  creating and filling new WAL
file
2021-03-11 11:06:39.240 JST [84342] DEBUG:  done creating and filling new
WAL file
2021-03-11 11:06:39.240 JST [84342] DEBUG:  sending write 0/6000048 flush
0/5FFFFF0 apply 0/5FFFFF0
2021-03-11 11:06:39.242 JST [84342] DEBUG:  sending write 0/6000048 flush
0/6000048 apply 0/5FFFFF0
2021-03-11 11:06:39.486 JST [84342] DEBUG:  sendtime 2021-03-11
11:06:39.486387+09 receipttime 2021-03-11 11:06:39.486449+09 replication
apply delay 18248 ms transfer latency 1 ms
2021-03-11 11:06:39.486 JST [84342] DEBUG:  sending write 0/6000080 flush
0/6000048 apply 0/5FFFFF0
2021-03-11 11:06:39.486 JST [84342] DEBUG:  sending write 0/6000080 flush
0/6000080 apply 0/5FFFFF0
2021-03-11 11:06:39.489 JST [84338] LOG:  invalid contrecord length 26 at
0/5FFFFF0
2021-03-11 11:06:39.489 JST [84342] FATAL:  terminating walreceiver process
due to administrator command
2021-03-11 11:06:39.489 JST [84342] DEBUG:  shmem_exit(1): 1
before_shmem_exit callbacks to make
2021-03-11 11:06:39.489 JST [84342] DEBUG:  shmem_exit(1): 6 on_shmem_exit
callbacks to make
2021-03-11 11:06:39.489 JST [84342] DEBUG:  proc_exit(1): 2 callbacks to
make
2021-03-11 11:06:39.489 JST [84342] DEBUG:  exit(1)
2021-03-11 11:06:39.489 JST [84342] DEBUG:  shmem_exit(-1): 0
before_shmem_exit callbacks to make
2021-03-11 11:06:39.489 JST [84342] DEBUG:  shmem_exit(-1): 0 on_shmem_exit
callbacks to make
2021-03-11 11:06:39.489 JST [84342] DEBUG:  proc_exit(-1): 0 callbacks to
make
2021-03-11 11:06:39.489 JST [84338] DEBUG:  switched WAL source from stream
to archive after failure
2021-03-11 11:06:39.489 JST [84338] DEBUG:  could not open file
"pg_wal/000000020000000000000005": No such file or directory
2021-03-11 11:06:39.489 JST [84338] DEBUG:  switched WAL source from archive
to stream after failure
2021-03-11 11:06:39.489 JST [84338] PANIC:  could not open file
"pg_wal/000000020000000000000005": No such file or directory
2021-03-11 11:06:39.501 JST [84337] DEBUG:  reaping dead processes
2021-03-11 11:06:39.513 JST [84367] DEBUG:  find_in_dynamic_libpath: trying
"/home/egashira/pgsql/lib/libpqwalreceiver"
2021-03-11 11:06:39.513 JST [84367] DEBUG:  find_in_dynamic_libpath: trying
"/home/egashira/pgsql/lib/libpqwalreceiver.so"
2021-03-11 11:06:39.541 JST [84367] LOG:  started streaming WAL from primary
at 0/5000000 on timeline 2
2021-03-11 11:06:39.542 JST [84367] DEBUG:  sendtime 2021-03-11
11:06:39.536374+09 receipttime 2021-03-11 11:06:39.54219+09 replication
apply delay 56 ms transfer latency 6 ms
2021-03-11 11:06:39.542 JST [84367] DEBUG:  creating and filling new WAL
file
2021-03-11 11:06:39.696 JST [84367] DEBUG:  done creating and filling new
WAL file


In addition, this issue may not occur, depending on the timing.
When the issue did not occur, the last WAL record of NODE-A had not been
transferred to NODE-B.
I attach the pg_waldump results for NODE-A, NODE-B, and NODE-C below (a short
comparison sketch follows the output).

[Reproduced]
$ pg_waldump nodea/pg_wal/000000010000000000000005  | tail -n 4
rmgr: Heap        len (rec/tot):     71/    71, tx:        487, lsn:
0/05FFFED0, prev 0/05FFFE88, desc: INSERT off 181 flags 0x00, blkref #0: rel
1663/16384/16385 blk 3759
rmgr: Heap        len (rec/tot):     71/    71, tx:        487, lsn:
0/05FFFF18, prev 0/05FFFED0, desc: INSERT off 182 flags 0x00, blkref #0: rel
1663/16384/16385 blk 3759
rmgr: Heap        len (rec/tot):     71/    71, tx:        487, lsn:
0/05FFFF60, prev 0/05FFFF18, desc: INSERT off 183 flags 0x00, blkref #0: rel
1663/16384/16385 blk 3759
rmgr: Heap        len (rec/tot):     71/    71, tx:        487, lsn:
0/05FFFFA8, prev 0/05FFFF60, desc: INSERT off 184 flags 0x00, blkref #0: rel
1663/16384/16385 blk 3759

$ pg_waldump nodeb/pg_wal/000000010000000000000005  | tail -n 4
rmgr: Heap        len (rec/tot):     71/    71, tx:        487, lsn:
0/05FFFED0, prev 0/05FFFE88, desc: INSERT off 181 flags 0x00, blkref #0: rel
1663/16384/16385 blk 3759
rmgr: Heap        len (rec/tot):     71/    71, tx:        487, lsn:
0/05FFFF18, prev 0/05FFFED0, desc: INSERT off 182 flags 0x00, blkref #0: rel
1663/16384/16385 blk 3759
rmgr: Heap        len (rec/tot):     71/    71, tx:        487, lsn:
0/05FFFF60, prev 0/05FFFF18, desc: INSERT off 183 flags 0x00, blkref #0: rel
1663/16384/16385 blk 3759
rmgr: Heap        len (rec/tot):     71/    71, tx:        487, lsn:
0/05FFFFA8, prev 0/05FFFF60, desc: INSERT off 184 flags 0x00, blkref #0: rel
1663/16384/16385 blk 3759

$ pg_waldump nodec/pg_wal/000000010000000000000005  | tail -n 4
rmgr: Heap        len (rec/tot):     71/    71, tx:        487, lsn:
0/05FFFED0, prev 0/05FFFE88, desc: INSERT off 181 flags 0x00, blkref #0: rel
1663/16384/16385 blk 3759
rmgr: Heap        len (rec/tot):     71/    71, tx:        487, lsn:
0/05FFFF18, prev 0/05FFFED0, desc: INSERT off 182 flags 0x00, blkref #0: rel
1663/16384/16385 blk 3759
rmgr: Heap        len (rec/tot):     71/    71, tx:        487, lsn:
0/05FFFF60, prev 0/05FFFF18, desc: INSERT off 183 flags 0x00, blkref #0: rel
1663/16384/16385 blk 3759
rmgr: Heap        len (rec/tot):     71/    71, tx:        487, lsn:
0/05FFFFA8, prev 0/05FFFF60, desc: INSERT off 184 flags 0x00, blkref #0: rel
1663/16384/16385 blk 3759


[Not reproduced]
$ pg_waldump nodea/pg_wal/000000010000000000000005  | tail -n 4
rmgr: Heap        len (rec/tot):     71/    71, tx:        487, lsn:
0/05FFFED0, prev 0/05FFFE88, desc: INSERT off 181 flags 0x00, blkref #0: rel
1663/16384/16385 blk 3759
rmgr: Heap        len (rec/tot):     71/    71, tx:        487, lsn:
0/05FFFF18, prev 0/05FFFED0, desc: INSERT off 182 flags 0x00, blkref #0: rel
1663/16384/16385 blk 3759
rmgr: Heap        len (rec/tot):     71/    71, tx:        487, lsn:
0/05FFFF60, prev 0/05FFFF18, desc: INSERT off 183 flags 0x00, blkref #0: rel
1663/16384/16385 blk 3759
rmgr: Heap        len (rec/tot):     71/    71, tx:        487, lsn:
0/05FFFFA8, prev 0/05FFFF60, desc: INSERT off 184 flags 0x00, blkref #0: rel
1663/16384/16385 blk 3759

$ pg_waldump nodeb/pg_wal/000000010000000000000005  | tail -n 4
pg_waldump: fatal: error in WAL record at 0/5BFFF88: invalid magic number
0000 in log segment 000000010000000000000005, offset 12582912
rmgr: Heap        len (rec/tot):     71/    71, tx:        487, lsn:
0/05BFFEB0, prev 0/05BFFE68, desc: INSERT off 2 flags 0x00, blkref #0: rel
1663/16384/16385 blk 3446
rmgr: Heap        len (rec/tot):     71/    71, tx:        487, lsn:
0/05BFFEF8, prev 0/05BFFEB0, desc: INSERT off 3 flags 0x00, blkref #0: rel
1663/16384/16385 blk 3446
rmgr: Heap        len (rec/tot):     71/    71, tx:        487, lsn:
0/05BFFF40, prev 0/05BFFEF8, desc: INSERT off 4 flags 0x00, blkref #0: rel
1663/16384/16385 blk 3446
rmgr: Heap        len (rec/tot):     71/    71, tx:        487, lsn:
0/05BFFF88, prev 0/05BFFF40, desc: INSERT off 5 flags 0x00, blkref #0: rel
1663/16384/16385 blk 3446

$ pg_waldump nodec/pg_wal/000000010000000000000005  | tail -n 4
pg_waldump: fatal: error in WAL record at 0/5BFFF88: invalid magic number
0000 in log segment 000000010000000000000005, offset 12582912
rmgr: Heap        len (rec/tot):     71/    71, tx:        487, lsn:
0/05BFFEB0, prev 0/05BFFE68, desc: INSERT off 2 flags 0x00, blkref #0: rel
1663/16384/16385 blk 3446
rmgr: Heap        len (rec/tot):     71/    71, tx:        487, lsn:
0/05BFFEF8, prev 0/05BFFEB0, desc: INSERT off 3 flags 0x00, blkref #0: rel
1663/16384/16385 blk 3446
rmgr: Heap        len (rec/tot):     71/    71, tx:        487, lsn:
0/05BFFF40, prev 0/05BFFEF8, desc: INSERT off 4 flags 0x00, blkref #0: rel
1663/16384/16385 blk 3446
rmgr: Heap        len (rec/tot):     71/    71, tx:        487, lsn:
0/05BFFF88, prev 0/05BFFF40, desc: INSERT off 5 flags 0x00, blkref #0: rel
1663/16384/16385 blk 3446
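
A quick way to compare what each node actually received (a sketch assuming the
directory layout used above) is to print the last record pg_waldump can decode
on each node; in the non-reproduced case NODE-B and NODE-C stop at 0/05BFFF88,
i.e. the final records written by NODE-A were never transferred to the standbys:

# Print the last decodable WAL record of segment 000000010000000000000005 on
# each node, suppressing the "invalid magic number" error on the standbys.
for node in nodea nodeb nodec; do
    echo "== $node =="
    pg_waldump "$node"/pg_wal/000000010000000000000005 2>/dev/null | tail -n 1
done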

