Possible missing segments in archiving on standby - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Possible missing segments in archiving on standby |
Date | |
Msg-id | 20200630.165503.1465894182551545886.horikyota.ntt@gmail.com |
List | pgsql-hackers |
Hello.

While looking at a patch, I found that a standby with
archive_mode=always fails to archive segments under certain
conditions:

A. Walreceiver is gracefully terminated just after a segment is
   finished.

B. Walreceiver is gracefully terminated while receiving the filling
   chunks for a segment switch.

Both of the above are reproducible (without distinction between the
two) using a helper patch. See below.

There's one more issue here.

C. Standby doesn't archive a segment until walreceiver receives any
   data for the next segment.

I'm not sure whether we should regard C as an issue.

The first attached patch fixes A and B. A side effect is that the
standby also archives the segment preceding the streaming start
location. Concretely, 00..0100..2 gets archived in the above case
(recovery starts at 0/3000000). That behavior doesn't seem to be a
problem, since the segment is part of the standby's data anyway.

The second attached patch fixes all of A to C, but seems somewhat
redundant.

Any opinions and/or suggestions are welcome.

The attached files are:

1. v1-0001-Make-sure-standby-archives-all-segments.patch:
   Fix for A and B.

2. v1-0001-Make-sure-standby-archives-all-segments-immediate.patch:
   Fix for A, B and C.

3. repro.sh
   The reproducer shell script used below.

4. repro_helper.patch
   Helper patch for repro.sh for master and patch 1 above.

5. repro_helper2.patch
   Helper patch for repro.sh for patch 2 above.

=====
** REPRODUCER

The failure is reproducible with some code tweaks.

1. Create a primary server with archive_mode=always, then start it.
2. Create and start a standby.
3. touch /tmp/hoge
4. psql -c "create table t(); drop table t; select pg_switch_wal(); select pg_sleep(1); create table t(); drop table t; select pg_switch_wal();"
5. Look into the archive directory of the standby.
   If no segments are missing from the archive, repeat from 3.

The third attached shell script is a reproducer for the problem,
needing the aid of the fourth patch attached.

$ mkdir testdir
$ cd testdir
$ bash ..../repro.sh
....
After test 2:
Primary location: 0/8000310
Standby location: 0/8000310
# primary archive
000000010000000000000003
000000010000000000000004
000000010000000000000005
000000010000000000000006
000000010000000000000007
000000010000000000000008
# standby archive
000000010000000000000003
000000010000000000000005
000000010000000000000006
000000010000000000000008

Segment 4 is skipped because of issue A, and segment 7 is skipped
because of issue B.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

From afa907bca7db8ea6335d47bd02761f567591d553 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Tue, 30 Jun 2020 14:21:30 +0900
Subject: [PATCH v1] Make sure standby archives all segments

Standby fails to archive a segment if standby is stopped just after a
segment is finished or stopped just after a segment switch. Make sure
that walreceiver archives all segments by rechecking at start.
---
 src/backend/replication/walreceiver.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index d1ad75da87..680154365d 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -938,6 +938,23 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 				else
 					XLogArchiveNotify(xlogfname);
 			}
+			else if (XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
+			{
+				/*
+				 * If we are starting streaming at the beginning of a segment,
+				 * there may be cases where the previous segment has not been
+				 * archived yet. Make sure it is archived.
+				 */
+				char		xlogfname[MAXFNAMELEN];
+				XLogSegNo	prevseg;
+
+				XLByteToPrevSeg(recptr, prevseg, wal_segment_size);
+				XLogFileName(xlogfname, ThisTimeLineID, prevseg,
+							 wal_segment_size);
+
+				/* Mark as ".ready" if not yet */
+				XLogArchiveCheckDone(xlogfname);
+			}
 			recvFile = -1;
 
 			/* Create/use new log file */
-- 
2.18.4

From 7af716134dceb3bafce421dfeaffebf1e1e3e17d Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Mon, 29 Jun 2020 16:12:01 +0900
Subject: [PATCH v1] Make sure standby archives all segments immediately

Standby may be a bit late in archiving, since walreceiver doesn't
archive a segment until it receives any data for the next segment.
Fix that by archiving just after a segment is finished.

Also, standby fails to archive a segment if standby is stopped just
after a segment is finished or stopped just after a segment switch.
Make sure that walreceiver archives all segments by rechecking at
start.
---
 src/backend/replication/walreceiver.c | 82 ++++++++++++++++++---------
 1 file changed, 54 insertions(+), 28 deletions(-)

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index d1ad75da87..831718c859 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -902,49 +902,34 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 	{
 		int			segbytes;
 
-		if (recvFile < 0 || !XLByteInSeg(recptr, recvSegNo, wal_segment_size))
+		/* Open the segment if not yet */
+		if (recvFile < 0)
 		{
 			bool		use_existent;
 
+			recvFileTLI = ThisTimeLineID;
+
 			/*
-			 * fsync() and close current file before we switch to next one. We
-			 * would otherwise have to reopen this file to fsync it later
+			 * If we are starting streaming at the beginning of a segment,
+			 * there may be cases where the previous segment has not been
+			 * archived yet. Make sure it is archived.
 			 */
-			if (recvFile >= 0)
+			if (XLogArchiveMode == ARCHIVE_MODE_ALWAYS && recvSegNo == 0)
 			{
 				char		xlogfname[MAXFNAMELEN];
+				XLogSegNo	prevseg;
 
-				XLogWalRcvFlush(false);
+				XLByteToPrevSeg(recptr, prevseg, wal_segment_size);
+				XLogFileName(xlogfname, recvFileTLI, prevseg, wal_segment_size);
 
-				XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
-
-				/*
-				 * XLOG segment files will be re-read by recovery in startup
-				 * process soon, so we don't advise the OS to release cache
-				 * pages associated with the file like XLogFileClose() does.
-				 */
-				if (close(recvFile) != 0)
-					ereport(PANIC,
-							(errcode_for_file_access(),
-							 errmsg("could not close log segment %s: %m",
-									xlogfname)));
-
-				/*
-				 * Create .done file forcibly to prevent the streamed segment
-				 * from being archived later.
-				 */
-				if (XLogArchiveMode != ARCHIVE_MODE_ALWAYS)
-					XLogArchiveForceDone(xlogfname);
-				else
-					XLogArchiveNotify(xlogfname);
+				/* Mark as ".ready" if not yet */
+				XLogArchiveCheckDone(xlogfname);
 			}
-			recvFile = -1;
 
 			/* Create/use new log file */
 			XLByteToSeg(recptr, recvSegNo, wal_segment_size);
 			use_existent = true;
 			recvFile = XLogFileInit(recvSegNo, &use_existent, true);
-			recvFileTLI = ThisTimeLineID;
 		}
 
 		/* Calculate the start offset of the received logs */
@@ -985,6 +970,47 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 		buf += byteswritten;
 
 		LogstreamResult.Write = recptr;
+
+		/*
+		 * Close the current WAL segment if it is completed then let the file
+		 * be archived if needed.
+		 */
+		if (!XLByteInSeg(recptr, recvSegNo, wal_segment_size))
+		{
+			char		xlogfname[MAXFNAMELEN];
+
+			Assert (recvFile >= 0);
+
+			/*
+			 * fsync() and close current file before we switch to next one. We
+			 * would otherwise have to reopen this file to fsync it later
+			 */
+			XLogWalRcvFlush(false);
+
+			XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
+
+			/*
+			 * XLOG segment files will be re-read by recovery in startup
+			 * process soon, so we don't advise the OS to release cache
+			 * pages associated with the file like XLogFileClose() does.
+			 */
+			if (close(recvFile) != 0)
+				ereport(PANIC,
+						(errcode_for_file_access(),
+						 errmsg("could not close log segment %s: %m",
+								xlogfname)));
+
+			/*
+			 * Create .done file forcibly to prevent the streamed segment
+			 * from being archived later.
+			 */
+			if (XLogArchiveMode != ARCHIVE_MODE_ALWAYS)
+				XLogArchiveForceDone(xlogfname);
+			else
+				XLogArchiveNotify(xlogfname);
+
+			recvFile = -1;
+		}
 	}
 
 	/* Update shared-memory status */
-- 
2.18.4

#! /bin/bash

ROOT=`pwd`
LOGFILE="repro.log"
PGPORT1=15432
PGPORT2=15433
PGDATA1=$ROOT/reprodata1
ARCHDIR1=$ROOT/reproarc1
PGDATA2=$ROOT/reprodata2
ARCHDIR2=$ROOT/reproarc2

function cleanup {
	echo -n "Killing servers..."
	pg_ctl -D $PGDATA1 -m i stop
	pg_ctl -D $PGDATA2 -m i stop
	echo "done."
	exit 1
}

rm -r $PGDATA1 $PGDATA2 $ARCHDIR1 $ARCHDIR2
mkdir $ARCHDIR1 $ARCHDIR2

# Create primary
echo "# Creating primary"
initdb -D $PGDATA1 &>$LOGFILE
cat >> $PGDATA1/postgresql.conf <<EOF
wal_keep_segments=10
archive_mode=always
archive_command='cp %p $ARCHDIR1/%f'
EOF

# Start primary
echo "# Starting primary"
pg_ctl -D $PGDATA1 -o"-p $PGPORT1" start &>>$LOGFILE

# Create standby
echo "# Creating standby"
pg_basebackup -D $PGDATA2 -h /tmp -p $PGPORT1 &>>$LOGFILE
cat >> $PGDATA2/postgresql.conf <<EOF
archive_command='cp %p $ARCHDIR2/%f'
primary_conninfo='host=/tmp port=$PGPORT1'
EOF

touch $PGDATA2/standby.signal

trap cleanup ERR 2 3 15

# Start standby
echo "# Starting standby"
pg_ctl -D $PGDATA2 -o"-p $PGPORT2" start &>>$LOGFILE
sleep 3

echo "Start:"
echo -n "Primary location: "
psql -tAp $PGPORT1 -c "select pg_current_wal_lsn()"
echo -n "Standby location: "
psql -tAp $PGPORT2 -c "select pg_last_wal_receive_lsn()"

# Move away from the segment boundary
psql -p $PGPORT1 -c "create table t(); drop table t" &>>$LOGFILE
sleep 1

# TEST 1: walreceiver stops just after a segment is completed
echo "# test 1" >> $LOGFILE
touch /tmp/hoge1
psql -p $PGPORT1 -c "create table t(a int); insert into t (select a from generate_series(0, 260000) a); drop table t;" &>>$LOGFILE
echo "# test 1 end" >> $LOGFILE
psql -p $PGPORT1 -c "create table t(); drop table t; select pg_switch_wal()" &>>$LOGFILE
sleep 2

echo "After test 1:"
echo -n "Primary location: "
psql -tAp $PGPORT1 -c "select pg_current_wal_lsn()"
echo -n "Standby location: "
psql -tAp $PGPORT2 -c "select pg_last_wal_receive_lsn()"

psql -p $PGPORT1 -c "create table t(); drop table t; select pg_switch_wal()" &>>$LOGFILE
psql -p $PGPORT1 -c "create table t(); drop table t; select pg_switch_wal()" &>>$LOGFILE
sleep 2

# TEST 2: walreceiver stops while receiving filling chunks after a wal switch.
echo "# test 2" >> $LOGFILE
touch /tmp/hoge2
psql -p $PGPORT1 -c "create table t(); drop table t; select pg_switch_wal()" &>>$LOGFILE
echo "# test 2 end" >> $LOGFILE
sleep 2

echo "After test 2:"
echo -n "Primary location: "
psql -tAp $PGPORT1 -c "select pg_current_wal_lsn()"
echo -n "Standby location: "
psql -tAp $PGPORT2 -c "select pg_last_wal_receive_lsn()"

# stop servers
pg_ctl -D $PGDATA1 stop &>>$LOGFILE
pg_ctl -D $PGDATA2 stop &>>$LOGFILE

#show last three archived segments
echo "# primary archive"
ls $ARCHDIR1 | egrep '[3-9]$'
echo "# standby archive"
ls $ARCHDIR2 | egrep '[3-9]$'

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index d1ad75da87..dcccef151d 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -985,6 +985,29 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 		buf += byteswritten;
 
 		LogstreamResult.Write = recptr;
+
+		{
+			/* fake oneshot SIGTERM just at segment end */
+			struct stat b;
+			char *sigfile1 = "/tmp/hoge1";
+			char *sigfile2 = "/tmp/hoge2";
+
+			if (LogstreamResult.Write % wal_segment_size == 0 &&
+				stat(sigfile1, &b) == 0)
+			{
+				unlink(sigfile1);
+				got_SIGTERM = true;
+				ereport(LOG,(errmsg("STOP BY trig1@%lX", LogstreamResult.Write)));
+			}
+
+			if (LogstreamResult.Write % wal_segment_size == 0x500000 &&
+				stat(sigfile2, &b) == 0)
+			{
+				unlink(sigfile2);
+				got_SIGTERM = true;
+				ereport(LOG,(errmsg("STOP BY trig2@%lX", LogstreamResult.Write)));
+			}
+		}
 	}
 
 	/* Update shared-memory status */

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 831718c859..b9a7c73ed7 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1011,6 +1011,29 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 
 			recvFile = -1;
 		}
+
+		{
+			/* fake oneshot SIGTERM just at segment end */
+			struct stat b;
+			char *sigfile1 = "/tmp/hoge1";
+			char *sigfile2 = "/tmp/hoge2";
+
+			if (LogstreamResult.Write % wal_segment_size == 0 &&
+				stat(sigfile1, &b) == 0)
+			{
+				unlink(sigfile1);
+				got_SIGTERM = true;
+				ereport(LOG,(errmsg("STOP BY trig1@%lX", LogstreamResult.Write)));
+			}
+
+			if (LogstreamResult.Write % wal_segment_size == 0x500000 &&
+				stat(sigfile2, &b) == 0)
+			{
+				unlink(sigfile2);
+				got_SIGTERM = true;
+				ereport(LOG,(errmsg("STOP BY trig2@%lX", LogstreamResult.Write)));
+			}
+		}
 	}
 
 	/* Update shared-memory status */