Re: Possible missing segments in archiving on standby - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: Possible missing segments in archiving on standby |
Date | |
Msg-id | 20210831.163536.2047557124269492272.horikyota.ntt@gmail.com Whole thread Raw |
In response to | Re: Possible missing segments in archiving on standby (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: Possible missing segments in archiving on standby
|
List | pgsql-hackers |
At Tue, 31 Aug 2021 01:54:36 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > > > On 2020/06/30 16:55, Kyotaro Horiguchi wrote: > > Hello. > > While looking a patch, I found that a standby with archive_mode=always > > fails to archive segments under certain conditions. > > I encountered this issue, too. > > > > 1. v1-0001-Make-sure-standby-archives-all-segments.patch: > > Fix for A and B. > > 2. v1-0001-Make-sure-standby-archives-all-segments-immediate.patch: > > Fix for A, B and C. > > You proposed two patches, but this patch should be reviewed > preferentially > because this addresses all the issues (i.e., A, B and C) that you > reported? Maybe. The point here was whether we regard C as an issue, but now I think it is an issue. > + * If we are starting streaming at the beginning of a segment, > + * there may be the case where the previous segment have not been > + * archived yet. Make sure it is archived. > > Could you clarify why the archive notification file of the previous > WAL segment needs to be checked? > > As far as I read the code, the cause of the issue seems to be that > XLogWalRcvWrite() exits without creating an archive notification file > even if the current WAL segment is fully written up in the last cycle > of > XLogWalRcvWrite()'s loop. That is, creation of the notification file > and WAL archiving of that completed segment will be delayed > until any data in the next segment is received and written (by next > call > to XLogWalRcvWrite()). Furthermore, in that case, if walreceiver exits > before receiving such next segment, the completed current segment > fails to be archived as Horiguchi-san reported. Right. Finally such segments are archived when a future checkpoint removes them. In that sense the patch works to just let archiving happens faster, but on the other hand I came to think we are supposed to archive a segment as soon as it is completed. (That is, I think C is a problem.) > Therefore, IMO that the simple approach to fix the issue is to create > an archive notification file if possible at the end of > XLogWalRcvWrite(). > I implemented this idea. Patch attached. I'm not sure which is simpler, but it works except for B, the case of a long-jump by a segment switch. When a segment switch happens, walsender sends filling zero-pages but even if walreceiver is terminated before the segment is completed, walsender restarts from the next segment at the next startup. Concretely like the following. - pg_switch_wal() invoked at 6003228 (for example) - walreceiver terminates at 6500000 (or a bit later). - walrecever rstarts from 7000000 In this case the segment 6 is not notified even with the patch, and my old patches works the same way. (In other words, the call to XLogWalRcvClose() at the end of XLogWalRcvWrite doens't work for the case as you might expect.) If we think it ok that we don't notify the segment earlier than a future checkpoint removes it, yours or only the last half of my one is sufficient, but do we really think so? Furthermore, your patch or only the last half of my second patch doesn't save the case of a crash unlike the case of a graceful termination. The attached files are: v2wip-0001-Make-sure... : a rebased patch of the old second patch repro_helper.diff : reproducer helper patch, used by the script below. repro.sh : reproducer script. (The second diff conflicts with the first patch. Since the second just inserts a single code block, it is easily applicable manually:p) regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c index 60de3be92c..a44d84f0a7 100644 --- a/src/backend/replication/walreceiver.c +++ b/src/backend/replication/walreceiver.c @@ -883,46 +883,32 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr) { int segbytes; - if (recvFile < 0 || !XLByteInSeg(recptr, recvSegNo, wal_segment_size)) + /* Open the segment if not yet */ + if (recvFile < 0) { + recvFileTLI = ThisTimeLineID; + /* - * fsync() and close current file before we switch to next one. We - * would otherwise have to reopen this file to fsync it later + * If we are starting streaming at the beginning of a segment, + * there may be the case where the previous segment have not been + * archived yet. Make sure it is archived. */ - if (recvFile >= 0) + if (XLogArchiveMode == ARCHIVE_MODE_ALWAYS && recvSegNo == 0) { char xlogfname[MAXFNAMELEN]; + XLogSegNo prevseg; XLogWalRcvFlush(false); + XLByteToPrevSeg(recptr, prevseg, wal_segment_size); + XLogFileName(xlogfname, recvFileTLI, prevseg, wal_segment_size); - XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size); - - /* - * XLOG segment files will be re-read by recovery in startup - * process soon, so we don't advise the OS to release cache - * pages associated with the file like XLogFileClose() does. - */ - if (close(recvFile) != 0) - ereport(PANIC, - (errcode_for_file_access(), - errmsg("could not close log segment %s: %m", - xlogfname))); - - /* - * Create .done file forcibly to prevent the streamed segment - * from being archived later. - */ - if (XLogArchiveMode != ARCHIVE_MODE_ALWAYS) - XLogArchiveForceDone(xlogfname); - else - XLogArchiveNotify(xlogfname, true); + /* Mark as ".ready" if not yet */ + XLogArchiveCheckDone(xlogfname); } - recvFile = -1; /* Create/use new log file */ XLByteToSeg(recptr, recvSegNo, wal_segment_size); recvFile = XLogFileInit(recvSegNo); - recvFileTLI = ThisTimeLineID; } /* Calculate the start offset of the received logs */ @@ -963,6 +949,47 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr) buf += byteswritten; LogstreamResult.Write = recptr; + + /* + * Close the current WAL segment if it is completed then let the file + * be archived if needed. + */ + if (!XLByteInSeg(recptr, recvSegNo, wal_segment_size)) + { + char xlogfname[MAXFNAMELEN]; + + Assert (recvFile >= 0); + + /* + * fsync() and close current file before we switch to next one. We + * would otherwise have to reopen this file to fsync it later + */ + XLogWalRcvFlush(false); + + XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size); + + /* + * XLOG segment files will be re-read by recovery in startup + * process soon, so we don't advise the OS to release cache + * pages associated with the file like XLogFileClose() does. + */ + if (close(recvFile) != 0) + ereport(PANIC, + (errcode_for_file_access(), + errmsg("could not close log segment %s: %m", + xlogfname))); + + /* + * Create .done file forcibly to prevent the streamed segment + * from being archived later. + */ + if (XLogArchiveMode != ARCHIVE_MODE_ALWAYS) + XLogArchiveForceDone(xlogfname); + else + XLogArchiveNotify(xlogfname, true); + + recvFile = -1; + } } /* Update shared-memory status */ diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c index 60de3be92c..396510a81d 100644 --- a/src/backend/replication/walreceiver.c +++ b/src/backend/replication/walreceiver.c @@ -963,6 +963,29 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr) buf += byteswritten; LogstreamResult.Write = recptr; + + { + /* fake oneshot SIGTERM just at segment end */ + struct stat b; + char *sigfile1 = "/tmp/hoge1"; + char *sigfile2 = "/tmp/hoge2"; + + if (LogstreamResult.Write % wal_segment_size == 0 && + stat(sigfile1, &b) == 0) + { + unlink(sigfile1); + ShutdownRequestPending = true; + ereport(LOG,(errmsg("STOP BY trig1@%lX", LogstreamResult.Write))); + } + + if (LogstreamResult.Write % wal_segment_size == 0x500000 && + stat(sigfile2, &b) == 0) + { + unlink(sigfile2); + ShutdownRequestPending = true; + ereport(LOG,(errmsg("STOP BY trig2@%lX", LogstreamResult.Write))); + } + } } /* Update shared-memory status */ #! /bin/bash ROOT=`pwd` LOGFILE="repro.log" PGPORT1=15432 PGPORT2=15433 PGDATA1=$ROOT/reprodata1 ARCHDIR1=$ROOT/reproarc1 PGDATA2=$ROOT/reprodata2 ARCHDIR2=$ROOT/reproarc2 function cleanup { echo -n "Killing servers..." pg_ctl -D $PGDATA1 -m i stop pg_ctl -D $PGDATA2 -m i stop echo "done." exit 1 } rm -r $PGDATA1 $PGDATA2 $ARCHDIR1 $ARCHDIR2 mkdir $ARCHDIR1 $ARCHDIR2 # Create primary echo "# Creating primary" initdb -D $PGDATA1 &>$LOGFILE cat >> $PGDATA1/postgresql.conf <<EOF wal_keep_size=160 archive_mode=always archive_command='cp %p $ARCHDIR1/%f' EOF # Start primary echo "# Starting primary" pg_ctl -D $PGDATA1 -o"-p $PGPORT1" start &>>$LOGFILE # Create standby echo "# Creating standby" pg_basebackup -D $PGDATA2 -h /tmp -p $PGPORT1 &>>$LOGFILE cat >> $PGDATA2/postgresql.conf <<EOF archive_command='cp %p $ARCHDIR2/%f' primary_conninfo='host=/tmp port=$PGPORT1' EOF touch $PGDATA2/standby.signal trap cleanup ERR 2 3 15 # Start primary echo "# Starting standby" pg_ctl -D $PGDATA2 -o"-p $PGPORT2" start &>>$LOGFILE sleep 3 echo "Start:" echo -n "Primary location: " psql -tAp $PGPORT1 -c "select pg_current_wal_lsn()" echo -n "Standby location: " psql -tAp $PGPORT2 -c "select pg_last_wal_receive_lsn()" # Delocate from boundary.. psql -p $PGPORT1 -c "create table t(); drop table t" &>>$LOGFILE sleep 1 # TEST 1: walreceiver stops just after a segment is completed echo "Before test 1:" echo -n "Primary location: " psql -tAp $PGPORT1 -c "select pg_current_wal_lsn()" echo -n "Standby location: " psql -tAp $PGPORT2 -c "select pg_last_wal_receive_lsn()" echo "# test 1" >> $LOGFILE touch /tmp/hoge1 # the number of records should be adjusted so that the LSN at "before # end test 1" below is locaed in the last page of the segment 3. psql -p $PGPORT1 -c "create table t(a int); insert into t (select a from generate_series(0, 259500) a); drop table t;" &>>$LOGFILE echo "# test 1 end" >> $LOGFILE psql -p $PGPORT1 -c "create table t(); drop table t" &>>$LOGFILE echo "before end test 1:" echo -n "Primary location: " psql -tAp $PGPORT1 -c "select pg_current_wal_lsn()" echo -n "Standby location: " psql -tAp $PGPORT2 -c "select pg_last_wal_receive_lsn()" psql -p $PGPORT1 -c "select pg_switch_wal()" &>>$LOGFILE sleep 2 echo "After test 1:" echo -n "Primary location: " psql -tAp $PGPORT1 -c "select pg_current_wal_lsn()" echo -n "Standby location: " psql -tAp $PGPORT2 -c "select pg_last_wal_receive_lsn()" psql -p $PGPORT1 -c "create table t(); drop table t; select pg_switch_wal()" &>>$LOGFILE psql -p $PGPORT1 -c "create table t(); drop table t; select pg_switch_wal()" &>>$LOGFILE sleep 2 # TEST 2: walreceiver stops while receiving filling chunks after a wal switch. echo "# test 2" >> $LOGFILE touch /tmp/hoge2 psql -p $PGPORT1 -c "create table t(); drop table t; select pg_switch_wal()" &>>$LOGFILE echo "# test 2 end" >> $LOGFILE sleep 2 echo "After test 2:" echo -n "Primary location: " psql -tAp $PGPORT1 -c "select pg_current_wal_lsn()" echo -n "Standby location: " psql -tAp $PGPORT2 -c "select pg_last_wal_receive_lsn()" # stop servers pg_ctl -D $PGDATA1 stop &>>$LOGFILE pg_ctl -D $PGDATA2 stop &>>$LOGFILE #show last several archived segments echo "# primary archive" ls $ARCHDIR1 | egrep '[3-9]$' echo "# standby archive" ls $ARCHDIR2 | egrep '[3-9]$'
pgsql-hackers by date: