Possible missing segments in archiving on standby - Mailing list pgsql-hackers
| From | Kyotaro Horiguchi |
|---|---|
| Subject | Possible missing segments in archiving on standby |
| Date | |
| Msg-id | 20200630.165503.1465894182551545886.horikyota.ntt@gmail.com Whole thread Raw |
| Responses |
Re: Possible missing segments in archiving on standby
|
| List | pgsql-hackers |
Hello.
While looking a patch, I found that a standby with archive_mode=always
fails to archive segments under certain conditions.
A. Walreceiver is gracefully terminated just after a segment is
finished.
B. Walreceiver is gracefully terminated while receiving filling chunks
for a segment switch.
The two above are reprodusible (without distinction between the two)
using a helper patch. See below.
There's one more issue here.
C. Standby doesn't archive a segment until walreceiver receives any
data for the next segment.
I'm not sure wehther we assume C as an issue.
The first attached patch fixes A and B. A side-effect of that is that
standby archives the previous segment of the streaming start
location. Concretely 00..0100..2 gets to be archived in the above case
(recovery starts at 0/3000000). That behavior doesn't seem to be a
proble since the segment is a part of the standby's data anyway.
The second attached patch fixes all of A to C, but seems somewhat
redundant.
Any opnions and/or suggestions are welcome.
The attached files are:
1. v1-0001-Make-sure-standby-archives-all-segments.patch:
Fix for A and B.
2. v1-0001-Make-sure-standby-archives-all-segments-immediate.patch:
Fix for A, B and C.
3. repro.sh
The reproducer shell script used below.
4. repro_helper.patch
Helper patch for repro.sh for master and patch 1 above.
5. repro_helper2.patch
Helper patch for repro.sh for patch 2 above.
=====
** REPRODUCER
The failure is reproducible with some code tweak.
1. Create a primary server with archive_mode=always then start it.
2. Create and start a standby.
3. touch /tmp/hoge
4. psql -c "create table t(); drop table t; select pg_switch_wal(); select pg_sleep(1); create table t(); drop table t;
selectpg_switch_wal();"
5. look into the archive directory of the standby.
If no missing segments found in archive, repeat from 3.
The third attached shell script is a reproducer for the problem,
needing the aid of the fourth patch attached.
$ mkdir testdir
$ cd testdir
$ bash ..../repro.sh
....
After test 2:
Primary location: 0/8000310
Standby location: 0/8000310
# primary archive
000000010000000000000003
000000010000000000000004
000000010000000000000005
000000010000000000000006
000000010000000000000007
000000010000000000000008
# standby archive
000000010000000000000003
000000010000000000000005
000000010000000000000006
000000010000000000000008
The segment 4 is skipped by the issue A and 7 is skipped by the issue
B.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
From afa907bca7db8ea6335d47bd02761f567591d553 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Tue, 30 Jun 2020 14:21:30 +0900
Subject: [PATCH v1] Make sure standby archives all segments
Standby fails to archive a segment if standby is stopped just after a
segment is finished or stopped just after a segment swtich. Make sure
that walreceiver archives all segments by rechecking at start.
---
src/backend/replication/walreceiver.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index d1ad75da87..680154365d 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -938,6 +938,23 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
else
XLogArchiveNotify(xlogfname);
}
+ else if (XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
+ {
+ /*
+ * If we are starting streaming at the beginning of a segment,
+ * there may be casees where the previous segment have not been
+ * archived yet. Make sure it is archived.
+ */
+ char xlogfname[MAXFNAMELEN];
+ XLogSegNo prevseg;
+
+ XLByteToPrevSeg(recptr, prevseg, wal_segment_size);
+ XLogFileName(xlogfname, ThisTimeLineID, prevseg,
+ wal_segment_size);
+
+ /* Mark as ".ready" of not yet */
+ XLogArchiveCheckDone(xlogfname);
+ }
recvFile = -1;
/* Create/use new log file */
--
2.18.4
From 7af716134dceb3bafce421dfeaffebf1e1e3e17d Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Mon, 29 Jun 2020 16:12:01 +0900
Subject: [PATCH v1] Make sure standby archives all segments immediately
Standby may get a bit being late in archive, since walsender doesn't
archive a segment until it receives any data for the next segment, Fix
that by archiving just after a segment is finished.
Also, standby fails to archive a segment if standby is stopped just
after a segment is finished or stopped just after a segment
swtich. Make sure that walreceiver archives all segments by rechecking
at start.
---
src/backend/replication/walreceiver.c | 82 ++++++++++++++++++---------
1 file changed, 54 insertions(+), 28 deletions(-)
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index d1ad75da87..831718c859 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -902,49 +902,34 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
{
int segbytes;
- if (recvFile < 0 || !XLByteInSeg(recptr, recvSegNo, wal_segment_size))
+ /* Open the segment if not yet */
+ if (recvFile < 0)
{
bool use_existent;
+ recvFileTLI = ThisTimeLineID;
+
/*
- * fsync() and close current file before we switch to next one. We
- * would otherwise have to reopen this file to fsync it later
+ * If we are starting streaming at the beginning of a segment,
+ * there may be the case where the previous segment have not been
+ * archived yet. Make sure it is archived.
*/
- if (recvFile >= 0)
+ if (XLogArchiveMode == ARCHIVE_MODE_ALWAYS && recvSegNo == 0)
{
char xlogfname[MAXFNAMELEN];
+ XLogSegNo prevseg;
- XLogWalRcvFlush(false);
+ XLByteToPrevSeg(recptr, prevseg, wal_segment_size);
+ XLogFileName(xlogfname, recvFileTLI, prevseg, wal_segment_size);
- XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
-
- /*
- * XLOG segment files will be re-read by recovery in startup
- * process soon, so we don't advise the OS to release cache
- * pages associated with the file like XLogFileClose() does.
- */
- if (close(recvFile) != 0)
- ereport(PANIC,
- (errcode_for_file_access(),
- errmsg("could not close log segment %s: %m",
- xlogfname)));
-
- /*
- * Create .done file forcibly to prevent the streamed segment
- * from being archived later.
- */
- if (XLogArchiveMode != ARCHIVE_MODE_ALWAYS)
- XLogArchiveForceDone(xlogfname);
- else
- XLogArchiveNotify(xlogfname);
+ /* Mark as ".ready" of not yet */
+ XLogArchiveCheckDone(xlogfname);
}
- recvFile = -1;
/* Create/use new log file */
XLByteToSeg(recptr, recvSegNo, wal_segment_size);
use_existent = true;
recvFile = XLogFileInit(recvSegNo, &use_existent, true);
- recvFileTLI = ThisTimeLineID;
}
/* Calculate the start offset of the received logs */
@@ -985,6 +970,47 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
buf += byteswritten;
LogstreamResult.Write = recptr;
+
+ /*
+ * Close the current WAL segment if it is completed then let the file
+ * be archived if needed.
+ */
+ if (!XLByteInSeg(recptr, recvSegNo, wal_segment_size))
+ {
+ char xlogfname[MAXFNAMELEN];
+
+ Assert (recvFile >= 0);
+
+ /*
+ * fsync() and close current file before we switch to next one. We
+ * would otherwise have to reopen this file to fsync it later
+ */
+ XLogWalRcvFlush(false);
+
+ XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
+
+ /*
+ * XLOG segment files will be re-read by recovery in startup
+ * process soon, so we don't advise the OS to release cache
+ * pages associated with the file like XLogFileClose() does.
+ */
+ if (close(recvFile) != 0)
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not close log segment %s: %m",
+ xlogfname)));
+
+ /*
+ * Create .done file forcibly to prevent the streamed segment
+ * from being archived later.
+ */
+ if (XLogArchiveMode != ARCHIVE_MODE_ALWAYS)
+ XLogArchiveForceDone(xlogfname);
+ else
+ XLogArchiveNotify(xlogfname);
+
+ recvFile = -1;
+ }
}
/* Update shared-memory status */
--
2.18.4
#! /bin/bash
ROOT=`pwd`
LOGFILE="repro.log"
PGPORT1=15432
PGPORT2=15433
PGDATA1=$ROOT/reprodata1
ARCHDIR1=$ROOT/reproarc1
PGDATA2=$ROOT/reprodata2
ARCHDIR2=$ROOT/reproarc2
function cleanup {
echo -n "Killing servers..."
pg_ctl -D $PGDATA1 -m i stop
pg_ctl -D $PGDATA2 -m i stop
echo "done."
exit 1
}
rm -r $PGDATA1 $PGDATA2 $ARCHDIR1 $ARCHDIR2
mkdir $ARCHDIR1 $ARCHDIR2
# Create primary
echo "# Creating primary"
initdb -D $PGDATA1 &>$LOGFILE
cat >> $PGDATA1/postgresql.conf <<EOF
wal_keep_segments=10
archive_mode=always
archive_command='cp %p $ARCHDIR1/%f'
EOF
# Start primary
echo "# Starting primary"
pg_ctl -D $PGDATA1 -o"-p $PGPORT1" start &>>$LOGFILE
# Create standby
echo "# Creating standby"
pg_basebackup -D $PGDATA2 -h /tmp -p $PGPORT1 &>>$LOGFILE
cat >> $PGDATA2/postgresql.conf <<EOF
archive_command='cp %p $ARCHDIR2/%f'
primary_conninfo='host=/tmp port=$PGPORT1'
EOF
touch $PGDATA2/standby.signal
trap cleanup ERR 2 3 15
# Start primary
echo "# Starting standby"
pg_ctl -D $PGDATA2 -o"-p $PGPORT2" start &>>$LOGFILE
sleep 3
echo "Start:"
echo -n "Primary location: "
psql -tAp $PGPORT1 -c "select pg_current_wal_lsn()"
echo -n "Standby location: "
psql -tAp $PGPORT2 -c "select pg_last_wal_receive_lsn()"
# Delocate from boundary..
psql -p $PGPORT1 -c "create table t(); drop table t" &>>$LOGFILE
sleep 1
# TEST 1: walreceiver stops just after a segment is completed
echo "# test 1" >> $LOGFILE
touch /tmp/hoge1
psql -p $PGPORT1 -c "create table t(a int); insert into t (select a from generate_series(0, 260000) a); drop table t;"
&>>$LOGFILE
echo "# test 1 end" >> $LOGFILE
psql -p $PGPORT1 -c "create table t(); drop table t; select pg_switch_wal()" &>>$LOGFILE
sleep 2
echo "After test 1:"
echo -n "Primary location: "
psql -tAp $PGPORT1 -c "select pg_current_wal_lsn()"
echo -n "Standby location: "
psql -tAp $PGPORT2 -c "select pg_last_wal_receive_lsn()"
psql -p $PGPORT1 -c "create table t(); drop table t; select pg_switch_wal()" &>>$LOGFILE
psql -p $PGPORT1 -c "create table t(); drop table t; select pg_switch_wal()" &>>$LOGFILE
sleep 2
# TEST 2: walreceiver stops while receiving filling chunks after a wal switch.
echo "# test 2" >> $LOGFILE
touch /tmp/hoge2
psql -p $PGPORT1 -c "create table t(); drop table t; select pg_switch_wal()" &>>$LOGFILE
echo "# test 2 end" >> $LOGFILE
sleep 2
echo "After test 2:"
echo -n "Primary location: "
psql -tAp $PGPORT1 -c "select pg_current_wal_lsn()"
echo -n "Standby location: "
psql -tAp $PGPORT2 -c "select pg_last_wal_receive_lsn()"
# stop servers
pg_ctl -D $PGDATA1 stop &>>$LOGFILE
pg_ctl -D $PGDATA2 stop &>>$LOGFILE
#show last three archived segments
echo "# primary archive"
ls $ARCHDIR1 | egrep '[3-9]$'
echo "# standby archive"
ls $ARCHDIR2 | egrep '[3-9]$'
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index d1ad75da87..dcccef151d 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -985,6 +985,29 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
buf += byteswritten;
LogstreamResult.Write = recptr;
+
+ {
+ /* fake oneshot SIGTERM just at segment end */
+ struct stat b;
+ char *sigfile1 = "/tmp/hoge1";
+ char *sigfile2 = "/tmp/hoge2";
+
+ if (LogstreamResult.Write % wal_segment_size == 0 &&
+ stat(sigfile1, &b) == 0)
+ {
+ unlink(sigfile1);
+ got_SIGTERM = true;
+ ereport(LOG,(errmsg("STOP BY trig1@%lX", LogstreamResult.Write)));
+ }
+
+ if (LogstreamResult.Write % wal_segment_size == 0x500000 &&
+ stat(sigfile2, &b) == 0)
+ {
+ unlink(sigfile2);
+ got_SIGTERM = true;
+ ereport(LOG,(errmsg("STOP BY trig2@%lX", LogstreamResult.Write)));
+ }
+ }
}
/* Update shared-memory status */
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 831718c859..b9a7c73ed7 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1011,6 +1011,29 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
recvFile = -1;
}
+
+ {
+ /* fake oneshot SIGTERM just at segment end */
+ struct stat b;
+ char *sigfile1 = "/tmp/hoge1";
+ char *sigfile2 = "/tmp/hoge2";
+
+ if (LogstreamResult.Write % wal_segment_size == 0 &&
+ stat(sigfile1, &b) == 0)
+ {
+ unlink(sigfile1);
+ got_SIGTERM = true;
+ ereport(LOG,(errmsg("STOP BY trig1@%lX", LogstreamResult.Write)));
+ }
+
+ if (LogstreamResult.Write % wal_segment_size == 0x500000 &&
+ stat(sigfile2, &b) == 0)
+ {
+ unlink(sigfile2);
+ got_SIGTERM = true;
+ ereport(LOG,(errmsg("STOP BY trig2@%lX", LogstreamResult.Write)));
+ }
+ }
}
/* Update shared-memory status */
pgsql-hackers by date: