Possible missing segments in archiving on standby - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Possible missing segments in archiving on standby
Date
Msg-id 20200630.165503.1465894182551545886.horikyota.ntt@gmail.com
Whole thread Raw
Responses Re: Possible missing segments in archiving on standby
List pgsql-hackers
Hello.

While looking a patch, I found that a standby with archive_mode=always
fails to archive segments under certain conditions.

A. Walreceiver is gracefully terminated just after a segment is
   finished.

B. Walreceiver is gracefully terminated while receiving filling chunks
  for a segment switch.

The two above are reprodusible (without distinction between the two)
using a helper patch. See below.

There's one more issue here.

C. Standby doesn't archive a segment until walreceiver receives any
  data for the next segment.

I'm not sure wehther we assume C as an issue.

The first attached patch fixes A and B. A side-effect of that is that
standby archives the previous segment of the streaming start
location. Concretely 00..0100..2 gets to be archived in the above case
(recovery starts at 0/3000000).  That behavior doesn't seem to be a
proble since the segment is a part of the standby's data anyway.

The second attached patch fixes all of A to C, but seems somewhat
redundant.

Any opnions and/or suggestions are welcome.


The attached files are:

1. v1-0001-Make-sure-standby-archives-all-segments.patch:
  Fix for A and B.

2. v1-0001-Make-sure-standby-archives-all-segments-immediate.patch:
  Fix for A, B and C.

3. repro.sh
  The reproducer shell script used below.

4. repro_helper.patch
  Helper patch for repro.sh for master and patch 1 above.

5. repro_helper2.patch
  Helper patch for repro.sh for patch 2 above.

=====
** REPRODUCER

The failure is reproducible with some code tweak.

1. Create a primary server with archive_mode=always then start it.
2. Create and start a standby.
3. touch /tmp/hoge

4. psql -c "create table t(); drop table t; select pg_switch_wal(); select pg_sleep(1); create table t(); drop table t;
selectpg_switch_wal();"
 

5. look into the archive directory of the standby.
   If no missing segments found in archive, repeat from 3.

The third attached shell script is a reproducer for the problem,
needing the aid of the fourth patch attached.

$ mkdir testdir
$ cd testdir
$ bash ..../repro.sh
....
After test 2:
Primary location: 0/8000310
Standby location: 0/8000310
# primary archive
000000010000000000000003
000000010000000000000004
000000010000000000000005
000000010000000000000006
000000010000000000000007
000000010000000000000008
# standby archive
000000010000000000000003
000000010000000000000005
000000010000000000000006
000000010000000000000008

The segment 4 is skipped by the issue A and 7 is skipped by the issue
B.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

From afa907bca7db8ea6335d47bd02761f567591d553 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Tue, 30 Jun 2020 14:21:30 +0900
Subject: [PATCH v1] Make sure standby archives all segments

Standby fails to archive a segment if standby is stopped just after a
segment is finished or stopped just after a segment swtich. Make sure
that walreceiver archives all segments by rechecking at start.
---
 src/backend/replication/walreceiver.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index d1ad75da87..680154365d 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -938,6 +938,23 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
                 else
                     XLogArchiveNotify(xlogfname);
             }
+            else if (XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
+            {
+                /*
+                 * If we are starting streaming at the beginning of a segment,
+                 * there may be casees where the previous segment have not been
+                 * archived yet.  Make sure it is archived.
+                 */
+                char        xlogfname[MAXFNAMELEN];
+                XLogSegNo    prevseg;
+
+                XLByteToPrevSeg(recptr, prevseg, wal_segment_size);
+                XLogFileName(xlogfname, ThisTimeLineID, prevseg,
+                             wal_segment_size);
+
+                /* Mark as ".ready" of not yet */
+                XLogArchiveCheckDone(xlogfname);
+            }
             recvFile = -1;
 
             /* Create/use new log file */
-- 
2.18.4

From 7af716134dceb3bafce421dfeaffebf1e1e3e17d Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Mon, 29 Jun 2020 16:12:01 +0900
Subject: [PATCH v1] Make sure standby archives all segments immediately

Standby may get a bit being late in archive, since walsender doesn't
archive a segment until it receives any data for the next segment, Fix
that by archiving just after a segment is finished.

Also, standby fails to archive a segment if standby is stopped just
after a segment is finished or stopped just after a segment
swtich. Make sure that walreceiver archives all segments by rechecking
at start.
---
 src/backend/replication/walreceiver.c | 82 ++++++++++++++++++---------
 1 file changed, 54 insertions(+), 28 deletions(-)

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index d1ad75da87..831718c859 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -902,49 +902,34 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
     {
         int            segbytes;
 
-        if (recvFile < 0 || !XLByteInSeg(recptr, recvSegNo, wal_segment_size))
+        /* Open the segment if not yet */
+        if (recvFile < 0)
         {
             bool        use_existent;
 
+            recvFileTLI = ThisTimeLineID;
+
             /*
-             * fsync() and close current file before we switch to next one. We
-             * would otherwise have to reopen this file to fsync it later
+             * If we are starting streaming at the beginning of a segment,
+             * there may be the case where the previous segment have not been
+             * archived yet.  Make sure it is archived.
              */
-            if (recvFile >= 0)
+            if (XLogArchiveMode == ARCHIVE_MODE_ALWAYS && recvSegNo == 0)
             {
                 char        xlogfname[MAXFNAMELEN];
+                XLogSegNo    prevseg;
 
-                XLogWalRcvFlush(false);
+                XLByteToPrevSeg(recptr, prevseg, wal_segment_size);
+                XLogFileName(xlogfname, recvFileTLI, prevseg, wal_segment_size);
 
-                XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
-
-                /*
-                 * XLOG segment files will be re-read by recovery in startup
-                 * process soon, so we don't advise the OS to release cache
-                 * pages associated with the file like XLogFileClose() does.
-                 */
-                if (close(recvFile) != 0)
-                    ereport(PANIC,
-                            (errcode_for_file_access(),
-                             errmsg("could not close log segment %s: %m",
-                                    xlogfname)));
-
-                /*
-                 * Create .done file forcibly to prevent the streamed segment
-                 * from being archived later.
-                 */
-                if (XLogArchiveMode != ARCHIVE_MODE_ALWAYS)
-                    XLogArchiveForceDone(xlogfname);
-                else
-                    XLogArchiveNotify(xlogfname);
+                /* Mark as ".ready" of not yet */
+                XLogArchiveCheckDone(xlogfname);
             }
-            recvFile = -1;
 
             /* Create/use new log file */
             XLByteToSeg(recptr, recvSegNo, wal_segment_size);
             use_existent = true;
             recvFile = XLogFileInit(recvSegNo, &use_existent, true);
-            recvFileTLI = ThisTimeLineID;
         }
 
         /* Calculate the start offset of the received logs */
@@ -985,6 +970,47 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
         buf += byteswritten;
 
         LogstreamResult.Write = recptr;
+
+        /*
+         * Close the current WAL segment if it is completed then let the file
+         * be archived if needed.
+         */
+        if (!XLByteInSeg(recptr, recvSegNo, wal_segment_size))
+        {
+            char        xlogfname[MAXFNAMELEN];
+
+            Assert (recvFile >= 0);
+
+            /*
+             * fsync() and close current file before we switch to next one. We
+             * would otherwise have to reopen this file to fsync it later
+             */
+            XLogWalRcvFlush(false);
+
+            XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
+
+            /*
+             * XLOG segment files will be re-read by recovery in startup
+             * process soon, so we don't advise the OS to release cache
+             * pages associated with the file like XLogFileClose() does.
+             */
+            if (close(recvFile) != 0)
+                ereport(PANIC,
+                        (errcode_for_file_access(),
+                         errmsg("could not close log segment %s: %m",
+                                xlogfname)));
+
+            /*
+             * Create .done file forcibly to prevent the streamed segment
+             * from being archived later.
+             */
+            if (XLogArchiveMode != ARCHIVE_MODE_ALWAYS)
+                XLogArchiveForceDone(xlogfname);
+            else
+                XLogArchiveNotify(xlogfname);
+
+            recvFile = -1;
+        }
     }
 
     /* Update shared-memory status */
-- 
2.18.4

#! /bin/bash

ROOT=`pwd`
LOGFILE="repro.log"
PGPORT1=15432
PGPORT2=15433

PGDATA1=$ROOT/reprodata1
ARCHDIR1=$ROOT/reproarc1
PGDATA2=$ROOT/reprodata2
ARCHDIR2=$ROOT/reproarc2


function cleanup {
    echo -n "Killing servers..."
    pg_ctl -D $PGDATA1 -m i stop
    pg_ctl -D $PGDATA2 -m i stop
    echo "done."
    exit 1
}

rm -r $PGDATA1 $PGDATA2 $ARCHDIR1 $ARCHDIR2
mkdir $ARCHDIR1 $ARCHDIR2

# Create primary
echo "# Creating primary"
initdb -D $PGDATA1 &>$LOGFILE
cat >> $PGDATA1/postgresql.conf <<EOF
wal_keep_segments=10
archive_mode=always
archive_command='cp %p $ARCHDIR1/%f'
EOF

# Start primary
echo "# Starting primary"
pg_ctl -D $PGDATA1 -o"-p $PGPORT1" start &>>$LOGFILE

# Create standby
echo "# Creating standby"
pg_basebackup -D $PGDATA2 -h /tmp -p $PGPORT1 &>>$LOGFILE
cat >> $PGDATA2/postgresql.conf <<EOF
archive_command='cp %p $ARCHDIR2/%f'
primary_conninfo='host=/tmp port=$PGPORT1'
EOF
touch $PGDATA2/standby.signal

trap cleanup ERR 2 3 15

# Start primary
echo "# Starting standby"
pg_ctl -D $PGDATA2 -o"-p $PGPORT2" start &>>$LOGFILE
sleep 3

echo "Start:"
echo -n "Primary location: "
psql -tAp $PGPORT1 -c "select pg_current_wal_lsn()"
echo -n "Standby location: "
psql -tAp $PGPORT2 -c "select pg_last_wal_receive_lsn()"

# Delocate from boundary..
psql -p $PGPORT1 -c "create table t(); drop table t" &>>$LOGFILE
sleep 1

# TEST 1: walreceiver stops just after a segment is completed
echo "# test 1" >> $LOGFILE
touch /tmp/hoge1
psql -p $PGPORT1 -c "create table t(a int); insert into t (select a from generate_series(0, 260000) a); drop table t;"
&>>$LOGFILE
echo "# test 1 end" >> $LOGFILE

psql -p $PGPORT1 -c "create table t(); drop table t; select pg_switch_wal()" &>>$LOGFILE
sleep 2

echo "After test 1:"
echo -n "Primary location: "
psql -tAp $PGPORT1 -c "select pg_current_wal_lsn()"
echo -n "Standby location: "
psql -tAp $PGPORT2 -c "select pg_last_wal_receive_lsn()"

psql -p $PGPORT1 -c "create table t(); drop table t; select pg_switch_wal()" &>>$LOGFILE
psql -p $PGPORT1 -c "create table t(); drop table t; select pg_switch_wal()" &>>$LOGFILE
sleep 2

# TEST 2: walreceiver stops while receiving filling chunks after a wal switch.
echo "# test 2" >> $LOGFILE
touch /tmp/hoge2
psql -p $PGPORT1 -c "create table t(); drop table t; select pg_switch_wal()" &>>$LOGFILE
echo "# test 2 end" >> $LOGFILE

sleep 2

echo "After test 2:"
echo -n "Primary location: "
psql -tAp $PGPORT1 -c "select pg_current_wal_lsn()"
echo -n "Standby location: "
psql -tAp $PGPORT2 -c "select pg_last_wal_receive_lsn()"

# stop servers
pg_ctl -D $PGDATA1 stop &>>$LOGFILE
pg_ctl -D $PGDATA2 stop &>>$LOGFILE

#show last three archived segments
echo "# primary archive"
ls $ARCHDIR1 | egrep '[3-9]$'
echo "# standby archive"
ls $ARCHDIR2 | egrep '[3-9]$'
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index d1ad75da87..dcccef151d 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -985,6 +985,29 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
         buf += byteswritten;
 
         LogstreamResult.Write = recptr;
+
+        {
+            /* fake oneshot SIGTERM just at segment end */
+            struct stat b;
+            char *sigfile1 = "/tmp/hoge1";
+            char *sigfile2 = "/tmp/hoge2";
+
+            if (LogstreamResult.Write % wal_segment_size == 0 &&
+                stat(sigfile1, &b) == 0)
+            {
+                unlink(sigfile1);
+                got_SIGTERM = true;
+                ereport(LOG,(errmsg("STOP BY trig1@%lX", LogstreamResult.Write)));
+            }
+
+            if (LogstreamResult.Write % wal_segment_size == 0x500000 &&
+                stat(sigfile2, &b) == 0)
+            {
+                unlink(sigfile2);
+                got_SIGTERM = true;
+                ereport(LOG,(errmsg("STOP BY trig2@%lX", LogstreamResult.Write)));
+            }
+        }
     }
 
     /* Update shared-memory status */
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 831718c859..b9a7c73ed7 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1011,6 +1011,29 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 
             recvFile = -1;
         }
+
+        {
+            /* fake oneshot SIGTERM just at segment end */
+            struct stat b;
+            char *sigfile1 = "/tmp/hoge1";
+            char *sigfile2 = "/tmp/hoge2";
+
+            if (LogstreamResult.Write % wal_segment_size == 0 &&
+                stat(sigfile1, &b) == 0)
+            {
+                unlink(sigfile1);
+                got_SIGTERM = true;
+                ereport(LOG,(errmsg("STOP BY trig1@%lX", LogstreamResult.Write)));
+            }
+
+            if (LogstreamResult.Write % wal_segment_size == 0x500000 &&
+                stat(sigfile2, &b) == 0)
+            {
+                unlink(sigfile2);
+                got_SIGTERM = true;
+                ereport(LOG,(errmsg("STOP BY trig2@%lX", LogstreamResult.Write)));
+            }
+        }
     }
 
     /* Update shared-memory status */

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: warnings for invalid function casts
Next
From: Fujii Masao
Date:
Subject: Re: min_safe_lsn column in pg_replication_slots view