Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least9.5)? - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least9.5)?
Date
Msg-id 20171026.190551.208996945.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?
List pgsql-hackers
Hello. Thank you for looking this.

At Mon, 16 Oct 2017 17:58:03 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqR+J1Xw+yzfsrehiQ+rh3ac+n5sEUgP7UOQ4_ymFnO9wg@mail.gmail.com>
> On Thu, Sep 7, 2017 at 12:33 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > At Wed, 6 Sep 2017 12:23:53 -0700, Andres Freund <andres@anarazel.de> wrote in
<20170906192353.ufp2dq7wm5fd6qa7@alap3.anarazel.de>
> >> I'm not following. All we need to use is the beginning of the relevant
> >> records, that's easy enough to keep track of. We don't need to read the
> >> WAL or anything.
> >
> > The beginning is already tracked and nothing more to do.
> 
> I have finally allocated time to look at your newly-proposed patch,
> sorry for the time it took. Patch 0002 forgot to include sys/stat.h to
> allow the debugging tool to compile :)
> 
> > The first *problem* was WaitForWALToBecomeAvailable requests the
> > beginning of a record, which is not on the page the function has
> > been told to fetch. Still tliRecPtr is required to determine the
> > TLI to request, it should request RecPtr to be streamed.
> 
> [...]
> 
> > The rest to do is let XLogPageRead retry other sources
> > immediately. To do this I made ValidXLogPageHeader@xlogreader.c
> > public (and renamed to XLogReaderValidatePageHeader).
> >
> > The patch attached fixes the problem and passes recovery
> > tests. However, the test for this problem is not added. It needs
> > to go to the last page in a segment then put a record continues
> > to the next segment, then kill the standby after receiving the
> > previous segment but before receiving the whole record.
> 
> +typedef struct XLogPageHeaderData *XLogPageHeader;
> [...]
> +/* Validate a page */
> +extern bool XLogReaderValidatePageHeader(XLogReaderState *state,
> +                                       XLogRecPtr recptr, XLogPageHeader hdr);
> Instead of exposing XLogPageHeaderData, I would recommend just using
> char* and remove this declaration. The comment on top of
> XLogReaderValidatePageHeader needs to make clear what caller should
> provide.

Seems reasonable. Added several lines in the comment for the
function.

> +    if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr,
> +                                      (XLogPageHeader) readBuf))
> +        goto next_record_is_invalid;
> +
> [...]
> -                            ptr = tliRecPtr;
> +                            ptr = RecPtr;
>                              tli = tliOfPointInHistory(tliRecPtr, expectedTLEs);
> 
>                              if (curFileTLI > 0 && tli < curFileTLI)
> The core of the patch is here (the patch has no comment so it is hard
> to understand what's the point of what is being done), and if I

Hmm, sorry. Added a brief comment there.

> understand that correctly, you allow the receiver to fetch the
> portions of a record spawned across multiple segments from different
> sources, and I am not sure that we'd want to break that promise.

We are allowing consecutive records at a segment boundary from
different sources are in the same series of xlog records. A
continuation records never spans over two TLIs but I might be
missing something here. (I found that an error message shows an
incorrect record pointer. The error message seems still be
useful.)

> Shouldn't we instead have the receiver side track the beginning of the
> record and send that position for the physical slot's restart_lsn?

The largest obstacle to do that is that walreceiver is not
utterly concerned to record internals. In other words, it doesn't
know what a record is. Teaching that introduces much complexity
and the complexity slows down the walreceiver.

Addition to that, this "problem" occurs also on replication
without a slot. The latest patch also help the case.

> This way the receiver would retain WAL segments from the real
> beginning of a record. restart_lsn for replication slots is set when
> processing the standby message in ProcessStandbyReplyMessage() using
> now the flush LSN, so a more correct value should be provided using
> that. Andres, what's your take on the matter?


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From b23c1d69ad86fcbb992cb21c604f587d82441001 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Thu, 7 Sep 2017 12:14:55 +0900
Subject: [PATCH 1/2] Allow switch WAL source midst of record.

The corrent recovery machinary assumes the whole of a record is
avaiable from single source. This prevents a standby from restarting
under a certain condition. This patch allows source switching during
reading a series of continuation records.
---src/backend/access/transam/xlog.c       | 14 ++++++++++++--src/backend/access/transam/xlogreader.c | 28
++++++++++++++++++----------src/include/access/xlogreader.h        |  4 ++++3 files changed, 34 insertions(+), 12
deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dd028a1..0d639ec 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11647,6 +11647,10 @@ retry:    Assert(reqLen <= readLen);    *readTLI = curFileTLI;
+
+    if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf))
+        goto next_record_is_invalid;
+    return readLen;next_record_is_invalid:
@@ -11781,12 +11785,18 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,                        }
                 else                        {
 
-                            ptr = tliRecPtr;
+                            /*
+                             * Trying from the current RecPtr, not from the
+                             * beginning of the current record. The record may
+                             * be no longer available from the master.
+                             */
+                            ptr = RecPtr;                            tli = tliOfPointInHistory(tliRecPtr,
expectedTLEs);                           if (curFileTLI > 0 && tli < curFileTLI)
elog(ERROR,"according to history file, WAL location %X/%X belongs to timeline %u, but previous recovered WAL file came
fromtimeline %u",
 
-                                     (uint32) (ptr >> 32), (uint32) ptr,
+                                     (uint32) (tliRecPtr >> 32),
+                                     (uint32) tliRecPtr,                                     tli, curFileTLI);
              }                        curFileTLI = tli;
 
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index b1f9b90..78a721a 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -27,8 +27,6 @@static bool allocate_recordbuf(XLogReaderState *state, uint32 reclength);
-static bool ValidXLogPageHeader(XLogReaderState *state, XLogRecPtr recptr,
-                    XLogPageHeader hdr);static bool ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
                  XLogRecPtr PrevRecPtr, XLogRecord *record, bool randAccess);static bool
ValidXLogRecord(XLogReaderState*state, XLogRecord *record,
 
@@ -533,7 +531,6 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen)     */    if (targetSegNo
!=state->readSegNo && targetPageOff != 0)    {
 
-        XLogPageHeader hdr;        XLogRecPtr    targetSegmentPtr = pageptr - targetPageOff;        readLen =
state->read_page(state,targetSegmentPtr, XLOG_BLCKSZ,
 
@@ -545,9 +542,8 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen)        /* we can be sure
tohave enough WAL available, we scrolled back */        Assert(readLen == XLOG_BLCKSZ);
 
-        hdr = (XLogPageHeader) state->readBuf;
-
-        if (!ValidXLogPageHeader(state, targetSegmentPtr, hdr))
+        if (!XLogReaderValidatePageHeader(state, targetSegmentPtr,
+                                          state->readBuf))            goto err;    }
@@ -584,7 +580,7 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen)    /*     * Now that we
knowwe have the full header, validate it.     */
 
-    if (!ValidXLogPageHeader(state, pageptr, hdr))
+    if (!XLogReaderValidatePageHeader(state, pageptr, (char *) hdr))        goto err;    /* update read state
information*/
 
@@ -710,14 +706,26 @@ ValidXLogRecord(XLogReaderState *state, XLogRecord *record, XLogRecPtr recptr)/* * Validate a
pageheader
 
+ *
+ * Check if phdr is valid as the XLog page header of the XLog page of the page
+ * pointed by recptr.
+ *
+ * phdr is read as XLogPageHeaderData and check if it has valid magic, has no
+ * usused flag bits set and belongs to correct page pointed by recptr. The
+ * incorrect page address is detected usually when we read a page in a
+ * recycled segment.
+ *
+ * If it is the first page in a segment or detected rewind of state,
+ * additional checks are performed. */
-static bool
-ValidXLogPageHeader(XLogReaderState *state, XLogRecPtr recptr,
-                    XLogPageHeader hdr)
+bool
+XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr,
+                             char *phdr){    XLogRecPtr    recaddr;    XLogSegNo    segno;    int32        offset;
+    XLogPageHeader hdr = (XLogPageHeader) phdr;    Assert((recptr % XLOG_BLCKSZ) == 0);
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index 3a9ebd4..758d880 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -205,6 +205,10 @@ extern void XLogReaderFree(XLogReaderState *state);extern struct XLogRecord
*XLogReadRecord(XLogReaderState*state,               XLogRecPtr recptr, char **errormsg);
 
+/* Validate a page */
+extern bool XLogReaderValidatePageHeader(XLogReaderState *state,
+                    XLogRecPtr recptr, char *phdr);
+/* Invalidate read state */extern void XLogReaderInvalReadState(XLogReaderState *state);
-- 
2.9.2


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

Previous
From: Petr Jelinek
Date:
Subject: [HACKERS] Subscriber resets additional columns to NULL on UPDATE
Next
From: Ashutosh Bapat
Date:
Subject: Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath