Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least9.5)? - Mailing list pgsql-hackers
| From | Kyotaro HORIGUCHI |
|---|---|
| Subject | Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least9.5)? |
| Date | |
| Msg-id | 20171026.190551.208996945.horiguchi.kyotaro@lab.ntt.co.jp Whole thread Raw |
| In response to | Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)? (Michael Paquier <michael.paquier@gmail.com>) |
| Responses |
Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?
|
| List | pgsql-hackers |
Hello. Thank you for looking this.
At Mon, 16 Oct 2017 17:58:03 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqR+J1Xw+yzfsrehiQ+rh3ac+n5sEUgP7UOQ4_ymFnO9wg@mail.gmail.com>
> On Thu, Sep 7, 2017 at 12:33 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > At Wed, 6 Sep 2017 12:23:53 -0700, Andres Freund <andres@anarazel.de> wrote in
<20170906192353.ufp2dq7wm5fd6qa7@alap3.anarazel.de>
> >> I'm not following. All we need to use is the beginning of the relevant
> >> records, that's easy enough to keep track of. We don't need to read the
> >> WAL or anything.
> >
> > The beginning is already tracked and nothing more to do.
>
> I have finally allocated time to look at your newly-proposed patch,
> sorry for the time it took. Patch 0002 forgot to include sys/stat.h to
> allow the debugging tool to compile :)
>
> > The first *problem* was WaitForWALToBecomeAvailable requests the
> > beginning of a record, which is not on the page the function has
> > been told to fetch. Still tliRecPtr is required to determine the
> > TLI to request, it should request RecPtr to be streamed.
>
> [...]
>
> > The rest to do is let XLogPageRead retry other sources
> > immediately. To do this I made ValidXLogPageHeader@xlogreader.c
> > public (and renamed to XLogReaderValidatePageHeader).
> >
> > The patch attached fixes the problem and passes recovery
> > tests. However, the test for this problem is not added. It needs
> > to go to the last page in a segment then put a record continues
> > to the next segment, then kill the standby after receiving the
> > previous segment but before receiving the whole record.
>
> +typedef struct XLogPageHeaderData *XLogPageHeader;
> [...]
> +/* Validate a page */
> +extern bool XLogReaderValidatePageHeader(XLogReaderState *state,
> + XLogRecPtr recptr, XLogPageHeader hdr);
> Instead of exposing XLogPageHeaderData, I would recommend just using
> char* and remove this declaration. The comment on top of
> XLogReaderValidatePageHeader needs to make clear what caller should
> provide.
Seems reasonable. Added several lines in the comment for the
function.
> + if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr,
> + (XLogPageHeader) readBuf))
> + goto next_record_is_invalid;
> +
> [...]
> - ptr = tliRecPtr;
> + ptr = RecPtr;
> tli = tliOfPointInHistory(tliRecPtr, expectedTLEs);
>
> if (curFileTLI > 0 && tli < curFileTLI)
> The core of the patch is here (the patch has no comment so it is hard
> to understand what's the point of what is being done), and if I
Hmm, sorry. Added a brief comment there.
> understand that correctly, you allow the receiver to fetch the
> portions of a record spawned across multiple segments from different
> sources, and I am not sure that we'd want to break that promise.
We are allowing consecutive records at a segment boundary from
different sources are in the same series of xlog records. A
continuation records never spans over two TLIs but I might be
missing something here. (I found that an error message shows an
incorrect record pointer. The error message seems still be
useful.)
> Shouldn't we instead have the receiver side track the beginning of the
> record and send that position for the physical slot's restart_lsn?
The largest obstacle to do that is that walreceiver is not
utterly concerned to record internals. In other words, it doesn't
know what a record is. Teaching that introduces much complexity
and the complexity slows down the walreceiver.
Addition to that, this "problem" occurs also on replication
without a slot. The latest patch also help the case.
> This way the receiver would retain WAL segments from the real
> beginning of a record. restart_lsn for replication slots is set when
> processing the standby message in ProcessStandbyReplyMessage() using
> now the flush LSN, so a more correct value should be provided using
> that. Andres, what's your take on the matter?
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
From b23c1d69ad86fcbb992cb21c604f587d82441001 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Thu, 7 Sep 2017 12:14:55 +0900
Subject: [PATCH 1/2] Allow switch WAL source midst of record.
The corrent recovery machinary assumes the whole of a record is
avaiable from single source. This prevents a standby from restarting
under a certain condition. This patch allows source switching during
reading a series of continuation records.
---src/backend/access/transam/xlog.c | 14 ++++++++++++--src/backend/access/transam/xlogreader.c | 28
++++++++++++++++++----------src/include/access/xlogreader.h | 4 ++++3 files changed, 34 insertions(+), 12
deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dd028a1..0d639ec 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11647,6 +11647,10 @@ retry: Assert(reqLen <= readLen); *readTLI = curFileTLI;
+
+ if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf))
+ goto next_record_is_invalid;
+ return readLen;next_record_is_invalid:
@@ -11781,12 +11785,18 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, }
else {
- ptr = tliRecPtr;
+ /*
+ * Trying from the current RecPtr, not from the
+ * beginning of the current record. The record may
+ * be no longer available from the master.
+ */
+ ptr = RecPtr; tli = tliOfPointInHistory(tliRecPtr,
expectedTLEs); if (curFileTLI > 0 && tli < curFileTLI)
elog(ERROR,"according to history file, WAL location %X/%X belongs to timeline %u, but previous recovered WAL file came
fromtimeline %u",
- (uint32) (ptr >> 32), (uint32) ptr,
+ (uint32) (tliRecPtr >> 32),
+ (uint32) tliRecPtr, tli, curFileTLI);
} curFileTLI = tli;
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index b1f9b90..78a721a 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -27,8 +27,6 @@static bool allocate_recordbuf(XLogReaderState *state, uint32 reclength);
-static bool ValidXLogPageHeader(XLogReaderState *state, XLogRecPtr recptr,
- XLogPageHeader hdr);static bool ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
XLogRecPtr PrevRecPtr, XLogRecord *record, bool randAccess);static bool
ValidXLogRecord(XLogReaderState*state, XLogRecord *record,
@@ -533,7 +531,6 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen) */ if (targetSegNo
!=state->readSegNo && targetPageOff != 0) {
- XLogPageHeader hdr; XLogRecPtr targetSegmentPtr = pageptr - targetPageOff; readLen =
state->read_page(state,targetSegmentPtr, XLOG_BLCKSZ,
@@ -545,9 +542,8 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen) /* we can be sure
tohave enough WAL available, we scrolled back */ Assert(readLen == XLOG_BLCKSZ);
- hdr = (XLogPageHeader) state->readBuf;
-
- if (!ValidXLogPageHeader(state, targetSegmentPtr, hdr))
+ if (!XLogReaderValidatePageHeader(state, targetSegmentPtr,
+ state->readBuf)) goto err; }
@@ -584,7 +580,7 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen) /* * Now that we
knowwe have the full header, validate it. */
- if (!ValidXLogPageHeader(state, pageptr, hdr))
+ if (!XLogReaderValidatePageHeader(state, pageptr, (char *) hdr)) goto err; /* update read state
information*/
@@ -710,14 +706,26 @@ ValidXLogRecord(XLogReaderState *state, XLogRecord *record, XLogRecPtr recptr)/* * Validate a
pageheader
+ *
+ * Check if phdr is valid as the XLog page header of the XLog page of the page
+ * pointed by recptr.
+ *
+ * phdr is read as XLogPageHeaderData and check if it has valid magic, has no
+ * usused flag bits set and belongs to correct page pointed by recptr. The
+ * incorrect page address is detected usually when we read a page in a
+ * recycled segment.
+ *
+ * If it is the first page in a segment or detected rewind of state,
+ * additional checks are performed. */
-static bool
-ValidXLogPageHeader(XLogReaderState *state, XLogRecPtr recptr,
- XLogPageHeader hdr)
+bool
+XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr,
+ char *phdr){ XLogRecPtr recaddr; XLogSegNo segno; int32 offset;
+ XLogPageHeader hdr = (XLogPageHeader) phdr; Assert((recptr % XLOG_BLCKSZ) == 0);
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index 3a9ebd4..758d880 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -205,6 +205,10 @@ extern void XLogReaderFree(XLogReaderState *state);extern struct XLogRecord
*XLogReadRecord(XLogReaderState*state, XLogRecPtr recptr, char **errormsg);
+/* Validate a page */
+extern bool XLogReaderValidatePageHeader(XLogReaderState *state,
+ XLogRecPtr recptr, char *phdr);
+/* Invalidate read state */extern void XLogReaderInvalReadState(XLogReaderState *state);
--
2.9.2
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
pgsql-hackers by date: