Re: [HACKERS] Restricting maximum keep segments by repslots - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: [HACKERS] Restricting maximum keep segments by repslots
Date
Msg-id 20190130.104204.249058820.horiguchi.kyotaro@lab.ntt.co.jp
In response to Re: [HACKERS] Restricting maximum keep segments by repslots  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: [HACKERS] Restricting maximum keep segments by repslots  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
At Thu, 20 Dec 2018 16:24:38 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20181220.162438.121484007.horiguchi.kyotaro@lab.ntt.co.jp>
> Thank you for picking this up, and sorry for being late.
> 
> At Mon, 19 Nov 2018 13:39:58 +0900, Michael Paquier <michael@paquier.xyz> wrote in
<20181119043958.GE4400@paquier.xyz>
> > ereport should not be called within xlogreader.c as a base rule:
> 
> Ouch! I forgot that. Fixed to use report_invalid_record, slightly
> changing the message. The code is not required (or cannot be
> used) on the frontend, so I wrapped it in #ifndef FRONTEND.
> 
> At Tue, 20 Nov 2018 14:07:44 +0900, Michael Paquier <michael@paquier.xyz> wrote in
<20181120050744.GJ4400@paquier.xyz>
> > On Mon, Nov 19, 2018 at 01:39:58PM +0900, Michael Paquier wrote:
> > > I was just coming by to look a bit at the patch series, and bumped
> > > into that:
> > 
> > So I have been looking at the last patch series 0001-0004 posted on this
> > thread, and coming from here:
> > https://postgr.es/m/20181025.215518.189844649.horiguchi.kyotaro@lab.ntt.co.jp
> > 
> > /* check that the slot is gone */
> > SELECT * FROM pg_replication_slots
> > It could be an idea to switch to the expanded mode here, not that it
> > matters much still..
> 
> No problem doing that. Done.
> 
> The TAP test complained that it still used recovery.conf. Fixed. While
> doing that, I added a primary_slot_name parameter to
> init_from_backup in PostgresNode.pm.
> 
> > +IsLsnStillAvaiable(XLogRecPtr targetLSN, uint64 *restBytes)
> > You mean Available here, not Avaiable.  This function is only used when
> > scanning for slot information with pg_replication_slots, so wouldn't it
> > be better to just return the status string in this case?
> 
> Mmm. Sure. Auto-completion hid it from my eyes. Fixed the name.
> The suggestion sounds reasonable. The function was originally created to
> return a boolean, so the name doesn't fit the current function. I renamed
> it to GetLsnAvailability(), which returns a string.
> 
> > Not sure I see the point of the "remain" field, which can be found with
> > a simple calculation using the current insertion LSN, the segment size
> > and the amount of WAL that the slot is retaining.  It may be interesting
> > to document a query to do that though.
> 
> It's not that simple. wal_segment_size, max_slot_wal_keep_size,
> wal_keep_segments and the current LSN are all involved in the
> calculation, which includes several conditional branches, as you may
> have seen upthread. We could show "the largest current LSN until WAL is
> lost", but the "current LSN" is not shown there. So it is showing the
> "remain" instead.
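
(For reference, a very rough approximation of "remain" can be written at the
SQL level.  This is only a sketch: it assumes max_slot_wal_keep_size is set
to a positive size, and it ignores the wal_keep_segments floor and the
segment-boundary rounding that the internal calculation does, so it will not
match the view column exactly.

    -- rough sketch only; assumes max_slot_wal_keep_size > 0 and ignores
    -- wal_keep_segments and segment-boundary rounding
    SELECT slot_name,
           pg_size_bytes(current_setting('max_slot_wal_keep_size'))
             - pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS approx_remain
      FROM pg_replication_slots;

Those conditional branches are exactly what such a documented query would
have to reproduce, which is why the patch exposes the value as a column
instead.)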
> 
> > GetOldestXLogFileSegNo() has race conditions if WAL recycling runs in
> > parallel, no?  How is it safe to scan pg_wal on a process querying
> > pg_replication_slots while another process may manipulate its contents
> > (aka the checkpointer or just the startup process with an
> > end-of-recovery checkpoint.).  This routine relies on unsafe
> > assumptions as this is not concurrent-safe.  You can avoid problems by
> > making sure instead that lastRemovedSegNo is initialized correctly at
> > startup, which would be normally one segment older than what's in
> > pg_wal, which feels a bit hacky to rely on to track the oldest segment.
> 
> Concurrent recycling makes the function's result vary between the
> segment numbers before and after it. It is unstable, but that doesn't
> matter much. The reason for the timing was to avoid extra
> startup time from a scan over pg_wal that is unnecessary in most
> cases.
> 
> Anyway the attached patch initializes lastRemovedSegNo in
> StartupXLOG().
> 
> > It seems to me that GetOldestXLogFileSegNo() should also check for
> > segments matching the current timeline, no?
> 
> RemoveOldXlogFiles() ignores the timeline, and the function is made to
> behave the same way (in a different manner). I added a comment describing
> that behavior to the function.
> 
> > +           if (prev_lost_segs != lost_segs)
> > +               ereport(WARNING,
> > +                       (errmsg ("some replication slots have lost required WAL segments"),
> > +                        errdetail_plural(
> > +                            "The mostly affected slot has lost %ld segment.",
> > +                            "The mostly affected slot has lost %ld segments.",
> > +                            lost_segs, lost_segs)));
> > This can become very noisy with the time, and it would be actually
> > useful to know which replication slot is impacted by that.
> 
> One message per segment doesn't seem so noisy. The reason for
> not showing slot identifiers individually is just to avoid the
> complexity that comes from involving slot details. DBAs will see the
> details in pg_stat_replication.
> 
> Anyway, I did that in the attached patch. ReplicationSlotsEnumerateBehinds
> returns the list of the slot names that are behind the specified
> LSN. With this patch the messages look like the following:
> 
> WARNING:  some replication slots have lost required WAL segments
> DETAIL:  Slot s1 lost 8 segment(s).
> WARNING:  some replication slots have lost required WAL segments
> DETAIL:  Slots s1, s2, s3 lost at most 9 segment(s).
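
(Since the new wal_status column carries the same information per slot, the
affected slots can also be listed directly from the view once this patch is
in, for example:

    SELECT slot_name, restart_lsn, wal_status, remain
      FROM pg_replication_slots
     WHERE wal_status IN ('keeping', 'lost');

so the WARNING mostly serves as a pointer telling DBAs where to look.)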
> 
> > +      slot doesn't have valid restart_lsn, this field
> > Missing a determinant here, and restart_lsn should have a <literal>
> > markup.
> 
> structfield? Reworded as below:
> 
> |  non-negative. If <structfield>restart_lsn</structfield> is NULL, this
> |  field is <literal>unknown</literal>.
> 
> I changed "the slot" to "this slot" in the two added fields
> (wal_status, remain).
> 
> > +    many WAL segments that they fill up the space allotted
> > s/allotted/allocated/.
> 
> Fixed.
> 
> > +      available. The last two states are seen only when
> > +      <xref linkend="guc-max-slot-wal-keep-size"/> is non-negative. If the
> > +      slot doesn't have valid restart_lsn, this field
> > +      is <literal>unknown</literal>.
> > I am a bit confused by this statement.  The last two states are "lost"
> > and "keeping", but shouldn't "keeping" be the state showing up by
> > default as it means that all WAL segments are kept around.
> 
> It's "streaming".  I didn't come up with nice words to
> distinguish the two states. I'm not sure what "keep around" exactly
> means, but "keeping" here means rather "just not removed yet". The
> states could be reworded as follows:
> 
> streaming: kept/keeping/(secure, in the first version)
> keeping  : mortal/about to be removed
> lost/unknown : (lost/unknown)
> 
> Do you have any better wording?
> 
> > +# Advance WAL by ten segments (= 160MB) on master
> > +advance_wal($node_master, 10);
> > +$node_master->safe_psql('postgres', "CHECKPOINT;");
> > This makes the tests very costly, which is something we should avoid as
> > much as possible.  One trick which could be used here, on top of
> > reducing the number of segment switches, is to use initdb
> > --wal-segsize=1.
> 
> That sounds nice. Done. In the new version the number of segments
> is reduced, and a new test item for the initial unknown state is
> added as the first item.
> 
> Please find the attached new version.

Rebased. No conflict found since the last version.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 270aff9b08ced425b4c4e23b53193285eb2359a6 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Thu, 21 Dec 2017 21:20:20 +0900
Subject: [PATCH 1/6] Add WAL relief vent for replication slots

Adds a capability to limit the number of segments kept by replication
slots by a GUC variable.
---
 src/backend/access/transam/xlog.c             | 127 +++++++++++++++++++++-----
 src/backend/replication/slot.c                |  57 ++++++++++++
 src/backend/utils/misc/guc.c                  |  12 +++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/access/xlog.h                     |   1 +
 src/include/replication/slot.h                |   1 +
 6 files changed, 174 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2ab7d804f0..9988ef943c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -100,6 +100,7 @@ int            wal_level = WAL_LEVEL_MINIMAL;
 int            CommitDelay = 0;    /* precommit delay in microseconds */
 int            CommitSiblings = 5; /* # concurrent xacts needed to sleep */
 int            wal_retrieve_retry_interval = 5000;
+int            max_slot_wal_keep_size_mb = -1;
 
 #ifdef WAL_DEBUG
 bool        XLOG_DEBUG = false;
@@ -872,6 +873,7 @@ static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 static void LocalSetXLogInsertAllowed(void);
 static void CreateEndOfRecoveryRecord(void);
 static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
+static XLogSegNo GetOldestKeepSegment(XLogRecPtr currpos, XLogRecPtr minSlotPtr);
 static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
 static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void);
 
@@ -9325,6 +9327,53 @@ CreateRestartPoint(int flags)
     return true;
 }
 
+/*
+ * Returns minimum segment number the next checkpoint must leave considering
+ * wal_keep_segments, replication slots and max_slot_wal_keep_size.
+ *
+ * currLSN is the current insert location
+ * minSlotLSN is the minimum restart_lsn of all active slots
+ */
+static XLogSegNo
+GetOldestKeepSegment(XLogRecPtr currLSN, XLogRecPtr minSlotLSN)
+{
+    uint64        keepSegs = 0;
+    XLogSegNo    currSeg;
+    XLogSegNo    minSlotSeg;
+
+    XLByteToSeg(currLSN, currSeg, wal_segment_size);
+    XLByteToSeg(minSlotLSN, minSlotSeg, wal_segment_size);
+
+    /*
+     * Calculate keep segments by slots first. The second term of the
+     * condition is just a sanity check.
+     */
+    if (minSlotLSN != InvalidXLogRecPtr && minSlotSeg <= currSeg)
+        keepSegs = currSeg - minSlotSeg;
+
+    /* Cap keepSegs by max_slot_wal_keep_size */
+    if (max_slot_wal_keep_size_mb >= 0)
+    {
+        uint64 limitSegs;
+
+        limitSegs = ConvertToXSegs(max_slot_wal_keep_size_mb, wal_segment_size);
+
+        /* Apply max_slot_wal_keep_size to keepSegs */
+        if (limitSegs < keepSegs)
+            keepSegs = limitSegs;
+    }
+
+    /* but, keep at least wal_keep_segments segments if any */
+    if (wal_keep_segments > 0 && keepSegs < wal_keep_segments)
+        keepSegs = wal_keep_segments;
+
+    /* avoid underflow, don't go below 1 */
+    if (currSeg <= keepSegs)
+        return 1;
+
+    return currSeg - keepSegs;
+}
+
 /*
  * Retreat *logSegNo to the last segment that we need to retain because of
  * either wal_keep_segments or replication slots.
@@ -9336,38 +9385,66 @@ CreateRestartPoint(int flags)
 static void
 KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
 {
-    XLogSegNo    segno;
-    XLogRecPtr    keep;
+    XLogRecPtr    slotminptr = InvalidXLogRecPtr;
+    XLogSegNo    minSegNo;
+    XLogSegNo    slotSegNo;
 
-    XLByteToSeg(recptr, segno, wal_segment_size);
-    keep = XLogGetReplicationSlotMinimumLSN();
+    if (max_replication_slots > 0)
+        slotminptr = XLogGetReplicationSlotMinimumLSN();
 
-    /* compute limit for wal_keep_segments first */
-    if (wal_keep_segments > 0)
+    /*
+     * We should keep certain number of WAL segments after this checkpoint.
+     */
+    minSegNo = GetOldestKeepSegment(recptr, slotminptr);
+
+    /*
+     * warn if the checkpoint flushes the segments required by replication
+     * slots.
+     */
+    if (!XLogRecPtrIsInvalid(slotminptr))
     {
-        /* avoid underflow, don't go below 1 */
-        if (segno <= wal_keep_segments)
-            segno = 1;
+        static XLogSegNo prev_lost_segs = 0;    /* avoid duplicate messages */
+
+        XLByteToSeg(slotminptr, slotSegNo, wal_segment_size);
+
+        if (slotSegNo < minSegNo)
+        {
+            XLogSegNo lost_segs = minSegNo - slotSegNo;
+            if (prev_lost_segs != lost_segs)
+            {
+                /* We have lost a new segment; warn about it. */
+                XLogRecPtr minlsn;
+                char *slot_names;
+                int nslots;
+
+                XLogSegNoOffsetToRecPtr(minSegNo, 0, wal_segment_size, minlsn);
+                slot_names =
+                    ReplicationSlotsEnumerateBehinds(minlsn, ", ", &nslots);
+
+                /*
+                 * Some of the affected slots could have just been removed.
+                 * We don't need to show anything here if no affected slot
+                 * remains.
+                 */
+                if (slot_names)
+                {
+                    ereport(WARNING,
+                            (errmsg ("some replication slots have lost required WAL segments"),
+                             errdetail_plural(
+                                 "Slot %s lost %ld segment(s).",
+                                 "Slots %s lost at most %ld segment(s).",
+                                 nslots, slot_names, lost_segs)));
+                }
+            }
+            prev_lost_segs = lost_segs;
+        }
         else
-            segno = segno - wal_keep_segments;
-    }
-
-    /* then check whether slots limit removal further */
-    if (max_replication_slots > 0 && keep != InvalidXLogRecPtr)
-    {
-        XLogSegNo    slotSegNo;
-
-        XLByteToSeg(keep, slotSegNo, wal_segment_size);
-
-        if (slotSegNo <= 0)
-            segno = 1;
-        else if (slotSegNo < segno)
-            segno = slotSegNo;
+            prev_lost_segs = 0;
     }
 
     /* don't delete WAL segments newer than the calculated segment */
-    if (segno < *logSegNo)
-        *logSegNo = segno;
+    if (minSegNo < *logSegNo)
+        *logSegNo = minSegNo;
 }
 
 /*
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 33b23b6b6d..1a705ca0d3 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1064,6 +1064,63 @@ ReplicationSlotReserveWal(void)
     }
 }
 
+/*
+ * Returns the list of replication slots whose restart_lsn is behind the
+ * specified LSN. Returns a palloc'ed character array stuffed with slot names
+ * delimited by the given separator.  Returns NULL if no slot matches.  If
+ * pnslots is given, the number of the returned slots is returned there.
+ */
+char *
+ReplicationSlotsEnumerateBehinds(XLogRecPtr target, char *separator, int *pnslots)
+{
+    static StringInfoData retstr;
+    static bool retstr_initialized = false;
+    bool insert_separator = false;
+    int i;
+    int nslots = 0;
+
+    Assert (separator);
+    if (max_replication_slots <= 0)
+        return NULL;
+
+    if (!retstr_initialized)
+    {
+        initStringInfo(&retstr);
+        retstr_initialized = true;
+    }
+    else
+        resetStringInfo(&retstr);
+
+    /* construct name list */
+    LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+    for (i = 0 ; i < max_replication_slots ; i++)
+    {
+        ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
+
+        if (s->in_use && s->data.restart_lsn < target)
+        {
+            if (insert_separator)
+                appendStringInfoString(&retstr, separator);
+
+            /*
+             * Slot names consist only of lower-case letters; we don't
+             * bother quoting.
+             */
+            appendStringInfoString(&retstr, NameStr(s->data.name));
+            insert_separator = true;
+            nslots++;
+        }
+    }
+    LWLockRelease(ReplicationSlotControlLock);
+
+    /* return the number of slots in the list if requested */
+    if (pnslots)
+        *pnslots = nslots;
+
+    /* return NULL if the result is an empty string */
+    return retstr.data[0] ? retstr.data : NULL;
+}
+
 /*
  * Flush all replication slots to disk.
  *
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 98d75be292..789fabca66 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2616,6 +2616,18 @@ static struct config_int ConfigureNamesInt[] =
         NULL, NULL, NULL
     },
 
+    {
+        {"max_slot_wal_keep_size", PGC_SIGHUP, REPLICATION_SENDING,
+            gettext_noop("Sets the maximum size of extra WALs kept by replication slots."),
+         NULL,
+         GUC_UNIT_MB
+        },
+        &max_slot_wal_keep_size_mb,
+        -1, -1,
+        MAX_KILOBYTES, /* XXX: This is in megabytes, like max/min_wal_size */
+        NULL, NULL, NULL
+    },
+
     {
         {"wal_sender_timeout", PGC_USERSET, REPLICATION_SENDING,
             gettext_noop("Sets the maximum time to wait for WAL replication."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index a21865a77f..3768e8d08a 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -278,6 +278,7 @@
 #max_wal_senders = 10        # max number of walsender processes
                 # (change requires restart)
 #wal_keep_segments = 0        # in logfile segments; 0 disables
+#max_slot_wal_keep_size = -1    # in megabytes; -1 disables
 #wal_sender_timeout = 60s    # in milliseconds; 0 disables
 
 #max_replication_slots = 10    # max number of replication slots
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index f90a6a9139..b2eb30b779 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -108,6 +108,7 @@ extern int    wal_segment_size;
 extern int    min_wal_size_mb;
 extern int    max_wal_size_mb;
 extern int    wal_keep_segments;
+extern int    max_slot_wal_keep_size_mb;
 extern int    XLOGbuffers;
 extern int    XLogArchiveTimeout;
 extern int    wal_retrieve_retry_interval;
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8f1d66bae..9c3635dc0e 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -199,6 +199,7 @@ extern void ReplicationSlotsComputeRequiredLSN(void);
 extern XLogRecPtr ReplicationSlotsComputeLogicalRestartLSN(void);
 extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
 extern void ReplicationSlotsDropDBSlots(Oid dboid);
+extern char *ReplicationSlotsEnumerateBehinds(XLogRecPtr target, char *separator, int *pnslots);
 
 extern void StartupReplicationSlots(void);
 extern void CheckPointReplicationSlots(void);
-- 
2.16.3

From 4be4ce4c671499c373ac5f8318f432db182eb8f4 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Thu, 21 Dec 2017 21:23:25 +0900
Subject: [PATCH 2/6] Add monitoring aid for max_slot_wal_keep_size.

Adds two columns, "wal_status" and "remain", to pg_replication_slots.
With max_slot_wal_keep_size set, long-disconnected slots may lose sync.
The two columns show whether the slot is still usable or is about to
lose reserved WAL segments, and the remaining bytes of WAL that can be
written before the slot loses required WAL records.
---
 contrib/test_decoding/expected/ddl.out |   4 +-
 contrib/test_decoding/sql/ddl.sql      |   2 +
 src/backend/access/transam/xlog.c      | 150 +++++++++++++++++++++++++++++++--
 src/backend/catalog/system_views.sql   |   4 +-
 src/backend/replication/slotfuncs.c    |  16 +++-
 src/include/access/xlog.h              |   1 +
 src/include/catalog/pg_proc.dat        |   6 +-
 src/test/regress/expected/rules.out    |   6 +-
 8 files changed, 172 insertions(+), 17 deletions(-)

diff --git a/contrib/test_decoding/expected/ddl.out b/contrib/test_decoding/expected/ddl.out
index b7c76469fc..c5f52d6ee8 100644
--- a/contrib/test_decoding/expected/ddl.out
+++ b/contrib/test_decoding/expected/ddl.out
@@ -705,8 +705,8 @@ SELECT pg_drop_replication_slot('regression_slot');
 (1 row)
 
 /* check that the slot is gone */
+\x
 SELECT * FROM pg_replication_slots;
- slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 
------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
 (0 rows)
 
+\x
diff --git a/contrib/test_decoding/sql/ddl.sql b/contrib/test_decoding/sql/ddl.sql
index c4b10a4cf9..5040d5e85e 100644
--- a/contrib/test_decoding/sql/ddl.sql
+++ b/contrib/test_decoding/sql/ddl.sql
@@ -374,4 +374,6 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc
 SELECT pg_drop_replication_slot('regression_slot');
 
 /* check that the slot is gone */
+\x
 SELECT * FROM pg_replication_slots;
+\x
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9988ef943c..aaafa6b74f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -873,7 +873,9 @@ static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 static void LocalSetXLogInsertAllowed(void);
 static void CreateEndOfRecoveryRecord(void);
 static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
-static XLogSegNo GetOldestKeepSegment(XLogRecPtr currpos, XLogRecPtr minSlotPtr);
+static XLogSegNo GetOldestXLogFileSegNo(void);
+static XLogSegNo GetOldestKeepSegment(XLogRecPtr currpos, XLogRecPtr minSlotPtr,
+                       XLogRecPtr targetLSN, uint64 *restBytes);
 static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
 static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void);
 
@@ -6660,6 +6662,12 @@ StartupXLOG(void)
      */
     StartupReplicationOrigin();
 
+    /*
+     * Initialize lastRemovedSegNo by looking at the pg_wal directory. The
+     * minimum segment number is 1, so no wrap-around can happen.
+     */
+    XLogCtl->lastRemovedSegNo = GetOldestXLogFileSegNo() - 1;
+
     /*
      * Initialize unlogged LSN. On a clean shutdown, it's restored from the
      * control file. On recovery, all unlogged relations are blown away, so
@@ -9327,19 +9335,115 @@ CreateRestartPoint(int flags)
     return true;
 }
 
+
+/*
+ * Finds the segment number of the oldest file in XLOG directory.
+ *
+ * This function is intended to be used for initialization of
+ * XLogCtl->lastRemovedSegNo.
+ */
+static XLogSegNo
+GetOldestXLogFileSegNo(void)
+{
+    DIR        *xldir;
+    struct dirent *xlde;
+    XLogSegNo segno = 0;
+
+    xldir = AllocateDir(XLOGDIR);
+    while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
+    {
+        TimeLineID tli;
+        XLogSegNo fsegno;
+
+        /* Ignore files that are not XLOG segments */
+        if (!IsXLogFileName(xlde->d_name) &&
+            !IsPartialXLogFileName(xlde->d_name))
+            continue;
+
+        XLogFromFileName(xlde->d_name, &tli, &fsegno, wal_segment_size);
+
+        /*
+         * Get the minimum segment number ignoring the timeline ID.  Since
+         * RemoveOldXlogFiles works ignoring the timeline ID, this function
+         * works the same way.
+         */
+        if (segno == 0 || fsegno < segno)
+            segno = fsegno;
+    }
+
+    FreeDir(xldir);
+
+    return segno;
+}
+
+/*
+ * Returns the availability status of the record at the given targetLSN.
+ *
+ * Returns one of three values:
+ * "streaming" when the WAL record at targetLSN is available.
+ * "keeping" when it is still available but about to be removed by the next
+ * checkpoint.
+ * "lost" when the WAL record at targetLSN has already been removed.
+ *
+ * If restBytes is not NULL, sets the remaining LSN bytes to advance until the
+ * segment that contains targetLSN will be removed.
+ */
+char *
+GetLsnAvailability(XLogRecPtr targetLSN, uint64 *restBytes)
+{
+    XLogRecPtr currpos;
+    XLogRecPtr slotPtr;
+    XLogSegNo targetSeg;
+    XLogSegNo tailSeg;
+    XLogSegNo oldestSeg;
+
+    Assert(!XLogRecPtrIsInvalid(targetLSN));
+    Assert(restBytes);
+
+    currpos = GetXLogWriteRecPtr();
+
+    SpinLockAcquire(&XLogCtl->info_lck);
+    oldestSeg = XLogCtl->lastRemovedSegNo;
+    SpinLockRelease(&XLogCtl->info_lck);
+
+    /* oldest segment is just after the last removed segment */
+    oldestSeg++;
+
+    XLByteToSeg(targetLSN, targetSeg, wal_segment_size);
+
+    slotPtr = XLogGetReplicationSlotMinimumLSN();
+    tailSeg = GetOldestKeepSegment(currpos, slotPtr, targetLSN, restBytes);
+
+    /* targetSeg is being reserved by slots */
+    if (tailSeg <= targetSeg)
+        return "streaming";
+
+    /* targetSeg is not reserved but still available */
+    if (oldestSeg <= targetSeg)
+        return "keeping";
+
+    /* targetSeg has gone */
+    return    "lost";
+}
+
 /*
  * Returns minimum segment number the next checkpoint must leave considering
  * wal_keep_segments, replication slots and max_slot_wal_keep_size.
  *
  * currLSN is the current insert location
  * minSlotLSN is the minimum restart_lsn of all active slots
+ * targetLSN is used when restBytes is not NULL.
+ *
+ * If restBytes is not NULL, sets the remaining LSN bytes to advance until the
+ * segment that contains targetLSN will be removed.
  */
 static XLogSegNo
-GetOldestKeepSegment(XLogRecPtr currLSN, XLogRecPtr minSlotLSN)
+GetOldestKeepSegment(XLogRecPtr currLSN, XLogRecPtr minSlotLSN,
+                     XLogRecPtr targetLSN, uint64 *restBytes)
 {
     uint64        keepSegs = 0;
     XLogSegNo    currSeg;
     XLogSegNo    minSlotSeg;
+    uint64        limitSegs = 0;
 
     XLByteToSeg(currLSN, currSeg, wal_segment_size);
     XLByteToSeg(minSlotLSN, minSlotSeg, wal_segment_size);
@@ -9354,8 +9458,6 @@ GetOldestKeepSegment(XLogRecPtr currLSN, XLogRecPtr minSlotLSN)
     /* Cap keepSegs by max_slot_wal_keep_size */
     if (max_slot_wal_keep_size_mb >= 0)
     {
-        uint64 limitSegs;
-
         limitSegs = ConvertToXSegs(max_slot_wal_keep_size_mb, wal_segment_size);
 
         /* Apply max_slot_wal_keep_size to keepSegs */
@@ -9363,9 +9465,40 @@ GetOldestKeepSegment(XLogRecPtr currLSN, XLogRecPtr minSlotLSN)
             keepSegs = limitSegs;
     }
 
-    /* but, keep at least wal_keep_segments segments if any */
-    if (wal_keep_segments > 0 && keepSegs < wal_keep_segments)
-        keepSegs = wal_keep_segments;
+    if (wal_keep_segments > 0)
+    {
+        /* but, keep at least wal_keep_segments segments if any */
+        if (keepSegs < wal_keep_segments)
+            keepSegs = wal_keep_segments;
+
+        /* also, limitSegs should be raised if wal_keep_segments is larger */
+        if (limitSegs < wal_keep_segments)
+            limitSegs = wal_keep_segments;
+    }
+
+    /*
+     * If requested, return remaining LSN bytes to advance until the slot
+     * gives up reserving WAL records.
+     */
+    if (restBytes)
+    {
+        uint64 fragbytes;
+        XLogSegNo targetSeg;
+
+        *restBytes = 0;
+
+        XLByteToSeg(targetLSN, targetSeg, wal_segment_size);
+        if (max_slot_wal_keep_size_mb >= 0 && currSeg <= targetSeg + limitSegs)
+        {
+            /*
+             * This slot still has all required segments. Calculate how many
+             * LSN bytes the slot has until it loses targetLSN.
+             */
+            fragbytes = wal_segment_size - (currLSN % wal_segment_size);
+            XLogSegNoOffsetToRecPtr(targetSeg + limitSegs - currSeg, fragbytes,
+                                    wal_segment_size, *restBytes);
+        }
+    }
 
     /* avoid underflow, don't go below 1 */
     if (currSeg <= keepSegs)
@@ -9395,7 +9528,8 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
     /*
      * We should keep certain number of WAL segments after this checkpoint.
      */
-    minSegNo = GetOldestKeepSegment(recptr, slotminptr);
+    minSegNo =
+        GetOldestKeepSegment(recptr, slotminptr, InvalidXLogRecPtr, NULL);
 
     /*
      * warn if the checkpoint flushes the segments required by replication
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f4d9e9daf7..358df2c183 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -798,7 +798,9 @@ CREATE VIEW pg_replication_slots AS
             L.xmin,
             L.catalog_xmin,
             L.restart_lsn,
-            L.confirmed_flush_lsn
+            L.confirmed_flush_lsn,
+            L.wal_status,
+            L.remain
     FROM pg_get_replication_slots() AS L
             LEFT JOIN pg_database D ON (L.datoid = D.oid);
 
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 224dd920c8..cac66978ed 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -185,7 +185,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
 Datum
 pg_get_replication_slots(PG_FUNCTION_ARGS)
 {
-#define PG_GET_REPLICATION_SLOTS_COLS 11
+#define PG_GET_REPLICATION_SLOTS_COLS 13
     ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
     TupleDesc    tupdesc;
     Tuplestorestate *tupstore;
@@ -307,6 +307,20 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
         else
             nulls[i++] = true;
 
+        if (restart_lsn == InvalidXLogRecPtr)
+        {
+            values[i++] = CStringGetTextDatum("unknown");
+            values[i++] = LSNGetDatum(InvalidXLogRecPtr);
+        }
+        else
+        {
+            uint64    remaining_bytes;
+
+            values[i++] = CStringGetTextDatum(
+                GetLsnAvailability(restart_lsn, &remaining_bytes));
+            values[i++] = Int64GetDatum(remaining_bytes);
+        }
+
         tuplestore_putvalues(tupstore, tupdesc, values, nulls);
     }
     LWLockRelease(ReplicationSlotControlLock);
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index b2eb30b779..b0cdba6d7a 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -302,6 +302,7 @@ extern void ShutdownXLOG(int code, Datum arg);
 extern void InitXLOGAccess(void);
 extern void CreateCheckPoint(int flags);
 extern bool CreateRestartPoint(int flags);
+extern char *GetLsnAvailability(XLogRecPtr targetLSN, uint64 *restBytes);
 extern void XLogPutNextOid(Oid nextOid);
 extern XLogRecPtr XLogRestorePoint(const char *rpName);
 extern void UpdateFullPageWrites(void);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 3ecc2e12c3..f7e6d18e35 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9665,9 +9665,9 @@
   proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
   proretset => 't', provolatile => 's', prorettype => 'record',
   proargtypes => '',
-  proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn}',
+  proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,remain}',
   prosrc => 'pg_get_replication_slots' },
 { oid => '3786', descr => 'set up a logical replication slot',
   proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index e384cd2279..956c3c9525 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1451,8 +1451,10 @@ pg_replication_slots| SELECT l.slot_name,
     l.xmin,
     l.catalog_xmin,
     l.restart_lsn,
-    l.confirmed_flush_lsn
-   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn)
+    l.confirmed_flush_lsn,
+    l.wal_status,
+    l.remain
+   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, remain)
      LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
 pg_roles| SELECT pg_authid.rolname,
     pg_authid.rolsuper,
-- 
2.16.3

From 62b60031f7aed53bd176d2296a1a6d36bf7017c9 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Wed, 19 Dec 2018 12:43:57 +0900
Subject: [PATCH 3/6] Add primary_slot_name to init_from_backup in TAP test.

It is convenient if primary_slot_name can be specified when taking a
base backup. This adds a new parameter of that name to the Perl
function.
---
 src/test/perl/PostgresNode.pm | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 8a2c6fc122..daca2e0085 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -672,11 +672,11 @@ sub init_from_backup
     chmod(0700, $data_path);
 
     # Base configuration for this node
-    $self->append_conf(
-        'postgresql.conf',
-        qq(
-port = $port
-));
+    $self->append_conf('postgresql.conf', qq(port = $port));
+    $self->append_conf('postgresql.conf',
+                       qq(primary_slot_name = $params{primary_slot_name}))
+      if (defined $params{primary_slot_name});
+
     $self->enable_streaming($root_node) if $params{has_streaming};
     $self->enable_restoring($root_node) if $params{has_restoring};
     return;
-- 
2.16.3

From 254ae4a490f951b235550d7f9c0555cbbbee108e Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Thu, 21 Dec 2017 17:33:53 +0900
Subject: [PATCH 4/6] TAP test for the slot limit feature

---
 src/test/recovery/t/016_replslot_limit.pl | 185 ++++++++++++++++++++++++++++++
 1 file changed, 185 insertions(+)
 create mode 100644 src/test/recovery/t/016_replslot_limit.pl

diff --git a/src/test/recovery/t/016_replslot_limit.pl b/src/test/recovery/t/016_replslot_limit.pl
new file mode 100644
index 0000000000..8d1d3a2275
--- /dev/null
+++ b/src/test/recovery/t/016_replslot_limit.pl
@@ -0,0 +1,185 @@
+# Test for replication slot limit
+# Ensure that max_slot_wal_keep_size limits the number of WAL files to
+# be kept by replication slot.
+
+use strict;
+use warnings;
+use File::Path qw(rmtree);
+use PostgresNode;
+use TestLib;
+use Test::More tests => 11;
+use Time::HiRes qw(usleep);
+
+$ENV{PGDATABASE} = 'postgres';
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1, extra => ['--wal-segsize=1']);
+$node_master->append_conf('postgresql.conf', qq(
+min_wal_size = 2MB
+max_wal_size = 3MB
+));
+$node_master->start;
+$node_master->safe_psql('postgres', "SELECT pg_create_physical_replication_slot('rep1')");
+
+# The slot state should be known before the first connection
+my $result = $node_master->safe_psql('postgres', "SELECT restart_lsn, wal_status, remain FROM pg_replication_slots WHERE slot_name = 'rep1'");
+is($result, "|unknown|0", 'non-reserved slot shows unknown');
+
+
+# Take backup
+my $backup_name = 'my_backup';
+$node_master->backup($backup_name);
+
+# Create a standby linking to it using a replication slot
+my $node_standby = get_new_node('standby_1');
+$node_standby->init_from_backup($node_master, $backup_name, has_streaming => 1, primary_slot_name => 'rep1');
+
+$node_standby->start;
+
+# Wait until standby has replayed enough data on the standby
+my $start_lsn = $node_master->lsn('write');
+$node_master->wait_for_catchup($node_standby, 'replay', $start_lsn);
+
+# Stop standby
+$node_standby->stop;
+
+
+# Preparation done; the slot should now be in the "streaming" state.
+$result = $node_master->safe_psql('postgres', "SELECT restart_lsn, wal_status, remain FROM pg_replication_slots WHERE slot_name = 'rep1'");
+is($result, "$start_lsn|streaming|0", 'check initial state of standby');
+
+# Advance WAL by five segments (= 5MB) on master
+advance_wal($node_master, 5);
+$node_master->safe_psql('postgres', "CHECKPOINT;");
+
+# The slot is unconditionally "safe" with the default setting.
+$result = $node_master->safe_psql('postgres', "SELECT restart_lsn, wal_status, remain FROM pg_replication_slots WHERE slot_name = 'rep1'");
+is($result, "$start_lsn|streaming|0", 'check that slot is keeping all segments');
+
+# The standby can connect to the master
+$node_standby->start;
+
+$start_lsn = $node_master->lsn('write');
+$node_master->wait_for_catchup($node_standby, 'replay', $start_lsn);
+
+$node_standby->stop;
+
+# Set max_slot_wal_keep_size on master
+my $max_slot_wal_keep_size_mb = 3;
+$node_master->append_conf('postgresql.conf', qq(
+max_slot_wal_keep_size = ${max_slot_wal_keep_size_mb}MB
+));
+$node_master->reload;
+
+# The slot is in the safe state. The remaining bytes should be almost just
+# (max_slot_wal_keep_size + 1) times as large as the segment size.
+$result = $node_master->safe_psql('postgres', "SELECT restart_lsn, wal_status, pg_size_pretty(remain) as remain FROM pg_replication_slots WHERE slot_name = 'rep1'");
+is($result, "$start_lsn|streaming|4096 kB", 'check that remaining bytes is calculated');
+
+# Advance WAL again then checkpoint
+advance_wal($node_master, 2);
+$node_master->safe_psql('postgres', "CHECKPOINT;");
+
+# The slot is still working.
+$result = $node_master->safe_psql('postgres', "SELECT restart_lsn, wal_status, pg_size_pretty(remain) as remain FROM pg_replication_slots WHERE slot_name = 'rep1'");
+is($result, "$start_lsn|streaming|2048 kB", 'remaining bytes should be reduced by 2MB');
+
+
+# wal_keep_segments can override
+$result = $node_master->safe_psql('postgres', "ALTER SYSTEM SET wal_keep_segments to 6; SELECT pg_reload_conf();");
+$result = $node_master->safe_psql('postgres', "SELECT restart_lsn, wal_status, pg_size_pretty(remain) as remain FROM pg_replication_slots WHERE slot_name = 'rep1'");
+is($result, "$start_lsn|streaming|5120 kB", 'check that wal_keep_segments overrides');
+
+# restore wal_keep_segments (no test)
+$result = $node_master->safe_psql('postgres', "ALTER SYSTEM SET wal_keep_segments to 0; SELECT pg_reload_conf();");
+
+# Advance WAL again without checkpoint
+advance_wal($node_master, 2);
+
+# Slot gets to 'keeping' state
+$result = $node_master->safe_psql('postgres', "SELECT restart_lsn, wal_status, pg_size_pretty(remain) as remain FROM pg_replication_slots WHERE slot_name = 'rep1'");
+is($result, "$start_lsn|keeping|0 bytes", 'check that some segments are about to be removed');
+
+# The standby still can connect to the master
+$node_standby->start;
+
+$start_lsn = $node_master->lsn('write');
+$node_master->wait_for_catchup($node_standby, 'replay', $start_lsn);
+
+$node_standby->stop;
+
+ok(!find_in_log($node_standby,
+                "requested WAL segment [0-9A-F]+ has already been removed"),
+   'check that no replication failure is caused by insecure state');
+
+# Advance WAL again, the slot loses some segments.
+my $logstart = get_log_size($node_master);
+advance_wal($node_master, 5);
+$node_master->safe_psql('postgres', "CHECKPOINT;");
+
+# WARNING should be issued
+ok(find_in_log($node_master,
+               "some replication slots have lost required WAL segments\n".
+               ".*Slot rep1 lost 2 segment\\(s\\)\\.",
+               $logstart),
+   'check that warning is correctly logged');
+
+# This slot should be broken
+$result = $node_master->safe_psql('postgres', "SELECT restart_lsn, wal_status, pg_size_pretty(remain) as remain FROM pg_replication_slots WHERE slot_name = 'rep1'");
+is($result, "$start_lsn|lost|0 bytes", 'check that overflown segments have been removed');
+
+# The standby can no longer connect to the master
+$logstart = get_log_size($node_standby);
+$node_standby->start;
+
+my $failed = 0;
+for (my $i = 0 ; $i < 10000 ; $i++)
+{
+    if (find_in_log($node_standby,
+                    "requested WAL segment [0-9A-F]+ has already been removed",
+                    $logstart))
+    {
+        $failed = 1;
+        last;
+    }
+    usleep(100_000);
+}
+ok($failed, 'check that replication has been broken');
+
+$node_standby->stop;
+
+#####################################
+# Advance WAL of $node by $n segments
+sub advance_wal
+{
+    my ($node, $n) = @_;
+
+    # Advance by $n segments on the master
+    for (my $i = 0 ; $i < $n ; $i++)
+    {
+        $node->safe_psql('postgres', "CREATE TABLE t (); DROP TABLE t; SELECT pg_switch_wal();");
+    }
+}
+
+# return the size of logfile of $node in bytes
+sub get_log_size
+{
+    my ($node) = @_;
+
+    return (stat $node->logfile)[7];
+}
+
+# find $pat in logfile of $node after $off-th byte
+sub find_in_log
+{
+    my ($node, $pat, $off) = @_;
+
+    $off = 0 unless defined $off;
+    my $log = TestLib::slurp_file($node->logfile);
+    return 0 if (length($log) <= $off);
+
+    $log = substr($log, $off);
+
+    return $log =~ m/$pat/;
+}
-- 
2.16.3

From ad8ae6bb324f3a7c6ac380b0976b66fda08154f5 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Thu, 11 Jan 2018 15:00:32 +0900
Subject: [PATCH 5/6] Documentation for slot-limit feature

---
 doc/src/sgml/catalogs.sgml          | 28 ++++++++++++++++++++++++++++
 doc/src/sgml/config.sgml            | 23 +++++++++++++++++++++++
 doc/src/sgml/high-availability.sgml |  8 +++++---
 3 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index af4d0625ea..7ec8764ce5 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9825,6 +9825,34 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
       </entry>
      </row>
 
+     <row>
+      <entry><structfield>wal_status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+
+      <entry>Availability of WAL records claimed by this
+      slot. <literal>streaming</literal>, <literal>keeping</literal>,
+      <literal>lost</literal>
+      or <literal>unknown</literal>. <literal>streaming</literal> means that
+      the claimed records are available. <literal>keeping</literal> means that
+      some of them are to be removed by the next checkpoint.
+      <literal>lost</literal> means that some of them are no longer
+      available. The last two states are seen only when
+      <xref linkend="guc-max-slot-wal-keep-size"/> is
+      non-negative. If <structfield>restart_lsn</structfield> is NULL, this
+      field is <literal>unknown</literal>.
+      </entry>
+     </row>
+
+     <row>
+      <entry><structfield>remain</structfield></entry>
+      <entry><type>bigint</type></entry>
+      <entry></entry>
+      <entry>The amount in bytes that the WAL location (LSN) can advance before
+        this slot may lose required WAL records.
+      </entry>
+     </row>
+
     </tbody>
    </tgroup>
   </table>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b6f5822b84..7177c6122a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3537,6 +3537,29 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
        </listitem>
       </varlistentry>
 
+      <varlistentry id="guc-max-slot-wal-keep-size" xreflabel="max_slot_wal_keep_size">
+       <term><varname>max_slot_wal_keep_size</varname> (<type>integer</type>)
+       <indexterm>
+        <primary><varname>max_slot_wal_keep_size</varname> configuration parameter</primary>
+       </indexterm>
+       </term>
+       <listitem>
+       <para>
+        Specify the maximum size of WAL files
+        that <link linkend="streaming-replication-slots">replication
+        slots</link> are allowed to retain in the <filename>pg_wal</filename>
+        directory at checkpoint time.
+        If <varname>max_slot_wal_keep_size</varname> is -1 (the default),
+        replication slots retain an unlimited amount of WAL files.  If the
+        restart_lsn of a replication slot falls behind the current LSN by
+        more than this amount, the standby using the slot may no longer be
+        able to reconnect due to removal of required WAL records. You can see
+        the WAL availability of replication slots
+        in <link linkend="view-pg-replication-slots">pg_replication_slots</link>.
+       </para>
+       </listitem>
+      </varlistentry>
+
      <varlistentry id="guc-wal-sender-timeout" xreflabel="wal_sender_timeout">
       <term><varname>wal_sender_timeout</varname> (<type>integer</type>)
       <indexterm>
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index bbab7395a2..79901c5f06 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -925,9 +925,11 @@ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
     <xref linkend="guc-archive-command"/>.
     However, these methods often result in retaining more WAL segments than
     required, whereas replication slots retain only the number of segments
-    known to be needed.  An advantage of these methods is that they bound
-    the space requirement for <literal>pg_wal</literal>; there is currently no way
-    to do this using replication slots.
+    known to be needed.  On the other hand, replication slots can retain so
+    many WAL segments that they fill up the space allocated
+    for <literal>pg_wal</literal>;
+    <xref linkend="guc-max-slot-wal-keep-size"/> limits the size of WAL files
+    retained by replication slots.
    </para>
    <para>
     Similarly, <xref linkend="guc-hot-standby-feedback"/>
-- 
2.16.3

From 96991c3d5fb4d5dc2df6e98fd725eb85189e739e Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Fri, 26 Oct 2018 10:07:05 +0900
Subject: [PATCH 6/6] Check removal of in-reading segment file.

A checkpoint can remove or recycle a segment file while it is being read
by ReadRecord during logical decoding. This patch checks for that case
and errors out immediately.  Reading a recycled file is basically safe,
and any inconsistency caused by overwrites as a new segment will be
caught by page/record validation, so this is only for keeping consistency
with the wal_status shown in pg_replication_slots.
---
 src/backend/access/transam/xlogreader.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 69b6226f8f..cc24afd30d 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -26,6 +26,7 @@
 #include "replication/origin.h"
 
 #ifndef FRONTEND
+#include "access/xlog.h"
 #include "utils/memutils.h"
 #endif
 
@@ -224,7 +225,9 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
     uint32        pageHeaderSize;
     bool        gotheader;
     int            readOff;
-
+#ifndef FRONTEND
+    XLogSegNo    targetSegNo;
+#endif
     /*
      * randAccess indicates whether to verify the previous-record pointer of
      * the record we're reading.  We only do this if we're reading
@@ -270,6 +273,21 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
     targetPagePtr = RecPtr - (RecPtr % XLOG_BLCKSZ);
     targetRecOff = RecPtr % XLOG_BLCKSZ;
 
+#ifndef FRONTEND
+    /*
+     * A checkpoint can remove the segment we are currently looking for.  Make
+     * sure the current segment still exists; we check this once per page.
+     * This cannot happen in the frontend.
+     */
+    XLByteToSeg(targetPagePtr, targetSegNo, state->wal_segment_size);
+    if (targetSegNo <= XLogGetLastRemovedSegno())
+    {
+        report_invalid_record(state,
+                              "WAL segment for LSN %X/%X has been removed",
+                              (uint32)(RecPtr >> 32), (uint32) RecPtr);
+        goto err;
+    }
+#endif
     /*
      * Read the page containing the record into state->readBuf. Request enough
      * byte to cover the whole record header, or at least the part of it that
-- 
2.16.3

