Thread: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference: 17103
Logged by: Marcin Krupowicz
Email address: mk@071.ovh
PostgreSQL version: 13.3
Operating system: CentOS 7.6
Description:

Hi,

After the replication fell behind and the lag exceeded max_slot_wal_keep_size, WAL segments on the master were not removed. It seems that Postgres tried to maintain max_slot_wal_keep_size worth of segments. Please find the details below (a slightly redacted version of what I wrote here: https://stackoverflow.com/questions/68314222/replication-lag-exceeding-max-slot-wal-keep-size-wal-segments-not-removed).

-- Summary --
We are using max_slot_wal_keep_size from PostgreSQL 13 to prevent the master from being killed by a lagging replica. It seems that, in our case, WAL storage wasn't freed up after exceeding this parameter, which resulted in a replication failure. The WAL which, I believe, should have been freed up did not seem to be needed by any other transaction at the time.

-- Configuration --
master & one replica - streaming replication using a slot
~700GB available for pg_wal
max_slot_wal_keep_size = 600GB
min_wal_size = 20GB
max_wal_size = 40GB
default checkpoint_timeout = 5 minutes (no problem with checkpoints)
archiving is on and is catching up well

-- What happened --
Under heavy load (large COPY/INSERT transactions, loading hundreds of GB of data), the replication started falling behind. Available space on pg_wal was being reduced at the same rate as pg_replication_slots.safe_wal_size - as expected. At some point safe_wal_size went negative and streaming stopped working. That wasn't a problem, because the replica started recovering from the WAL archive. I expected that once the slot was lost, WAL would be removed down to max_wal_size. This did not happen, though. It seems that Postgres tried to maintain something close to max_slot_wal_keep_size (600GB) available, in case the replica started catching up again. Over that time there was no single transaction which would require this much WAL to be kept, and archiving wasn't behind either.

Free space on pg_wal was more or less 70GB for most of the time; however, at some point, during heavy autovacuuming, it dipped to 0 :( This is when PG crashed (and auto-recovered soon after). After getting back up, there was 11GB left on pg_wal, with no transaction running and no loading. This lasted for hours. During this time the replica finally caught up from the archive and restored the replication with no delay. None of the WAL files were removed. I manually ran a checkpoint, but it did not clear any WAL. I finally restarted PostgreSQL, and only during the restart was pg_wal finally cleared.

Again - why did PG not clear WAL? The WAL files, even more clearly, were not needed by any process.

Many thanks,
-- Marcin
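For anyone watching the same situation on their own primary, the per-slot retention state described above can be queried directly. The following is only an illustrative sketch, assuming PostgreSQL 13 or later where pg_replication_slots exposes wal_status and safe_wal_size; it is not output from the reporter's system.

-- Per-slot retention overview (PostgreSQL 13+).
-- "retained_wal" is how much WAL the slot is currently holding back;
-- "safe_wal_size" is the remaining headroom before the slot is invalidated.
SELECT slot_name,
       active,
       wal_status,
       pg_size_pretty(safe_wal_size) AS safe_wal_size,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
  FROM pg_replication_slots;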
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Kyotaro Horiguchi
Date:
At Tue, 13 Jul 2021 09:15:17 +0000, PG Bug reporting form <noreply@postgresql.org> wrote in
> We are using max_slot_wal_keep_size from Postgresql 13 to prevent master
> from being killed by a lagging replication. It seems, that in our case, WAL
> storage wasn't freed up after exceeding this parameter which resulted in a
> replication failure. WAL which, as I believe, should have been freed up did
> not seem to be needed by any other transaction at a time.

Yeah, max_slot_wal_keep_size is the maximum WAL size that replication slots are guaranteed to be able to keep; it is not a size beyond which slots are guaranteed not to keep WAL files. In addition to that, WAL removal happens only at the end of a checkpoint, so WAL files can grow up to max_slot_wal_keep_size plus the checkpoint distance, assuming an even load.

> -- Configuration --
> master & one replica - streaming replication using a slot
> ~700GB available for pg_wal
> max_slot_wal_keep_size = 600GB
> min_wal_size = 20GB
> max_wal_size = 40GB
> default checkpoint_timeout = 5 minutes (no problem with checkpoints)
> archiving is on and is catching up well

Assuming an even load (or WAL speed) and 0.5 for checkpoint_completion_target, 40GB of max_wal_size causes checkpoints every 27GB (1706 segments) (*1) at the longest (in the case where the xlog-based checkpoint fires before the timeout-based checkpoint).

Thus, with 600GB of max_slot_wal_keep_size, the total size of WAL files can reach 627GB, and even that can be exceeded under a sudden load spike.

(*1) checkpoint distance = max_wal_size / (1.0 + checkpoint_completion_target)

> -- What happened --
> Under heavy load (large COPY/INSERT transactions, loading hundreds of GB of
> data), the replication started falling behind. Available space on pg_wal was
> being reduced in the same rate as safe_slot
> pg_replication_slot.safe_wal_size - as expected. At some point safe_wal_size
> went negative and streaming stopped working. It wasn't a problem, because
> replica started recovery from WAL archive. I expected that once the slot is
> lost, WALs will be removed up to max_wal_size. This did not happen though.
> It seems that Postgres tried to maintain something close to
> max_slot_wal_keep_size (600GB) available, in case replica starts catching up
> again. Over the time, there was no single transaction which would require
> this much WAL to be kept. archiving wasn't behind either.

Useless WAL files will be removed after a checkpoint runs.

> Amount of free space on pg_wal was more or less 70GB for most of the time,
> however at some point, during heavy autovacuuming, it dipped to 0 :( This is
> when PG crashed and (auto-recovered soon after). After getting back up,
> there was 11GB left on pg_wal and no transaction running, no loading. This
> lasted for hours. During this time replica finally caught up from the
> archive and restored the replication with no delay. None of the WALs were
> removed. I manually run checkpoint but it did not clear any WALs. I finally
> restarted Postgresql and during the restarting pg_wal were finally
> cleared.
>
> Again - why PG did not clear WAL? WALs, even more clearly, were not needed
> by any process.

Maybe a manual CHECKPOINT works for you; however, you should reconsider your settings with the above behavior in mind, to prevent a crash due to WAL storage exhaustion.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
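As a rough cross-check of the arithmetic above, the worst-case steady-state pg_wal footprint (max_slot_wal_keep_size plus the checkpoint distance) can be estimated from a running server's settings with a query like the one below. This is only a sketch: it assumes max_slot_wal_keep_size is set to a positive value, and that pg_settings reports both size GUCs in megabytes. With the reporter's settings it comes out to roughly 627GB, matching the figure above.

-- Sketch: estimate the worst-case pg_wal size discussed above, i.e.
--   max_slot_wal_keep_size + max_wal_size / (1.0 + checkpoint_completion_target)
-- Assumes max_slot_wal_keep_size > 0; pg_settings reports both size GUCs in MB.
SELECT pg_size_pretty(
         ( (SELECT setting::numeric FROM pg_settings WHERE name = 'max_slot_wal_keep_size')
         + (SELECT setting::numeric FROM pg_settings WHERE name = 'max_wal_size')
           / (1.0 + (SELECT setting::numeric FROM pg_settings WHERE name = 'checkpoint_completion_target'))
         ) * 1024 * 1024
       ) AS worst_case_pg_wal;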
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Marcin Krupowicz
Date:
Hi,

> > We are using max_slot_wal_keep_size from Postgresql 13 to prevent master
> > from being killed by a lagging replication. It seems, that in our case, WAL
> > storage wasn't freed up after exceeding this parameter which resulted in a
> > replication failure. WAL which, as I believe, should have been freed up did
> > not seem to be needed by any other transaction at a time.
>
> Yeah, the max_slot_wal_keep_size is the maximum WAL size that
> replication slots are guaranteed to be able to keep files up to. It
> is not the size that replication slot are guaranteed not to keep WAL
> files beyond it. Addition to that, WAL removal happens only at the
> ending of a checkpoint so WAL files can grow up to
> max_slot_wal_keep_size plus checkpoint distance assuming an even load.

I understand, but the situation lasted for many hours, until my manual restart. checkpoint_timeout is 5 minutes, and there were many checkpoints between the time the slot got lost (exceeding max_slot_wal_keep_size) and my manual restart. During all that time the load was fairly even and the number of WAL segments stored wasn't changing much.

> > -- Configuration --
> > master & one replica - streaming replication using a slot
> > ~700GB available for pg_wal
> > max_slot_wal_keep_size = 600GB
> > min_wal_size = 20GB
> > max_wal_size = 40GB
> > default checkpoint_timeout = 5 minutes (no problem with checkpoints)
> > archiving is on and is catching up well
>
> Assuming an even load (or WAL speed) and 0.5 for
> checkpoint_completion_target, 40GB of max_wal_size causes checkpoints
> every 27GB (1706 segments) (*1) at longest (in the case where xlog
> checkpoint fires before timeout checkpoint).
>
> Thus with 600GB of max_slot_wal_keep_size, the maximum size of WAL
> files can reach 627GB, which size can even be exceeded if a sudden
> high-load is given.
>
> [1] checkpoint distance = max_wal_size / (1.0 + checkpoint_completion_target)

Fair point, I should change my settings slightly - but that's not the issue here.

>
> > -- What happened --
> > Under heavy load (large COPY/INSERT transactions, loading hundreds of GB of
> > data), the replication started falling behind. Available space on pg_wal was
> > being reduced in the same rate as safe_slot
> > pg_replication_slot.safe_wal_size - as expected. At some point safe_wal_size
> > went negative and streaming stopped working. It wasn't a problem, because
> > replica started recovery from WAL archive. I expected that once the slot is
> > lost, WALs will be removed up to max_wal_size. This did not happen though.
> > It seems that Postgres tried to maintain something close to
> > max_slot_wal_keep_size (600GB) available, in case replica starts catching up
> > again. Over the time, there was no single transaction which would require
> > this much WAL to be kept. archiving wasn't behind either.
>
> Useless WAL files will be removd after a checkpoint runs.

That did not happen.

> > Amount of free space on pg_wal was more or less 70GB for most of the time,
> > however at some point, during heavy autovacuuming, it dipped to 0 :( This is
> > when PG crashed and (auto-recovered soon after). After getting back up,
> > there was 11GB left on pg_wal and no transaction running, no loading. This
> > lasted for hours. During this time replica finally caught up from the
> > archive and restored the replication with no delay. None of the WALs were
> > removed. I manually run checkpoint but it did not clear any WALs. I finally
> > restarted Postgresql and during the restarting pg_wal were finally
> > cleared.
> >
> > Again - why PG did not clear WAL? WALs, even more clearly, were not needed
> > by any process.
>
> Maybe manual CHECKPINT work for you , however, you should reconsider
> the setting assuming the above behavior to prevent a crash due to WAL
> storage exhaustion.

Sorry, I'm confused. I did run a manual CHECKPOINT (even though many, many non-manual checkpoints had run before that) and the WAL segments were NOT cleared until I restarted PostgreSQL.

Thanks,
-- Marcin
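A side note for anyone verifying the same symptom: whether checkpoints are in fact completing, and where the latest checkpoint's redo pointer sits, can be confirmed with pg_control_checkpoint(). A minimal sketch, not taken from the report:

-- Confirm that checkpoints are completing and see the latest redo location.
SELECT checkpoint_time, redo_lsn, redo_wal_file
  FROM pg_control_checkpoint();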
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Jeff Janes
Date:
On Tue, Jul 13, 2021 at 10:12 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
At Tue, 13 Jul 2021 09:15:17 +0000, PG Bug reporting form <noreply@postgresql.org> wrote in
> We are using max_slot_wal_keep_size from Postgresql 13 to prevent master
> from being killed by a lagging replication. It seems, that in our case, WAL
> storage wasn't freed up after exceeding this parameter which resulted in a
> replication failure. WAL which, as I believe, should have been freed up did
> not seem to be needed by any other transaction at a time.
Yeah, the max_slot_wal_keep_size is the maximum WAL size that
replication slots are guaranteed to be able to keep files up to. It
is not the size that replication slot are guaranteed not to keep WAL
files beyond it. Addition to that, WAL removal happens only at the
ending of a checkpoint so WAL files can grow up to
max_slot_wal_keep_size plus checkpoint distance assuming an even load.
> -- Configuration --
> master & one replica - streaming replication using a slot
> ~700GB available for pg_wal
> max_slot_wal_keep_size = 600GB
> min_wal_size = 20GB
> max_wal_size = 40GB
> default checkpoint_timeout = 5 minutes (no problem with checkpoints)
> archiving is on and is catching up well
Assuming an even load (or WAL speed) and 0.5 for
checkpoint_completion_target, 40GB of max_wal_size causes checkpoints
every 27GB (1706 segments) (*1) at longest (in the case where xlog
checkpoint fires before timeout checkpoint).
Thus with 600GB of max_slot_wal_keep_size, the maximum size of WAL
files can reach 627GB, which size can even be exceeded if a sudden
high-load is given.
[1] checkpoint distance = max_wal_size / (1.0 + checkpoint_completion_target)
> -- What happened --
> Under heavy load (large COPY/INSERT transactions, loading hundreds of GB of
> data), the replication started falling behind. Available space on pg_wal was
> being reduced in the same rate as safe_slot
> pg_replication_slot.safe_wal_size - as expected. At some point safe_wal_size
> went negative and streaming stopped working. It wasn't a problem, because
> replica started recovery from WAL archive. I expected that once the slot is
> lost, WALs will be removed up to max_wal_size. This did not happen though.
> It seems that Postgres tried to maintain something close to
> max_slot_wal_keep_size (600GB) available, in case replica starts catching up
> again. Over the time, there was no single transaction which would require
> this much WAL to be kept. archiving wasn't behind either.
Useless WAL files will be removd after a checkpoint runs.
They should be, but they are not. That is the bug. They just hang around, checkpoint after checkpoint. Some of them do get cleaned up, to make up for new ones created during that cycle. It treats max_slot_wal_keep the same way it treats wal_keep_size (but only if a "lost" slot is hanging around). If you drop the lost slot, only then does it remove all the accumulated WAL at the next checkpoint.
Cheers,
Jeff
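The workaround implied by Jeff's last point can be spelled out in SQL as below. This is only a sketch, not part of the thread's eventual fix; dropping a slot is irreversible, so run it only once the lost slot is genuinely unwanted.

-- Drop any slot that has already been invalidated ("lost"), then force a
-- checkpoint so the retained WAL actually gets removed (PostgreSQL 13+).
SELECT pg_drop_replication_slot(slot_name)
  FROM pg_replication_slots
 WHERE wal_status = 'lost';
CHECKPOINT;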
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Kyotaro Horiguchi
Date:
At Wed, 14 Jul 2021 19:10:26 -0400, Jeff Janes <jeff.janes@gmail.com> wrote in
> On Tue, Jul 13, 2021 at 10:12 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> wrote:
> > Useless WAL files will be removd after a checkpoint runs.
> >
>
> They should be, but they are not. That is the bug. They just hang
> around, checkpoint after checkpoint. Some of them do get cleaned up, to
> make up for new ones created during that cycle. It treats
> max_slot_wal_keep the same way it treats wal_keep_size (but only if a
> "lost" slot is hanging around). If you drop the lost slot, only then does
> it remove all the accumulated WAL at the next checkpoint.

Thanks! I see the issue here. Some investigation showed me a dubious motion of XLogCtl->replicationSlotMinLSN: slot invalidation forgets to recalculate it, and that misbehavior holds back the segment horizon.

So the attached worked for me. I'll repost a polished version including a test.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c7c928f50b..0fc0feb88e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9301,6 +9301,15 @@ CreateCheckPoint(int flags)
 	XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
 	KeepLogSeg(recptr, &_logSegNo);
 	InvalidateObsoleteReplicationSlots(_logSegNo);
+
+	/*
+	 * Some slots may have been gone, recalculate the segments to keep based on
+	 * the remaining slots.
+	 */
+	ReplicationSlotsComputeRequiredLSN();
+	XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
+	KeepLogSeg(recptr, &_logSegNo);
+
 	_logSegNo--;
 	RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr);

@@ -9641,6 +9650,15 @@ CreateRestartPoint(int flags)
 	endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
 	KeepLogSeg(endptr, &_logSegNo);
 	InvalidateObsoleteReplicationSlots(_logSegNo);
+
+	/*
+	 * Some slots may have been gone, recalculate the segments to keep based on
+	 * the remaining slots.
+	 */
+	ReplicationSlotsComputeRequiredLSN();
+	XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
+	KeepLogSeg(endptr, &_logSegNo);
+
 	_logSegNo--;

 	/*
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Kyotaro Horiguchi
Date:
At Thu, 15 Jul 2021 14:22:35 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > At Wed, 14 Jul 2021 19:10:26 -0400, Jeff Janes <jeff.janes@gmail.com> wrote in > > They should be, but they are not. That is the bug. They just hang > > around, checkpoint after checkpoint. Some of them do get cleaned up, to > > make up for new ones created during that cycle. It treats > > max_slot_wal_keep the same way it treats wal_keep_size (but only if a > > "lost" slot is hanging around). If you drop the lost slot, only then does > > it remove all the accumulated WAL at the next checkpoint. > > Thanks! I saw the issue here. Some investigation showd me a doubious > motion of XLogCtl->repliationSlotMinLSN. Slot invalidation is > forgetting to recalculate it and that misbehavior retreats the segment > horizon. > > So the attached worked for me. I'll repost the polished version > including test. This is it. It is for the master branch but also applicable to 14 as is. Not needed for earlier version. I believe the test works for Windows but haven't checked. regards. -- Kyotaro Horiguchi NTT Open Source Software Center From 91705833be615690388e41d176c16b2c294bb0cf Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Thu, 15 Jul 2021 15:52:48 +0900 Subject: [PATCH v1] Advance old-segment horizon properly after slot invalidation When some slots are invalidated due to the limit of max_slot_wal_keep_size, the old segment horizon should advance to within the limit. Previously that advancement didn't happen. Hence especially if no valid slots remain after a slot-invalidation, the horizon no longer advances and the live WAL files no longer be less than max_slot_wal_keep_size thereafter. Backpatch to 14 where the feature was introduced. --- src/backend/access/transam/xlog.c | 24 +++++++++++++++++++++-- src/backend/replication/slot.c | 20 ++++++++++++++++--- src/include/replication/slot.h | 2 +- src/test/recovery/t/019_replslot_limit.pl | 11 ++++++++++- 4 files changed, 50 insertions(+), 7 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index c7c928f50b..6500ee5b11 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9300,7 +9300,17 @@ CreateCheckPoint(int flags) */ XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size); KeepLogSeg(recptr, &_logSegNo); - InvalidateObsoleteReplicationSlots(_logSegNo); + if (InvalidateObsoleteReplicationSlots(_logSegNo)) + { + /* + * Some slots have been gone, recalculate the old-segment horizon + * excluding the invalidated slots. + */ + ReplicationSlotsComputeRequiredLSN(); + XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size); + KeepLogSeg(recptr, &_logSegNo); + } + _logSegNo--; RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr); @@ -9640,7 +9650,17 @@ CreateRestartPoint(int flags) replayPtr = GetXLogReplayRecPtr(&replayTLI); endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr; KeepLogSeg(endptr, &_logSegNo); - InvalidateObsoleteReplicationSlots(_logSegNo); + if (InvalidateObsoleteReplicationSlots(_logSegNo)) + { + /* + * Some slots have been gone, recalculate the old-segment horizon + * excluding the invalidated slots. 
+ */ + ReplicationSlotsComputeRequiredLSN(); + XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size); + KeepLogSeg(endptr, &_logSegNo); + } + _logSegNo--; /* diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 33b85d86cc..b8eddcab53 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -1143,11 +1143,14 @@ ReplicationSlotReserveWal(void) * Returns whether ReplicationSlotControlLock was released in the interim (and * in that case we're not holding the lock at return, otherwise we are). * + * Sets *invalidated if the slot was invalidated but never unsets otherwise. + * * This is inherently racy, because we release the LWLock * for syscalls, so caller must restart if we return true. */ static bool -InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN) +InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN, + bool *invalidated) { int last_signaled_pid = 0; bool released_lock = false; @@ -1193,6 +1196,12 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN) slotname = s->data.name; active_pid = s->active_pid; + /* + * Inform the caller that this slot have got invalidated. We could just + * assign true here but make it clear what we are intending here. + */ + *invalidated |= true; + /* * If the slot can be acquired, do so and mark it invalidated * immediately. Otherwise we'll signal the owning process, below, and @@ -1291,12 +1300,15 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN) * Mark any slot that points to an LSN older than the given segment * as invalid; it requires WAL that's about to be removed. * + * Returns true when any slot have got invalidated. + * * NB - this runs as part of checkpoint, so avoid raising errors if possible. 
*/ -void +bool InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno) { XLogRecPtr oldestLSN; + bool invalidated = false; XLogSegNoOffsetToRecPtr(oldestSegno, 0, wal_segment_size, oldestLSN); @@ -1309,13 +1321,15 @@ restart: if (!s->in_use) continue; - if (InvalidatePossiblyObsoleteSlot(s, oldestLSN)) + if (InvalidatePossiblyObsoleteSlot(s, oldestLSN, &invalidated)) { /* if the lock was released, start from scratch */ goto restart; } } LWLockRelease(ReplicationSlotControlLock); + + return invalidated; } /* diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h index 34d95eac8e..e32fb85db8 100644 --- a/src/include/replication/slot.h +++ b/src/include/replication/slot.h @@ -213,7 +213,7 @@ extern void ReplicationSlotsComputeRequiredLSN(void); extern XLogRecPtr ReplicationSlotsComputeLogicalRestartLSN(void); extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive); extern void ReplicationSlotsDropDBSlots(Oid dboid); -extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno); +extern bool InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno); extern ReplicationSlot *SearchNamedReplicationSlot(const char *name, bool need_lock); extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname, int szslot); extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok); diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl index d4b9ff705f..b76ed0fdac 100644 --- a/src/test/recovery/t/019_replslot_limit.pl +++ b/src/test/recovery/t/019_replslot_limit.pl @@ -11,7 +11,7 @@ use TestLib; use PostgresNode; use File::Path qw(rmtree); -use Test::More tests => $TestLib::windows_os ? 14 : 18; +use Test::More tests => $TestLib::windows_os ? 15 : 19; use Time::HiRes qw(usleep); $ENV{PGDATABASE} = 'postgres'; @@ -192,6 +192,15 @@ $result = $node_primary->safe_psql('postgres', is($result, "rep1|f|t|lost|", 'check that the slot became inactive and the state "lost" persists'); +# The invalidated slot shouldn't retreat the old-segment horizon +my $redoseg = $node_primary->safe_psql('postgres', + "select pg_walfile_name(lsn) from pg_create_physical_replication_slot('xx', true)"); +my $oldestseg = $node_primary->safe_psql('postgres', + "select pg_ls_dir from pg_ls_dir('pg_wal') where pg_ls_dir like '0%' order by pg_ls_dir limit 1"); + +is($oldestseg, $redoseg, + "check if slot-invalidation advances the old segment horizon"); + # The standby no longer can connect to the primary $logstart = get_log_size($node_standby); $node_standby->start; -- 2.27.0
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Marcin Krupowicz
Date:
On Thu, 15 Jul 2021 at 08:33, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > This is it. It is for the master branch but also applicable to 14 as > is. Not needed for earlier version. > I believe the test works for Windows but haven't checked. Thanks for that. To clarify - is it not going to be fixed in 13.x? -- Marcin
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Alvaro Herrera
Date:
On 2021-Jul-15, Marcin Krupowicz wrote: > On Thu, 15 Jul 2021 at 08:33, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > This is it. It is for the master branch but also applicable to 14 as > > is. Not needed for earlier version. > > I believe the test works for Windows but haven't checked. Oh my, what an oversight. Thanks Kyotaro for the investigation and fix; I'll get it pushed today. > Thanks for that. To clarify - is it not going to be fixed in 13.x? It definitely is, since the feature is there. I'll fix the apply conflicts, since they're pretty trivial. (No need to send a new patch.) -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Ed is the standard text editor." http://groups.google.com/group/alt.religion.emacs/msg/8d94ddab6a9b0ad3
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Jeff Janes
Date:
On Thu, Jul 15, 2021 at 1:22 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
At Wed, 14 Jul 2021 19:10:26 -0400, Jeff Janes <jeff.janes@gmail.com> wrote in
> On Tue, Jul 13, 2021 at 10:12 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> wrote:
> > Useless WAL files will be removd after a checkpoint runs.
> >
>
> They should be, but they are not. That is the bug. They just hang
> around, checkpoint after checkpoint. Some of them do get cleaned up, to
> make up for new ones created during that cycle. It treats
> max_slot_wal_keep the same way it treats wal_keep_size (but only if a
> "lost" slot is hanging around). If you drop the lost slot, only then does
> it remove all the accumulated WAL at the next checkpoint.
Thanks! I saw the issue here. Some investigation showd me a doubious
motion of XLogCtl->repliationSlotMinLSN. Slot invalidation is
forgetting to recalculate it and that misbehavior retreats the segment
horizon.
So the attached worked for me. I'll repost the polished version
including test.
Thank you. That works for me. But I did not test on Windows.
* Some slots may have been gone,
"been invalidated" reads better than "been gone", and matches the wording used elsewhere.
Cheers,
Jeff
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Alvaro Herrera
Date:
On 2021-Jul-15, Kyotaro Horiguchi wrote: > This is it. It is for the master branch but also applicable to 14 as > is. Not needed for earlier version. > I believe the test works for Windows but haven't checked. I looked at it. I think it is better to make the calls to ReplicationSlotsComputeRequiredLSN() in slot.c (which is where most other calls to that function are). Also we should recompute the minimum required Xmin at that point. Another change I did was move the "*invalidated=true" assignment to the block where we actually invalidate the slot; in your patch you were doing it possibly too early if the slot was in use by some other process. (For end effect it probably doesn't matter much, but it's better to have it right.) -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Attachment
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Alvaro Herrera
Date:
On 2021-Jul-15, Kyotaro Horiguchi wrote: > > Thanks! I saw the issue here. Some investigation showd me a doubious > > motion of XLogCtl->repliationSlotMinLSN. Slot invalidation is > > forgetting to recalculate it and that misbehavior retreats the segment > > horizon. Actually, looking again, isn't this supposed to happen in KeepLogSeg()? We have a block that caps to max_slot_wal_keep_size_mb there ... why did that not work? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Sallah, I said NO camels! That's FIVE camels; can't you count?" (Indiana Jones)
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Alvaro Herrera
Date:
On 2021-Jul-15, Alvaro Herrera wrote: > Actually, looking again, isn't this supposed to happen in KeepLogSeg()? > We have a block that caps to max_slot_wal_keep_size_mb there ... why did > that not work? I find that this smaller patch is sufficient to make the added test case work. However, I'm not sure I understand *why* ... -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
Attachment
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Kyotaro Horiguchi
Date:
At Thu, 15 Jul 2021 17:33:20 -0400, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in > On 2021-Jul-15, Alvaro Herrera wrote: > > > Actually, looking again, isn't this supposed to happen in KeepLogSeg()? > > We have a block that caps to max_slot_wal_keep_size_mb there ... why did > > that not work? > > I find that this smaller patch is sufficient to make the added test case > work. However, I'm not sure I understand *why* ... It's because another checkpoint is running at the time the manual checkpoint just before is invoked. Two successive checkpoint hides the defect. The checkpoint works as the rendezvous point for the succeeding tests so I added another sync point instead of the manual checkpoint. The test in the attached correctly fails if _logSegNo were not updated by slot invalidation. By the way, about the previous version, XLByteToSeg is needed since KeepLogSeg doesn't advance _logSegNo, which is the wanted action here. The test in the attached fails if only KeepLogSeg is called again. I confirmed that *the previous* version of the test works for Windows. (So there's no reason that the current version doesn't work.) regards. -- Kyotaro Horiguchi NTT Open Source Software Center From cd9ecce8655359a2bbb147f4d017de3e66a2bb00 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Thu, 15 Jul 2021 15:52:48 +0900 Subject: [PATCH v6] Advance old-segment horizon properly after slot invalidation When some slots are invalidated due to the max_slot_wal_keep_size limit, the old segment horizon should move forward to stay within the limit. However, in commit c6550776394e we forgot to call KeepLogSeg again to recompute the horizon after invalidating replication slots. In cases where other slots remained, the limits would be recomputed eventually for other reasons, but if all slots were invalidated, the limits would not move at all afterwards. Repair. Backpatch to 13 where the feature was introduced. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reported-by: Marcin Krupowicz <mk@071.ovh> Discussion: https://postgr.es/m/17103-004130e8f27782c9@postgresql.org --- src/backend/access/transam/xlog.c | 24 +++++++++++++++-- src/backend/replication/slot.c | 26 +++++++++++++++--- src/include/replication/slot.h | 2 +- src/test/recovery/t/019_replslot_limit.pl | 33 ++++++++++++++++++----- 4 files changed, 73 insertions(+), 12 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index edb15fe58d..752b1099ee 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9300,7 +9300,17 @@ CreateCheckPoint(int flags) */ XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size); KeepLogSeg(recptr, &_logSegNo); - InvalidateObsoleteReplicationSlots(_logSegNo); + if (InvalidateObsoleteReplicationSlots(_logSegNo)) + { + /* + * Some slots have been gone, recalculate the old-segment horizon + * excluding the invalidated slots. XLByteToSeg is needed here to + * advance _logSegNo. + */ + XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size); + KeepLogSeg(recptr, &_logSegNo); + } + _logSegNo--; RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr); @@ -9640,7 +9650,17 @@ CreateRestartPoint(int flags) replayPtr = GetXLogReplayRecPtr(&replayTLI); endptr = (receivePtr < replayPtr) ? 
replayPtr : receivePtr; KeepLogSeg(endptr, &_logSegNo); - InvalidateObsoleteReplicationSlots(_logSegNo); + if (InvalidateObsoleteReplicationSlots(_logSegNo)) + { + /* + * Some slots have been gone, recalculate the old-segment horizon + * excluding the invalidated slots. XLByteToSeg is needed here to + * advance _logSegNo. + */ + XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size); + KeepLogSeg(endptr, &_logSegNo); + } + _logSegNo--; /* diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 33b85d86cc..33e9acab4a 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -1143,11 +1143,14 @@ ReplicationSlotReserveWal(void) * Returns whether ReplicationSlotControlLock was released in the interim (and * in that case we're not holding the lock at return, otherwise we are). * + * Sets *invalidated true if the slot was invalidated. (Untouched otherwise.) + * * This is inherently racy, because we release the LWLock * for syscalls, so caller must restart if we return true. */ static bool -InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN) +InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN, + bool *invalidated) { int last_signaled_pid = 0; bool released_lock = false; @@ -1204,6 +1207,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN) s->active_pid = MyProcPid; s->data.invalidated_at = restart_lsn; s->data.restart_lsn = InvalidXLogRecPtr; + + /* Let caller know */ + *invalidated = true; } SpinLockRelease(&s->mutex); @@ -1291,12 +1297,15 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN) * Mark any slot that points to an LSN older than the given segment * as invalid; it requires WAL that's about to be removed. * + * Returns true when any slot have got invalidated. + * * NB - this runs as part of checkpoint, so avoid raising errors if possible. */ -void +bool InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno) { XLogRecPtr oldestLSN; + bool invalidated = false; XLogSegNoOffsetToRecPtr(oldestSegno, 0, wal_segment_size, oldestLSN); @@ -1309,13 +1318,24 @@ restart: if (!s->in_use) continue; - if (InvalidatePossiblyObsoleteSlot(s, oldestLSN)) + if (InvalidatePossiblyObsoleteSlot(s, oldestLSN, &invalidated)) { /* if the lock was released, start from scratch */ goto restart; } } LWLockRelease(ReplicationSlotControlLock); + + /* + * If any slots have been invalidated, recalculate the resource limits. 
+ */ + if (invalidated) + { + ReplicationSlotsComputeRequiredXmin(false); + ReplicationSlotsComputeRequiredLSN(); + } + + return invalidated; } /* diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h index 34d95eac8e..e32fb85db8 100644 --- a/src/include/replication/slot.h +++ b/src/include/replication/slot.h @@ -213,7 +213,7 @@ extern void ReplicationSlotsComputeRequiredLSN(void); extern XLogRecPtr ReplicationSlotsComputeLogicalRestartLSN(void); extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive); extern void ReplicationSlotsDropDBSlots(Oid dboid); -extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno); +extern bool InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno); extern ReplicationSlot *SearchNamedReplicationSlot(const char *name, bool need_lock); extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname, int szslot); extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok); diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl index d4b9ff705f..bbdaaca14c 100644 --- a/src/test/recovery/t/019_replslot_limit.pl +++ b/src/test/recovery/t/019_replslot_limit.pl @@ -11,7 +11,7 @@ use TestLib; use PostgresNode; use File::Path qw(rmtree); -use Test::More tests => $TestLib::windows_os ? 14 : 18; +use Test::More tests => $TestLib::windows_os ? 15 : 19; use Time::HiRes qw(usleep); $ENV{PGDATABASE} = 'postgres'; @@ -176,7 +176,13 @@ ok( !find_in_log( # Advance WAL again, the slot loses the oldest segment. my $logstart = get_log_size($node_primary); advance_wal($node_primary, 7); -$node_primary->safe_psql('postgres', "CHECKPOINT;"); + +# This slot should be broken, wait for that to happen +$node_primary->poll_query_until( + 'postgres', + qq[ + SELECT wal_status = 'lost' FROM pg_replication_slots + WHERE slot_name = 'rep1']); # WARNING should be issued ok( find_in_log( @@ -185,13 +191,28 @@ ok( find_in_log( $logstart), 'check that the warning is logged'); -# This slot should be broken -$result = $node_primary->safe_psql('postgres', - "SELECT slot_name, active, restart_lsn IS NULL, wal_status, safe_wal_size FROM pg_replication_slots WHERE slot_name= 'rep1'" -); +$result = $node_primary->safe_psql( + 'postgres', + qq[ + SELECT slot_name, active, restart_lsn IS NULL, wal_status, safe_wal_size + FROM pg_replication_slots WHERE slot_name = 'rep1']); is($result, "rep1|f|t|lost|", 'check that the slot became inactive and the state "lost" persists'); +# The invalidated slot shouldn't keep the old-segment horizon back; +# see bug #17103: https://postgr.es/m/17103-004130e8f27782c9@postgresql.org +# Test for this by creating a new slot and comparing its restart LSN +# to the oldest existing file. +my $redoseg = $node_primary->safe_psql('postgres', + "SELECT pg_walfile_name(lsn) FROM pg_create_physical_replication_slot('s2', true)" +); +my $oldestseg = $node_primary->safe_psql('postgres', + "SELECT pg_ls_dir AS f FROM pg_ls_dir('pg_wal') WHERE pg_ls_dir ~ '^[0-9A-F]{24}\$'ORDER BY 1 LIMIT 1" + ); +$node_primary->safe_psql('postgres', + qq[SELECT pg_drop_replication_slot('s2')]); +is($oldestseg, $redoseg, "check that segments have been removed"); + # The standby no longer can connect to the primary $logstart = get_log_size($node_standby); $node_standby->start; -- 2.27.0
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Alvaro Herrera
Date:
On 2021-Jul-16, Kyotaro Horiguchi wrote: > It's because another checkpoint is running at the time the manual > checkpoint just before is invoked. Two successive checkpoint hides > the defect. The checkpoint works as the rendezvous point for the > succeeding tests so I added another sync point instead of the manual > checkpoint. The test in the attached correctly fails if _logSegNo > were not updated by slot invalidation. OK, I thought as much and tried to catch it in the act but couldn't. Your explanation makes sense. > By the way, about the previous version, XLByteToSeg is needed since > KeepLogSeg doesn't advance _logSegNo, which is the wanted action > here. The test in the attached fails if only KeepLogSeg is called > again. Oh, true. > I confirmed that *the previous* version of the test works for Windows. > (So there's no reason that the current version doesn't work.) Great, thanks. -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ "Siempre hay que alimentar a los dioses, aunque la tierra esté seca" (Orual)
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Alvaro Herrera
Date:
Okay, I've now pushed it. Thanks! -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ "La libertad es como el dinero; el que no la sabe emplear la pierde" (Alvarez)
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Alvaro Herrera
Date:
I was running tests for another patch, and this one failed once: Testing REL_14_BETA2-168-g1cb94c3c4c started: vie jul 16 14:02:11 -04 2021 # Failed test 'check that the warning is logged' # at t/019_replslot_limit.pl line 187. # Failed test 'check that the slot became inactive and the state "lost" persists' # at t/019_replslot_limit.pl line 198. # got: 'rep1|t|t|lost|' # expected: 'rep1|f|t|lost|' # Failed test 'check that segments have been removed' # at t/019_replslot_limit.pl line 213. # got: '000000010000000000000014' # expected: '00000001000000000000001C' # Looks like you failed 3 tests of 19. make[2]: *** [Makefile:23: check] Error 1 The buildfarm has remained green so far, but clearly this is something we need to fix. Maybe it's as simple as adding the loop we use below, starting at line 219. -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Alvaro Herrera
Date:
On 2021-Jul-16, Alvaro Herrera wrote: > The buildfarm has remained green so far, but clearly this is something > we need to fix. Maybe it's as simple as adding the loop we use below, > starting at line 219. There are a few failures of this on buildfarm now, https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2021-07-16%2022%3A30%3A35 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes&dt=2021-07-16%2021%3A52%3A04 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2021-07-16%2021%3A52%3A05 I am running the test in a loop with the attached; if it doesn't fail in a few more rounds I'll push it. There are two instances of a different failure: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-17%2013%3A39%3A43 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2021-07-16%2021%3A14%3A14 # Failed test 'check that segments have been removed' # at t/019_replslot_limit.pl line 213. # got: '000000010000000000000021' # expected: '000000010000000000000022' # Looks like you failed 1 test of 19. [23:55:14] t/019_replslot_limit.pl .............. Dubious, test returned 1 (wstat 256, 0x100) I'm afraid about this not being something we can fix with some additional wait points ... -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ "La grandeza es una experiencia transitoria. Nunca es consistente. Depende en gran parte de la imaginación humana creadora de mitos" (Irulan)
Attachment
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Kyotaro Horiguchi
Date:
At Sat, 17 Jul 2021 10:28:09 -0400, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in > On 2021-Jul-16, Alvaro Herrera wrote: > > > The buildfarm has remained green so far, but clearly this is something > > we need to fix. Maybe it's as simple as adding the loop we use below, > > starting at line 219. > > There are a few failures of this on buildfarm now, .. > I am running the test in a loop with the attached; if it doesn't fail in > a few more rounds I'll push it. > > There are two instances of a different failure: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-17%2013%3A39%3A43 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2021-07-16%2021%3A14%3A14 > > # Failed test 'check that segments have been removed' > # at t/019_replslot_limit.pl line 213. > # got: '000000010000000000000021' > # expected: '000000010000000000000022' > # Looks like you failed 1 test of 19. > [23:55:14] t/019_replslot_limit.pl .............. > Dubious, test returned 1 (wstat 256, 0x100) > > I'm afraid about this not being something we can fix with some > additional wait points ... Sorry for the mistake. It seems to me the cause the above is that segment removal happens *after* invalidation. Since (at least currently) the "slot is invalidated" warning issued only at the time just before WAL-removal, we should expect that old segments are gone after the checkpoint-ending log, which should be seen after WAL-removal. If not, that shows that there's a bug. What do you think about the attached? regards. -- Kyotaro Horiguchi NTT Open Source Software Center From c52d7931e95cc24804f9aac4c9bf3a388c7e461f Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Mon, 19 Jul 2021 10:58:01 +0900 Subject: [PATCH v1] Remove possible instability of new replication slot test code The last fix for the same left another possible timing unstability between actual segment removal and the invalidation log. Make it steady by waiting for checkpoint-ending log, which is issued after the segment removal. --- src/test/recovery/t/019_replslot_limit.pl | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl index 026da02ff1..a5d8140807 100644 --- a/src/test/recovery/t/019_replslot_limit.pl +++ b/src/test/recovery/t/019_replslot_limit.pl @@ -11,7 +11,7 @@ use TestLib; use PostgresNode; use File::Path qw(rmtree); -use Test::More tests => $TestLib::windows_os ? 15 : 19; +use Test::More tests => $TestLib::windows_os ? 16 : 20; use Time::HiRes qw(usleep); $ENV{PGDATABASE} = 'postgres'; @@ -201,6 +201,19 @@ $result = $node_primary->safe_psql( is($result, "rep1|f|t|lost|", 'check that the slot became inactive and the state "lost" persists'); +# Make sure the current checkpoint ended +my $checkpoint_ended = 0; +for (my $i = 0; $i < 10000; $i++) +{ + if (find_in_log($node_primary, "checkpoint complete: ", $logstart)) + { + $checkpoint_ended = 1; + last; + } + usleep(100_000); +} +ok($checkpoint_ended, 'make sure checkpoint ended'); + # The invalidated slot shouldn't keep the old-segment horizon back; # see bug #17103: https://postgr.es/m/17103-004130e8f27782c9@postgresql.org # Test for this by creating a new slot and comparing its restart LSN -- 2.27.0
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Alvaro Herrera
Date:
On 2021-Jul-19, Kyotaro Horiguchi wrote: > Sorry for the mistake. It seems to me the cause the above is that > segment removal happens *after* invalidation. Since (at least > currently) the "slot is invalidated" warning issued only at the time > just before WAL-removal, we should expect that old segments are gone > after the checkpoint-ending log, which should be seen after > WAL-removal. If not, that shows that there's a bug. > > What do you think about the attached? Sounds sensible -- I verified the logs for one of the cases that failed in the buildfarm, and indeed the "checkpoint ended" message appears after the s2 slot is created, so it should fix the problem. (I didn't actually try to reproduce the problem locally, so I didn't verify the fix any further than ensuring the test still passes.) Pushed, thanks! -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Kyotaro Horiguchi
Date:
At Mon, 19 Jul 2021 17:24:48 -0400, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in
> On 2021-Jul-19, Kyotaro Horiguchi wrote:
>
> > Sorry for the mistake. It seems to me the cause the above is that
> > segment removal happens *after* invalidation. Since (at least
> > currently) the "slot is invalidated" warning issued only at the time
> > just before WAL-removal, we should expect that old segments are gone
> > after the checkpoint-ending log, which should be seen after
> > WAL-removal. If not, that shows that there's a bug.
> >
> > What do you think about the attached?
>
> Sounds sensible -- I verified the logs for one of the cases that failed
> in the buildfarm, and indeed the "checkpoint ended" message appears
> after the s2 slot is created, so it should fix the problem. (I didn't
> actually try to reproduce the problem locally, so I didn't verify the
> fix any further than ensuring the test still passes.)
>
> Pushed, thanks!

Thanks, and sorry for the series of bugs.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On 2021-Jul-19, Kyotaro Horiguchi wrote: >> What do you think about the attached? > Sounds sensible -- I verified the logs for one of the cases that failed > in the buildfarm, and indeed the "checkpoint ended" message appears > after the s2 slot is created, so it should fix the problem. (I didn't > actually try to reproduce the problem locally, so I didn't verify the > fix any further than ensuring the test still passes.) This test is still unstable :-( https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2021-07-20%2012%3A46%3A11 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-20%2015%3A05%3A39 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-28%2014%3A33%3A01 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-28%2014%3A33%3A01 These all look like # Failed test 'check that segments have been removed' # at t/019_replslot_limit.pl line 226. # got: '000000010000000000000020' # expected: '000000010000000000000024' # Looks like you failed 1 test of 16. with varying values mentioned. It looks to me like WAL file cleanup is not as synchronous with slot creation as the test thinks. Maybe it needs to loop until the oldest WAL file matches what it expects? regards, tom lane
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Kyotaro Horiguchi
Date:
At Wed, 28 Jul 2021 11:38:28 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> This test is still unstable :-(
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2021-07-20%2012%3A46%3A11
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-20%2015%3A05%3A39
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-28%2014%3A33%3A01
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-28%2014%3A33%3A01
>
> These all look like
>
> # Failed test 'check that segments have been removed'
> # at t/019_replslot_limit.pl line 226.
> # got: '000000010000000000000020'
> # expected: '000000010000000000000024'
> # Looks like you failed 1 test of 16.
>
> with varying values mentioned. It looks to me like WAL file cleanup
> is not as synchronous with slot creation as the test thinks.
> Maybe it needs to loop until the oldest WAL file matches what it
> expects?

Sorry for the kludge. Mmm. In the failure cases, the directory scan (@16:52:22.036) runs before the targeted checkpoint completes (@16:52:22.144).

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-28%2014%3A33%3A01

16:52:17.328 LOG: checkpoint starting: wal
16:52:19.140 LOG: invalidating slot "rep1" because its restart_lsn 0/1D00000 exceeds max_slot_wal_keep_size
16:52:19.316 019_replslot_limit.pl LOG: statement: SELECT pg_walfile_name(lsn) FROM pg_create_physical_replication_slot('s2',true)
16:52:22.036 019_replslot_limit.pl LOG: statement: SELECT pg_ls_dir AS f FROM pg_ls_dir('pg_wal') WHERE pg_ls_dir ~ '^[0-9A-F]{24}$'ORDER BY 1 LIMIT
16:52:22.077 019_replslot_limit.pl LOG: statement: SELECT pg_drop_replication_slot('s2')
16:52:22.144 LOG: checkpoint complete: wrote 18 buffers (14.1%); 0 WAL file(s) added, 4 removed, 3 recycled; write=1.806s, sync=0.001 s, total=4.817 s; sync files=0, longest=0.000 s, average=0.000 s; distance=3072 kB, estimate=3072kB

The reason is that the previous checkpoint completed after the test started to advance segments:

> my $logstart = get_log_size($node_primary);
> advance_wal($node_primary, 7);
!!!! another checkpoint runs/ends ... 16:52:19.140
# check for "invalidate slots" in log.
# check for the "lost" state in pg_replication_slots.
!! check for "checkpoint complete" in log. 16:52:22.077
# read redo segment and oldest wal. 16:52:22.144
!! The target checkpoint ends.

So it seems to me we need to explicitly prevent unexpected checkpoints from happening, maybe by enlarging max_wal_size temporarily.

I'll go that way.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Kyotaro Horiguchi
Date:
At Thu, 29 Jul 2021 16:20:38 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > So it seems to me we need to explicitly prevent unexpected checkpoints > from happening maybe by enlarging max_wal_size temporily. > > I'll going that way. I ended up with the attached. It causes a checkpoint reliably exactly at the aimed timing without. regards. -- Kyotaro Horiguchi NTT Open Source Software Center From faca25782953d208bcf7b1e4ea0d62a40be3c7f1 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Thu, 29 Jul 2021 16:36:32 +0900 Subject: [PATCH] Make slot test more stable The slot test still instable in the case where test script runs relatively slower than checkpointing. Remove the instability by causing a checkpoint exactly at the aimed timing. --- src/test/recovery/t/019_replslot_limit.pl | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl index 54ddcef2bb..ee4801b49c 100644 --- a/src/test/recovery/t/019_replslot_limit.pl +++ b/src/test/recovery/t/019_replslot_limit.pl @@ -173,10 +173,20 @@ ok( !find_in_log( "requested WAL segment [0-9A-F]+ has already been removed"), 'check that required WAL segments are still available'); -# Advance WAL again, the slot loses the oldest segment. +# prevent checkpoints from occurring while advancing WAL segments +$node_primary->safe_psql('postgres', "CHECKPOINT;"); +$node_primary->safe_psql('postgres', + "ALTER SYSTEM SET max_wal_size='40MB'; SELECT pg_reload_conf()"); + +# Advance WAL again, the slot loses the oldest segment by the coming checkpoint. my $logstart = get_log_size($node_primary); advance_wal($node_primary, 7); +# Cause a checkpoint +$node_primary->safe_psql('postgres', + 'ALTER SYSTEM RESET max_wal_size; SELECT pg_reload_conf()'); +$node_primary->safe_psql('postgres', "CHECKPOINT;"); + # wait until the WARNING is issued my $invalidated = 0; for (my $i = 0; $i < 10000; $i++) -- 2.27.0
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Kyotaro Horiguchi
Date:
(Somehow the previous mail was broken in many ways. I re-send it.) At Thu, 29 Jul 2021 16:20:38 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > So it seems to me we need to explicitly prevent unexpected checkpoints > from happening maybe by enlarging max_wal_size temporily. > > I'll going that way. I ended up with the attached. It causes a checkpoint reliably exactly at the aimed timing without. regards. -- Kyotaro Horiguchi NTT Open Source Software Center From 5c6fa9cd8a6449931de339c628a3a81367f7b22a Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Thu, 29 Jul 2021 16:36:32 +0900 Subject: [PATCH] Make slot test more stable The slot test is still instable in the case where the test script runs at relatively slower speed than checkpointing. Remove the instability by causing a checkpoint exactly at the aimed timing. --- src/test/recovery/t/019_replslot_limit.pl | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl index 54ddcef2bb..ee4801b49c 100644 --- a/src/test/recovery/t/019_replslot_limit.pl +++ b/src/test/recovery/t/019_replslot_limit.pl @@ -173,10 +173,20 @@ ok( !find_in_log( "requested WAL segment [0-9A-F]+ has already been removed"), 'check that required WAL segments are still available'); -# Advance WAL again, the slot loses the oldest segment. +# prevent checkpoints from occurring while advancing WAL segments +$node_primary->safe_psql('postgres', "CHECKPOINT;"); +$node_primary->safe_psql('postgres', + "ALTER SYSTEM SET max_wal_size='40MB'; SELECT pg_reload_conf()"); + +# Advance WAL again, the slot loses the oldest segment by the coming checkpoint. my $logstart = get_log_size($node_primary); advance_wal($node_primary, 7); +# Cause a checkpoint +$node_primary->safe_psql('postgres', + 'ALTER SYSTEM RESET max_wal_size; SELECT pg_reload_conf()'); +$node_primary->safe_psql('postgres', "CHECKPOINT;"); + # wait until the WARNING is issued my $invalidated = 0; for (my $i = 0; $i < 10000; $i++) -- 2.27.0
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Kyotaro Horiguchi
Date:
At Thu, 29 Jul 2021 17:19:24 -0400, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in > On 2021-Jul-29, Kyotaro Horiguchi wrote: > > > At Thu, 29 Jul 2021 16:20:38 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > > > So it seems to me we need to explicitly prevent unexpected checkpoints > > > from happening maybe by enlarging max_wal_size temporily. > > > > I ended up with the attached. It causes a checkpoint reliably exactly > > at the aimed timing without. > > Thanks for looking into it. The explanation makes sense, so I pushed > your patch and also fixed one outdated comment I noticed while reading > the related code. I hope this commit will solve the problem ... it is > quite low probability, so we'll have to wait at least two weeks to make > any conclusion. Agreed. I hope this is the last failure.. Thanks. -- Kyotaro Horiguchi NTT Open Source Software Center