Thread: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference: 17103
Logged by: Marcin Krupowicz
Email address: mk@071.ovh
PostgreSQL version: 13.3
Operating system: CentOS 7.6
Description:

Hi,

After the replication fell behind and the lag exceeded max_slot_wal_keep_size, WAL segments on the master were not removed. It seems that Postgres tried to maintain max_slot_wal_keep_size worth of segments. Please find the details below (a slightly redacted version of what I wrote here: https://stackoverflow.com/questions/68314222/replication-lag-exceeding-max-slot-wal-keep-size-wal-segments-not-removed).

-- Summary --
We are using max_slot_wal_keep_size from PostgreSQL 13 to prevent the master from being killed by a lagging replica. It seems that, in our case, WAL storage wasn't freed up after exceeding this parameter, which resulted in a replication failure. The WAL which, I believe, should have been freed up did not seem to be needed by any other transaction at the time.

-- Configuration --
master & one replica - streaming replication using a slot
~700GB available for pg_wal
max_slot_wal_keep_size = 600GB
min_wal_size = 20GB
max_wal_size = 40GB
default checkpoint_timeout = 5 minutes (no problem with checkpoints)
archiving is on and is catching up well

-- What happened --
Under heavy load (large COPY/INSERT transactions, loading hundreds of GB of data), the replication started falling behind. Available space on pg_wal was being reduced at the same rate as pg_replication_slots.safe_wal_size - as expected. At some point safe_wal_size went negative and streaming stopped working. That wasn't a problem, because the replica started recovering from the WAL archive. I expected that once the slot was lost, WAL would be removed down to max_wal_size. This did not happen, though. It seems that Postgres tried to maintain something close to max_slot_wal_keep_size (600GB) available, in case the replica started catching up again. Over that time there was no single transaction which would require this much WAL to be kept, and archiving wasn't behind either.

Free space on pg_wal was more or less 70GB for most of the time; however, at some point, during heavy autovacuuming, it dipped to 0 :( This is when PG crashed (and auto-recovered soon after). After getting back up, there was 11GB left on pg_wal, with no transaction running and no loading. This lasted for hours. During this time the replica finally caught up from the archive and restored the replication with no delay. None of the WAL files were removed. I manually ran a checkpoint, but it did not clear any WAL. I finally restarted PostgreSQL, and only during the restart was pg_wal finally cleared.

Again - why did PG not clear WAL? The WAL files, even more clearly, were not needed by any process.

Many thanks,
-- Marcin
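For anyone watching the same situation on their own primary, the per-slot retention state described above can be queried directly. The following is only an illustrative sketch, assuming PostgreSQL 13 or later where pg_replication_slots exposes wal_status and safe_wal_size; it is not output from the reporter's system.

-- Per-slot retention overview (PostgreSQL 13+).
-- "retained_wal" is how much WAL the slot is currently holding back;
-- "safe_wal_size" is the remaining headroom before the slot is invalidated.
SELECT slot_name,
       active,
       wal_status,
       pg_size_pretty(safe_wal_size) AS safe_wal_size,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
  FROM pg_replication_slots;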
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Kyotaro Horiguchi
Date:
At Tue, 13 Jul 2021 09:15:17 +0000, PG Bug reporting form <noreply@postgresql.org> wrote in
> We are using max_slot_wal_keep_size from Postgresql 13 to prevent master
> from being killed by a lagging replication. It seems, that in our case, WAL
> storage wasn't freed up after exceeding this parameter which resulted in a
> replication failure. WAL which, as I believe, should have been freed up did
> not seem to be needed by any other transaction at a time.

Yeah, max_slot_wal_keep_size is the maximum WAL size that replication slots are guaranteed to be able to keep; it is not a size beyond which slots are guaranteed not to keep WAL files. In addition to that, WAL removal happens only at the end of a checkpoint, so WAL files can grow up to max_slot_wal_keep_size plus the checkpoint distance, assuming an even load.

> -- Configuration --
> master & one replica - streaming replication using a slot
> ~700GB available for pg_wal
> max_slot_wal_keep_size = 600GB
> min_wal_size = 20GB
> max_wal_size = 40GB
> default checkpoint_timeout = 5 minutes (no problem with checkpoints)
> archiving is on and is catching up well

Assuming an even load (or WAL speed) and 0.5 for checkpoint_completion_target, 40GB of max_wal_size causes checkpoints every 27GB (1706 segments) (*1) at the longest (in the case where the xlog-based checkpoint fires before the timeout-based checkpoint).

Thus, with 600GB of max_slot_wal_keep_size, the total size of WAL files can reach 627GB, and even that can be exceeded under a sudden load spike.

(*1) checkpoint distance = max_wal_size / (1.0 + checkpoint_completion_target)

> -- What happened --
> Under heavy load (large COPY/INSERT transactions, loading hundreds of GB of
> data), the replication started falling behind. Available space on pg_wal was
> being reduced in the same rate as safe_slot
> pg_replication_slot.safe_wal_size - as expected. At some point safe_wal_size
> went negative and streaming stopped working. It wasn't a problem, because
> replica started recovery from WAL archive. I expected that once the slot is
> lost, WALs will be removed up to max_wal_size. This did not happen though.
> It seems that Postgres tried to maintain something close to
> max_slot_wal_keep_size (600GB) available, in case replica starts catching up
> again. Over the time, there was no single transaction which would require
> this much WAL to be kept. archiving wasn't behind either.

Useless WAL files will be removed after a checkpoint runs.

> Amount of free space on pg_wal was more or less 70GB for most of the time,
> however at some point, during heavy autovacuuming, it dipped to 0 :( This is
> when PG crashed and (auto-recovered soon after). After getting back up,
> there was 11GB left on pg_wal and no transaction running, no loading. This
> lasted for hours. During this time replica finally caught up from the
> archive and restored the replication with no delay. None of the WALs were
> removed. I manually run checkpoint but it did not clear any WALs. I finally
> restarted Postgresql and during the restarting pg_wal were finally
> cleared.
>
> Again - why PG did not clear WAL? WALs, even more clearly, were not needed
> by any process.

Maybe a manual CHECKPOINT works for you; however, you should reconsider your settings with the above behavior in mind, to prevent a crash due to WAL storage exhaustion.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
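As a rough cross-check of the arithmetic above, the worst-case steady-state pg_wal footprint (max_slot_wal_keep_size plus the checkpoint distance) can be estimated from a running server's settings with a query like the one below. This is only a sketch: it assumes max_slot_wal_keep_size is set to a positive value, and that pg_settings reports both size GUCs in megabytes. With the reporter's settings it comes out to roughly 627GB, matching the figure above.

-- Sketch: estimate the worst-case pg_wal size discussed above, i.e.
--   max_slot_wal_keep_size + max_wal_size / (1.0 + checkpoint_completion_target)
-- Assumes max_slot_wal_keep_size > 0; pg_settings reports both size GUCs in MB.
SELECT pg_size_pretty(
         ( (SELECT setting::numeric FROM pg_settings WHERE name = 'max_slot_wal_keep_size')
         + (SELECT setting::numeric FROM pg_settings WHERE name = 'max_wal_size')
           / (1.0 + (SELECT setting::numeric FROM pg_settings WHERE name = 'checkpoint_completion_target'))
         ) * 1024 * 1024
       ) AS worst_case_pg_wal;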
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Marcin Krupowicz
Date:
Hi,

> > We are using max_slot_wal_keep_size from Postgresql 13 to prevent master
> > from being killed by a lagging replication. It seems, that in our case, WAL
> > storage wasn't freed up after exceeding this parameter which resulted in a
> > replication failure. WAL which, as I believe, should have been freed up did
> > not seem to be needed by any other transaction at a time.
>
> Yeah, the max_slot_wal_keep_size is the maximum WAL size that
> replication slots are guaranteed to be able to keep files up to. It
> is not the size that replication slot are guaranteed not to keep WAL
> files beyond it. Addition to that, WAL removal happens only at the
> ending of a checkpoint so WAL files can grow up to
> max_slot_wal_keep_size plus checkpoint distance assuming an even load.

I understand, but the situation lasted for many hours, until my manual restart. checkpoint_timeout is 5 minutes, and there were many checkpoints between the time the slot got lost (exceeding max_slot_wal_keep_size) and my manual restart. During all that time the load was fairly even and the number of WAL segments stored wasn't changing much.

> > -- Configuration --
> > master & one replica - streaming replication using a slot
> > ~700GB available for pg_wal
> > max_slot_wal_keep_size = 600GB
> > min_wal_size = 20GB
> > max_wal_size = 40GB
> > default checkpoint_timeout = 5 minutes (no problem with checkpoints)
> > archiving is on and is catching up well
>
> Assuming an even load (or WAL speed) and 0.5 for
> checkpoint_completion_target, 40GB of max_wal_size causes checkpoints
> every 27GB (1706 segments) (*1) at longest (in the case where xlog
> checkpoint fires before timeout checkpoint).
>
> Thus with 600GB of max_slot_wal_keep_size, the maximum size of WAL
> files can reach 627GB, which size can even be exceeded if a sudden
> high-load is given.
>
> [1] checkpoint distance = max_wal_size / (1.0 + checkpoint_completion_target)

Fair point, I should change my settings slightly - but that's not the issue here.

>
> > -- What happened --
> > Under heavy load (large COPY/INSERT transactions, loading hundreds of GB of
> > data), the replication started falling behind. Available space on pg_wal was
> > being reduced in the same rate as safe_slot
> > pg_replication_slot.safe_wal_size - as expected. At some point safe_wal_size
> > went negative and streaming stopped working. It wasn't a problem, because
> > replica started recovery from WAL archive. I expected that once the slot is
> > lost, WALs will be removed up to max_wal_size. This did not happen though.
> > It seems that Postgres tried to maintain something close to
> > max_slot_wal_keep_size (600GB) available, in case replica starts catching up
> > again. Over the time, there was no single transaction which would require
> > this much WAL to be kept. archiving wasn't behind either.
>
> Useless WAL files will be removd after a checkpoint runs.

That did not happen.

> > Amount of free space on pg_wal was more or less 70GB for most of the time,
> > however at some point, during heavy autovacuuming, it dipped to 0 :( This is
> > when PG crashed and (auto-recovered soon after). After getting back up,
> > there was 11GB left on pg_wal and no transaction running, no loading. This
> > lasted for hours. During this time replica finally caught up from the
> > archive and restored the replication with no delay. None of the WALs were
> > removed. I manually run checkpoint but it did not clear any WALs. I finally
> > restarted Postgresql and during the restarting pg_wal were finally
> > cleared.
> >
> > Again - why PG did not clear WAL? WALs, even more clearly, were not needed
> > by any process.
>
> Maybe manual CHECKPINT work for you , however, you should reconsider
> the setting assuming the above behavior to prevent a crash due to WAL
> storage exhaustion.

Sorry, I'm confused. I did run a manual CHECKPOINT (even though many, many non-manual checkpoints had run before that) and the WAL segments were NOT cleared until I restarted PostgreSQL.

Thanks,
-- Marcin
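A side note for anyone verifying the same symptom: whether checkpoints are in fact completing, and where the latest checkpoint's redo pointer sits, can be confirmed with pg_control_checkpoint(). A minimal sketch, not taken from the report:

-- Confirm that checkpoints are completing and see the latest redo location.
SELECT checkpoint_time, redo_lsn, redo_wal_file
  FROM pg_control_checkpoint();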
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Jeff Janes
Date:
On Tue, Jul 13, 2021 at 10:12 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
At Tue, 13 Jul 2021 09:15:17 +0000, PG Bug reporting form <noreply@postgresql.org> wrote in
> We are using max_slot_wal_keep_size from Postgresql 13 to prevent master
> from being killed by a lagging replication. It seems, that in our case, WAL
> storage wasn't freed up after exceeding this parameter which resulted in a
> replication failure. WAL which, as I believe, should have been freed up did
> not seem to be needed by any other transaction at a time.
Yeah, the max_slot_wal_keep_size is the maximum WAL size that
replication slots are guaranteed to be able to keep files up to. It
is not the size that replication slot are guaranteed not to keep WAL
files beyond it. Addition to that, WAL removal happens only at the
ending of a checkpoint so WAL files can grow up to
max_slot_wal_keep_size plus checkpoint distance assuming an even load.
> -- Configuration --
> master & one replica - streaming replication using a slot
> ~700GB available for pg_wal
> max_slot_wal_keep_size = 600GB
> min_wal_size = 20GB
> max_wal_size = 40GB
> default checkpoint_timeout = 5 minutes (no problem with checkpoints)
> archiving is on and is catching up well
Assuming an even load (or WAL speed) and 0.5 for
checkpoint_completion_target, 40GB of max_wal_size causes checkpoints
every 27GB (1706 segments) (*1) at longest (in the case where xlog
checkpoint fires before timeout checkpoint).
Thus with 600GB of max_slot_wal_keep_size, the maximum size of WAL
files can reach 627GB, which size can even be exceeded if a sudden
high-load is given.
[1] checkpoint distance = max_wal_size / (1.0 + checkpoint_completion_target)
> -- What happened --
> Under heavy load (large COPY/INSERT transactions, loading hundreds of GB of
> data), the replication started falling behind. Available space on pg_wal was
> being reduced in the same rate as safe_slot
> pg_replication_slot.safe_wal_size - as expected. At some point safe_wal_size
> went negative and streaming stopped working. It wasn't a problem, because
> replica started recovery from WAL archive. I expected that once the slot is
> lost, WALs will be removed up to max_wal_size. This did not happen though.
> It seems that Postgres tried to maintain something close to
> max_slot_wal_keep_size (600GB) available, in case replica starts catching up
> again. Over the time, there was no single transaction which would require
> this much WAL to be kept. archiving wasn't behind either.
Useless WAL files will be removd after a checkpoint runs.
They should be, but they are not. That is the bug. They just hang around, checkpoint after checkpoint. Some of them do get cleaned up, to make up for new ones created during that cycle. It treats max_slot_wal_keep the same way it treats wal_keep_size (but only if a "lost" slot is hanging around). If you drop the lost slot, only then does it remove all the accumulated WAL at the next checkpoint.
Cheers,
Jeff
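The workaround implied by Jeff's last point can be spelled out in SQL as below. This is only a sketch, not part of the thread's eventual fix; dropping a slot is irreversible, so run it only once the lost slot is genuinely unwanted.

-- Drop any slot that has already been invalidated ("lost"), then force a
-- checkpoint so the retained WAL actually gets removed (PostgreSQL 13+).
SELECT pg_drop_replication_slot(slot_name)
  FROM pg_replication_slots
 WHERE wal_status = 'lost';
CHECKPOINT;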
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Kyotaro Horiguchi
Date:
At Wed, 14 Jul 2021 19:10:26 -0400, Jeff Janes <jeff.janes@gmail.com> wrote in
> On Tue, Jul 13, 2021 at 10:12 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> wrote:
> > Useless WAL files will be removd after a checkpoint runs.
> >
>
> They should be, but they are not. That is the bug. They just hang
> around, checkpoint after checkpoint. Some of them do get cleaned up, to
> make up for new ones created during that cycle. It treats
> max_slot_wal_keep the same way it treats wal_keep_size (but only if a
> "lost" slot is hanging around). If you drop the lost slot, only then does
> it remove all the accumulated WAL at the next checkpoint.

Thanks! I see the issue here. Some investigation showed me a dubious motion of XLogCtl->replicationSlotMinLSN: slot invalidation forgets to recalculate it, and that misbehavior holds back the segment horizon.

So the attached worked for me. I'll repost a polished version including a test.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c7c928f50b..0fc0feb88e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9301,6 +9301,15 @@ CreateCheckPoint(int flags)
 	XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
 	KeepLogSeg(recptr, &_logSegNo);
 	InvalidateObsoleteReplicationSlots(_logSegNo);
+
+	/*
+	 * Some slots may have been gone, recalculate the segments to keep based on
+	 * the remaining slots.
+	 */
+	ReplicationSlotsComputeRequiredLSN();
+	XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
+	KeepLogSeg(recptr, &_logSegNo);
+
 	_logSegNo--;
 	RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr);

@@ -9641,6 +9650,15 @@ CreateRestartPoint(int flags)
 	endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
 	KeepLogSeg(endptr, &_logSegNo);
 	InvalidateObsoleteReplicationSlots(_logSegNo);
+
+	/*
+	 * Some slots may have been gone, recalculate the segments to keep based on
+	 * the remaining slots.
+	 */
+	ReplicationSlotsComputeRequiredLSN();
+	XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
+	KeepLogSeg(endptr, &_logSegNo);
+
 	_logSegNo--;

 	/*
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Kyotaro Horiguchi
Date:
At Thu, 15 Jul 2021 14:22:35 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > At Wed, 14 Jul 2021 19:10:26 -0400, Jeff Janes <jeff.janes@gmail.com> wrote in > > They should be, but they are not. That is the bug. They just hang > > around, checkpoint after checkpoint. Some of them do get cleaned up, to > > make up for new ones created during that cycle. It treats > > max_slot_wal_keep the same way it treats wal_keep_size (but only if a > > "lost" slot is hanging around). If you drop the lost slot, only then does > > it remove all the accumulated WAL at the next checkpoint. > > Thanks! I saw the issue here. Some investigation showd me a doubious > motion of XLogCtl->repliationSlotMinLSN. Slot invalidation is > forgetting to recalculate it and that misbehavior retreats the segment > horizon. > > So the attached worked for me. I'll repost the polished version > including test. This is it. It is for the master branch but also applicable to 14 as is. Not needed for earlier version. I believe the test works for Windows but haven't checked. regards. -- Kyotaro Horiguchi NTT Open Source Software Center From 91705833be615690388e41d176c16b2c294bb0cf Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Thu, 15 Jul 2021 15:52:48 +0900 Subject: [PATCH v1] Advance old-segment horizon properly after slot invalidation When some slots are invalidated due to the limit of max_slot_wal_keep_size, the old segment horizon should advance to within the limit. Previously that advancement didn't happen. Hence especially if no valid slots remain after a slot-invalidation, the horizon no longer advances and the live WAL files no longer be less than max_slot_wal_keep_size thereafter. Backpatch to 14 where the feature was introduced. --- src/backend/access/transam/xlog.c | 24 +++++++++++++++++++++-- src/backend/replication/slot.c | 20 ++++++++++++++++--- src/include/replication/slot.h | 2 +- src/test/recovery/t/019_replslot_limit.pl | 11 ++++++++++- 4 files changed, 50 insertions(+), 7 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index c7c928f50b..6500ee5b11 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9300,7 +9300,17 @@ CreateCheckPoint(int flags) */ XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size); KeepLogSeg(recptr, &_logSegNo); - InvalidateObsoleteReplicationSlots(_logSegNo); + if (InvalidateObsoleteReplicationSlots(_logSegNo)) + { + /* + * Some slots have been gone, recalculate the old-segment horizon + * excluding the invalidated slots. + */ + ReplicationSlotsComputeRequiredLSN(); + XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size); + KeepLogSeg(recptr, &_logSegNo); + } + _logSegNo--; RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr); @@ -9640,7 +9650,17 @@ CreateRestartPoint(int flags) replayPtr = GetXLogReplayRecPtr(&replayTLI); endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr; KeepLogSeg(endptr, &_logSegNo); - InvalidateObsoleteReplicationSlots(_logSegNo); + if (InvalidateObsoleteReplicationSlots(_logSegNo)) + { + /* + * Some slots have been gone, recalculate the old-segment horizon + * excluding the invalidated slots. 
+ */ + ReplicationSlotsComputeRequiredLSN(); + XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size); + KeepLogSeg(endptr, &_logSegNo); + } + _logSegNo--; /* diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 33b85d86cc..b8eddcab53 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -1143,11 +1143,14 @@ ReplicationSlotReserveWal(void) * Returns whether ReplicationSlotControlLock was released in the interim (and * in that case we're not holding the lock at return, otherwise we are). * + * Sets *invalidated if the slot was invalidated but never unsets otherwise. + * * This is inherently racy, because we release the LWLock * for syscalls, so caller must restart if we return true. */ static bool -InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN) +InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN, + bool *invalidated) { int last_signaled_pid = 0; bool released_lock = false; @@ -1193,6 +1196,12 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN) slotname = s->data.name; active_pid = s->active_pid; + /* + * Inform the caller that this slot have got invalidated. We could just + * assign true here but make it clear what we are intending here. + */ + *invalidated |= true; + /* * If the slot can be acquired, do so and mark it invalidated * immediately. Otherwise we'll signal the owning process, below, and @@ -1291,12 +1300,15 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN) * Mark any slot that points to an LSN older than the given segment * as invalid; it requires WAL that's about to be removed. * + * Returns true when any slot have got invalidated. + * * NB - this runs as part of checkpoint, so avoid raising errors if possible. 
*/ -void +bool InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno) { XLogRecPtr oldestLSN; + bool invalidated = false; XLogSegNoOffsetToRecPtr(oldestSegno, 0, wal_segment_size, oldestLSN); @@ -1309,13 +1321,15 @@ restart: if (!s->in_use) continue; - if (InvalidatePossiblyObsoleteSlot(s, oldestLSN)) + if (InvalidatePossiblyObsoleteSlot(s, oldestLSN, &invalidated)) { /* if the lock was released, start from scratch */ goto restart; } } LWLockRelease(ReplicationSlotControlLock); + + return invalidated; } /* diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h index 34d95eac8e..e32fb85db8 100644 --- a/src/include/replication/slot.h +++ b/src/include/replication/slot.h @@ -213,7 +213,7 @@ extern void ReplicationSlotsComputeRequiredLSN(void); extern XLogRecPtr ReplicationSlotsComputeLogicalRestartLSN(void); extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive); extern void ReplicationSlotsDropDBSlots(Oid dboid); -extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno); +extern bool InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno); extern ReplicationSlot *SearchNamedReplicationSlot(const char *name, bool need_lock); extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname, int szslot); extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok); diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl index d4b9ff705f..b76ed0fdac 100644 --- a/src/test/recovery/t/019_replslot_limit.pl +++ b/src/test/recovery/t/019_replslot_limit.pl @@ -11,7 +11,7 @@ use TestLib; use PostgresNode; use File::Path qw(rmtree); -use Test::More tests => $TestLib::windows_os ? 14 : 18; +use Test::More tests => $TestLib::windows_os ? 15 : 19; use Time::HiRes qw(usleep); $ENV{PGDATABASE} = 'postgres'; @@ -192,6 +192,15 @@ $result = $node_primary->safe_psql('postgres', is($result, "rep1|f|t|lost|", 'check that the slot became inactive and the state "lost" persists'); +# The invalidated slot shouldn't retreat the old-segment horizon +my $redoseg = $node_primary->safe_psql('postgres', + "select pg_walfile_name(lsn) from pg_create_physical_replication_slot('xx', true)"); +my $oldestseg = $node_primary->safe_psql('postgres', + "select pg_ls_dir from pg_ls_dir('pg_wal') where pg_ls_dir like '0%' order by pg_ls_dir limit 1"); + +is($oldestseg, $redoseg, + "check if slot-invalidation advances the old segment horizon"); + # The standby no longer can connect to the primary $logstart = get_log_size($node_standby); $node_standby->start; -- 2.27.0
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Marcin Krupowicz
Date:
On Thu, 15 Jul 2021 at 08:33, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > This is it. It is for the master branch but also applicable to 14 as > is. Not needed for earlier version. > I believe the test works for Windows but haven't checked. Thanks for that. To clarify - is it not going to be fixed in 13.x? -- Marcin
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Alvaro Herrera
Date:
On 2021-Jul-15, Marcin Krupowicz wrote: > On Thu, 15 Jul 2021 at 08:33, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > This is it. It is for the master branch but also applicable to 14 as > > is. Not needed for earlier version. > > I believe the test works for Windows but haven't checked. Oh my, what an oversight. Thanks Kyotaro for the investigation and fix; I'll get it pushed today. > Thanks for that. To clarify - is it not going to be fixed in 13.x? It definitely is, since the feature is there. I'll fix the apply conflicts, since they're pretty trivial. (No need to send a new patch.) -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Ed is the standard text editor." http://groups.google.com/group/alt.religion.emacs/msg/8d94ddab6a9b0ad3
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Jeff Janes
Date:
On Thu, Jul 15, 2021 at 1:22 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
At Wed, 14 Jul 2021 19:10:26 -0400, Jeff Janes <jeff.janes@gmail.com> wrote in
> On Tue, Jul 13, 2021 at 10:12 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> wrote:
> > Useless WAL files will be removd after a checkpoint runs.
> >
>
> They should be, but they are not. That is the bug. They just hang
> around, checkpoint after checkpoint. Some of them do get cleaned up, to
> make up for new ones created during that cycle. It treats
> max_slot_wal_keep the same way it treats wal_keep_size (but only if a
> "lost" slot is hanging around). If you drop the lost slot, only then does
> it remove all the accumulated WAL at the next checkpoint.
Thanks! I saw the issue here. Some investigation showd me a doubious
motion of XLogCtl->repliationSlotMinLSN. Slot invalidation is
forgetting to recalculate it and that misbehavior retreats the segment
horizon.
So the attached worked for me. I'll repost the polished version
including test.
Thank you. That works for me. But I did not test on Windows.
* Some slots may have been gone,
"been invalidated" reads better than "been gone", and matches the wording used elsewhere.
Cheers,
Jeff
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Alvaro Herrera
Date:
On 2021-Jul-15, Kyotaro Horiguchi wrote: > This is it. It is for the master branch but also applicable to 14 as > is. Not needed for earlier version. > I believe the test works for Windows but haven't checked. I looked at it. I think it is better to make the calls to ReplicationSlotsComputeRequiredLSN() in slot.c (which is where most other calls to that function are). Also we should recompute the minimum required Xmin at that point. Another change I did was move the "*invalidated=true" assignment to the block where we actually invalidate the slot; in your patch you were doing it possibly too early if the slot was in use by some other process. (For end effect it probably doesn't matter much, but it's better to have it right.) -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Attachment
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Alvaro Herrera
Date:
On 2021-Jul-15, Kyotaro Horiguchi wrote: > > Thanks! I saw the issue here. Some investigation showd me a doubious > > motion of XLogCtl->repliationSlotMinLSN. Slot invalidation is > > forgetting to recalculate it and that misbehavior retreats the segment > > horizon. Actually, looking again, isn't this supposed to happen in KeepLogSeg()? We have a block that caps to max_slot_wal_keep_size_mb there ... why did that not work? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Sallah, I said NO camels! That's FIVE camels; can't you count?" (Indiana Jones)
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Alvaro Herrera
Date:
On 2021-Jul-15, Alvaro Herrera wrote: > Actually, looking again, isn't this supposed to happen in KeepLogSeg()? > We have a block that caps to max_slot_wal_keep_size_mb there ... why did > that not work? I find that this smaller patch is sufficient to make the added test case work. However, I'm not sure I understand *why* ... -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
Attachment
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Kyotaro Horiguchi
Date:
At Thu, 15 Jul 2021 17:33:20 -0400, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in > On 2021-Jul-15, Alvaro Herrera wrote: > > > Actually, looking again, isn't this supposed to happen in KeepLogSeg()? > > We have a block that caps to max_slot_wal_keep_size_mb there ... why did > > that not work? > > I find that this smaller patch is sufficient to make the added test case > work. However, I'm not sure I understand *why* ... It's because another checkpoint is running at the time the manual checkpoint just before is invoked. Two successive checkpoint hides the defect. The checkpoint works as the rendezvous point for the succeeding tests so I added another sync point instead of the manual checkpoint. The test in the attached correctly fails if _logSegNo were not updated by slot invalidation. By the way, about the previous version, XLByteToSeg is needed since KeepLogSeg doesn't advance _logSegNo, which is the wanted action here. The test in the attached fails if only KeepLogSeg is called again. I confirmed that *the previous* version of the test works for Windows. (So there's no reason that the current version doesn't work.) regards. -- Kyotaro Horiguchi NTT Open Source Software Center From cd9ecce8655359a2bbb147f4d017de3e66a2bb00 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Thu, 15 Jul 2021 15:52:48 +0900 Subject: [PATCH v6] Advance old-segment horizon properly after slot invalidation When some slots are invalidated due to the max_slot_wal_keep_size limit, the old segment horizon should move forward to stay within the limit. However, in commit c6550776394e we forgot to call KeepLogSeg again to recompute the horizon after invalidating replication slots. In cases where other slots remained, the limits would be recomputed eventually for other reasons, but if all slots were invalidated, the limits would not move at all afterwards. Repair. Backpatch to 13 where the feature was introduced. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reported-by: Marcin Krupowicz <mk@071.ovh> Discussion: https://postgr.es/m/17103-004130e8f27782c9@postgresql.org --- src/backend/access/transam/xlog.c | 24 +++++++++++++++-- src/backend/replication/slot.c | 26 +++++++++++++++--- src/include/replication/slot.h | 2 +- src/test/recovery/t/019_replslot_limit.pl | 33 ++++++++++++++++++----- 4 files changed, 73 insertions(+), 12 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index edb15fe58d..752b1099ee 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9300,7 +9300,17 @@ CreateCheckPoint(int flags) */ XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size); KeepLogSeg(recptr, &_logSegNo); - InvalidateObsoleteReplicationSlots(_logSegNo); + if (InvalidateObsoleteReplicationSlots(_logSegNo)) + { + /* + * Some slots have been gone, recalculate the old-segment horizon + * excluding the invalidated slots. XLByteToSeg is needed here to + * advance _logSegNo. + */ + XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size); + KeepLogSeg(recptr, &_logSegNo); + } + _logSegNo--; RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr); @@ -9640,7 +9650,17 @@ CreateRestartPoint(int flags) replayPtr = GetXLogReplayRecPtr(&replayTLI); endptr = (receivePtr < replayPtr) ? 
replayPtr : receivePtr; KeepLogSeg(endptr, &_logSegNo); - InvalidateObsoleteReplicationSlots(_logSegNo); + if (InvalidateObsoleteReplicationSlots(_logSegNo)) + { + /* + * Some slots have been gone, recalculate the old-segment horizon + * excluding the invalidated slots. XLByteToSeg is needed here to + * advance _logSegNo. + */ + XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size); + KeepLogSeg(endptr, &_logSegNo); + } + _logSegNo--; /* diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 33b85d86cc..33e9acab4a 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -1143,11 +1143,14 @@ ReplicationSlotReserveWal(void) * Returns whether ReplicationSlotControlLock was released in the interim (and * in that case we're not holding the lock at return, otherwise we are). * + * Sets *invalidated true if the slot was invalidated. (Untouched otherwise.) + * * This is inherently racy, because we release the LWLock * for syscalls, so caller must restart if we return true. */ static bool -InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN) +InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN, + bool *invalidated) { int last_signaled_pid = 0; bool released_lock = false; @@ -1204,6 +1207,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN) s->active_pid = MyProcPid; s->data.invalidated_at = restart_lsn; s->data.restart_lsn = InvalidXLogRecPtr; + + /* Let caller know */ + *invalidated = true; } SpinLockRelease(&s->mutex); @@ -1291,12 +1297,15 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN) * Mark any slot that points to an LSN older than the given segment * as invalid; it requires WAL that's about to be removed. * + * Returns true when any slot have got invalidated. + * * NB - this runs as part of checkpoint, so avoid raising errors if possible. */ -void +bool InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno) { XLogRecPtr oldestLSN; + bool invalidated = false; XLogSegNoOffsetToRecPtr(oldestSegno, 0, wal_segment_size, oldestLSN); @@ -1309,13 +1318,24 @@ restart: if (!s->in_use) continue; - if (InvalidatePossiblyObsoleteSlot(s, oldestLSN)) + if (InvalidatePossiblyObsoleteSlot(s, oldestLSN, &invalidated)) { /* if the lock was released, start from scratch */ goto restart; } } LWLockRelease(ReplicationSlotControlLock); + + /* + * If any slots have been invalidated, recalculate the resource limits. 
+ */ + if (invalidated) + { + ReplicationSlotsComputeRequiredXmin(false); + ReplicationSlotsComputeRequiredLSN(); + } + + return invalidated; } /* diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h index 34d95eac8e..e32fb85db8 100644 --- a/src/include/replication/slot.h +++ b/src/include/replication/slot.h @@ -213,7 +213,7 @@ extern void ReplicationSlotsComputeRequiredLSN(void); extern XLogRecPtr ReplicationSlotsComputeLogicalRestartLSN(void); extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive); extern void ReplicationSlotsDropDBSlots(Oid dboid); -extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno); +extern bool InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno); extern ReplicationSlot *SearchNamedReplicationSlot(const char *name, bool need_lock); extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname, int szslot); extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok); diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl index d4b9ff705f..bbdaaca14c 100644 --- a/src/test/recovery/t/019_replslot_limit.pl +++ b/src/test/recovery/t/019_replslot_limit.pl @@ -11,7 +11,7 @@ use TestLib; use PostgresNode; use File::Path qw(rmtree); -use Test::More tests => $TestLib::windows_os ? 14 : 18; +use Test::More tests => $TestLib::windows_os ? 15 : 19; use Time::HiRes qw(usleep); $ENV{PGDATABASE} = 'postgres'; @@ -176,7 +176,13 @@ ok( !find_in_log( # Advance WAL again, the slot loses the oldest segment. my $logstart = get_log_size($node_primary); advance_wal($node_primary, 7); -$node_primary->safe_psql('postgres', "CHECKPOINT;"); + +# This slot should be broken, wait for that to happen +$node_primary->poll_query_until( + 'postgres', + qq[ + SELECT wal_status = 'lost' FROM pg_replication_slots + WHERE slot_name = 'rep1']); # WARNING should be issued ok( find_in_log( @@ -185,13 +191,28 @@ ok( find_in_log( $logstart), 'check that the warning is logged'); -# This slot should be broken -$result = $node_primary->safe_psql('postgres', - "SELECT slot_name, active, restart_lsn IS NULL, wal_status, safe_wal_size FROM pg_replication_slots WHERE slot_name= 'rep1'" -); +$result = $node_primary->safe_psql( + 'postgres', + qq[ + SELECT slot_name, active, restart_lsn IS NULL, wal_status, safe_wal_size + FROM pg_replication_slots WHERE slot_name = 'rep1']); is($result, "rep1|f|t|lost|", 'check that the slot became inactive and the state "lost" persists'); +# The invalidated slot shouldn't keep the old-segment horizon back; +# see bug #17103: https://postgr.es/m/17103-004130e8f27782c9@postgresql.org +# Test for this by creating a new slot and comparing its restart LSN +# to the oldest existing file. +my $redoseg = $node_primary->safe_psql('postgres', + "SELECT pg_walfile_name(lsn) FROM pg_create_physical_replication_slot('s2', true)" +); +my $oldestseg = $node_primary->safe_psql('postgres', + "SELECT pg_ls_dir AS f FROM pg_ls_dir('pg_wal') WHERE pg_ls_dir ~ '^[0-9A-F]{24}\$'ORDER BY 1 LIMIT 1" + ); +$node_primary->safe_psql('postgres', + qq[SELECT pg_drop_replication_slot('s2')]); +is($oldestseg, $redoseg, "check that segments have been removed"); + # The standby no longer can connect to the primary $logstart = get_log_size($node_standby); $node_standby->start; -- 2.27.0
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Alvaro Herrera
Date:
On 2021-Jul-16, Kyotaro Horiguchi wrote: > It's because another checkpoint is running at the time the manual > checkpoint just before is invoked. Two successive checkpoint hides > the defect. The checkpoint works as the rendezvous point for the > succeeding tests so I added another sync point instead of the manual > checkpoint. The test in the attached correctly fails if _logSegNo > were not updated by slot invalidation. OK, I thought as much and tried to catch it in the act but couldn't. Your explanation makes sense. > By the way, about the previous version, XLByteToSeg is needed since > KeepLogSeg doesn't advance _logSegNo, which is the wanted action > here. The test in the attached fails if only KeepLogSeg is called > again. Oh, true. > I confirmed that *the previous* version of the test works for Windows. > (So there's no reason that the current version doesn't work.) Great, thanks. -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ "Siempre hay que alimentar a los dioses, aunque la tierra esté seca" (Orual)
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Alvaro Herrera
Date:
Okay, I've now pushed it. Thanks! -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ "La libertad es como el dinero; el que no la sabe emplear la pierde" (Alvarez)
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Alvaro Herrera
Date:
I was running tests for another patch, and this one failed once: Testing REL_14_BETA2-168-g1cb94c3c4c started: vie jul 16 14:02:11 -04 2021 # Failed test 'check that the warning is logged' # at t/019_replslot_limit.pl line 187. # Failed test 'check that the slot became inactive and the state "lost" persists' # at t/019_replslot_limit.pl line 198. # got: 'rep1|t|t|lost|' # expected: 'rep1|f|t|lost|' # Failed test 'check that segments have been removed' # at t/019_replslot_limit.pl line 213. # got: '000000010000000000000014' # expected: '00000001000000000000001C' # Looks like you failed 3 tests of 19. make[2]: *** [Makefile:23: check] Error 1 The buildfarm has remained green so far, but clearly this is something we need to fix. Maybe it's as simple as adding the loop we use below, starting at line 219. -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Alvaro Herrera
Date:
On 2021-Jul-16, Alvaro Herrera wrote: > The buildfarm has remained green so far, but clearly this is something > we need to fix. Maybe it's as simple as adding the loop we use below, > starting at line 219. There are a few failures of this on buildfarm now, https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2021-07-16%2022%3A30%3A35 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes&dt=2021-07-16%2021%3A52%3A04 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2021-07-16%2021%3A52%3A05 I am running the test in a loop with the attached; if it doesn't fail in a few more rounds I'll push it. There are two instances of a different failure: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-17%2013%3A39%3A43 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2021-07-16%2021%3A14%3A14 # Failed test 'check that segments have been removed' # at t/019_replslot_limit.pl line 213. # got: '000000010000000000000021' # expected: '000000010000000000000022' # Looks like you failed 1 test of 19. [23:55:14] t/019_replslot_limit.pl .............. Dubious, test returned 1 (wstat 256, 0x100) I'm afraid about this not being something we can fix with some additional wait points ... -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ "La grandeza es una experiencia transitoria. Nunca es consistente. Depende en gran parte de la imaginación humana creadora de mitos" (Irulan)
Attachment
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Kyotaro Horiguchi
Date:
At Sat, 17 Jul 2021 10:28:09 -0400, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in > On 2021-Jul-16, Alvaro Herrera wrote: > > > The buildfarm has remained green so far, but clearly this is something > > we need to fix. Maybe it's as simple as adding the loop we use below, > > starting at line 219. > > There are a few failures of this on buildfarm now, .. > I am running the test in a loop with the attached; if it doesn't fail in > a few more rounds I'll push it. > > There are two instances of a different failure: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-17%2013%3A39%3A43 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2021-07-16%2021%3A14%3A14 > > # Failed test 'check that segments have been removed' > # at t/019_replslot_limit.pl line 213. > # got: '000000010000000000000021' > # expected: '000000010000000000000022' > # Looks like you failed 1 test of 19. > [23:55:14] t/019_replslot_limit.pl .............. > Dubious, test returned 1 (wstat 256, 0x100) > > I'm afraid about this not being something we can fix with some > additional wait points ... Sorry for the mistake. It seems to me the cause the above is that segment removal happens *after* invalidation. Since (at least currently) the "slot is invalidated" warning issued only at the time just before WAL-removal, we should expect that old segments are gone after the checkpoint-ending log, which should be seen after WAL-removal. If not, that shows that there's a bug. What do you think about the attached? regards. -- Kyotaro Horiguchi NTT Open Source Software Center From c52d7931e95cc24804f9aac4c9bf3a388c7e461f Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Mon, 19 Jul 2021 10:58:01 +0900 Subject: [PATCH v1] Remove possible instability of new replication slot test code The last fix for the same left another possible timing unstability between actual segment removal and the invalidation log. Make it steady by waiting for checkpoint-ending log, which is issued after the segment removal. --- src/test/recovery/t/019_replslot_limit.pl | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl index 026da02ff1..a5d8140807 100644 --- a/src/test/recovery/t/019_replslot_limit.pl +++ b/src/test/recovery/t/019_replslot_limit.pl @@ -11,7 +11,7 @@ use TestLib; use PostgresNode; use File::Path qw(rmtree); -use Test::More tests => $TestLib::windows_os ? 15 : 19; +use Test::More tests => $TestLib::windows_os ? 16 : 20; use Time::HiRes qw(usleep); $ENV{PGDATABASE} = 'postgres'; @@ -201,6 +201,19 @@ $result = $node_primary->safe_psql( is($result, "rep1|f|t|lost|", 'check that the slot became inactive and the state "lost" persists'); +# Make sure the current checkpoint ended +my $checkpoint_ended = 0; +for (my $i = 0; $i < 10000; $i++) +{ + if (find_in_log($node_primary, "checkpoint complete: ", $logstart)) + { + $checkpoint_ended = 1; + last; + } + usleep(100_000); +} +ok($checkpoint_ended, 'make sure checkpoint ended'); + # The invalidated slot shouldn't keep the old-segment horizon back; # see bug #17103: https://postgr.es/m/17103-004130e8f27782c9@postgresql.org # Test for this by creating a new slot and comparing its restart LSN -- 2.27.0
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Alvaro Herrera
Date:
On 2021-Jul-19, Kyotaro Horiguchi wrote: > Sorry for the mistake. It seems to me the cause the above is that > segment removal happens *after* invalidation. Since (at least > currently) the "slot is invalidated" warning issued only at the time > just before WAL-removal, we should expect that old segments are gone > after the checkpoint-ending log, which should be seen after > WAL-removal. If not, that shows that there's a bug. > > What do you think about the attached? Sounds sensible -- I verified the logs for one of the cases that failed in the buildfarm, and indeed the "checkpoint ended" message appears after the s2 slot is created, so it should fix the problem. (I didn't actually try to reproduce the problem locally, so I didn't verify the fix any further than ensuring the test still passes.) Pushed, thanks! -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Kyotaro Horiguchi
Date:
At Mon, 19 Jul 2021 17:24:48 -0400, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in
> On 2021-Jul-19, Kyotaro Horiguchi wrote:
>
> > Sorry for the mistake. It seems to me the cause the above is that
> > segment removal happens *after* invalidation. Since (at least
> > currently) the "slot is invalidated" warning issued only at the time
> > just before WAL-removal, we should expect that old segments are gone
> > after the checkpoint-ending log, which should be seen after
> > WAL-removal. If not, that shows that there's a bug.
> >
> > What do you think about the attached?
>
> Sounds sensible -- I verified the logs for one of the cases that failed
> in the buildfarm, and indeed the "checkpoint ended" message appears
> after the s2 slot is created, so it should fix the problem. (I didn't
> actually try to reproduce the problem locally, so I didn't verify the
> fix any further than ensuring the test still passes.)
>
> Pushed, thanks!

Thanks, and sorry for the series of bugs.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On 2021-Jul-19, Kyotaro Horiguchi wrote: >> What do you think about the attached? > Sounds sensible -- I verified the logs for one of the cases that failed > in the buildfarm, and indeed the "checkpoint ended" message appears > after the s2 slot is created, so it should fix the problem. (I didn't > actually try to reproduce the problem locally, so I didn't verify the > fix any further than ensuring the test still passes.) This test is still unstable :-( https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2021-07-20%2012%3A46%3A11 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-20%2015%3A05%3A39 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-28%2014%3A33%3A01 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-28%2014%3A33%3A01 These all look like # Failed test 'check that segments have been removed' # at t/019_replslot_limit.pl line 226. # got: '000000010000000000000020' # expected: '000000010000000000000024' # Looks like you failed 1 test of 16. with varying values mentioned. It looks to me like WAL file cleanup is not as synchronous with slot creation as the test thinks. Maybe it needs to loop until the oldest WAL file matches what it expects? regards, tom lane
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Kyotaro Horiguchi
Date:
At Wed, 28 Jul 2021 11:38:28 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> This test is still unstable :-(
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2021-07-20%2012%3A46%3A11
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-20%2015%3A05%3A39
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-28%2014%3A33%3A01
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-28%2014%3A33%3A01
>
> These all look like
>
> # Failed test 'check that segments have been removed'
> # at t/019_replslot_limit.pl line 226.
> # got: '000000010000000000000020'
> # expected: '000000010000000000000024'
> # Looks like you failed 1 test of 16.
>
> with varying values mentioned. It looks to me like WAL file cleanup
> is not as synchronous with slot creation as the test thinks.
> Maybe it needs to loop until the oldest WAL file matches what it
> expects?

Sorry for the kludge. Mmm. In the failure cases, the directory scan (@16:52:22.036) runs before the targeted checkpoint completes (@16:52:22.144).

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-28%2014%3A33%3A01

16:52:17.328 LOG: checkpoint starting: wal
16:52:19.140 LOG: invalidating slot "rep1" because its restart_lsn 0/1D00000 exceeds max_slot_wal_keep_size
16:52:19.316 019_replslot_limit.pl LOG: statement: SELECT pg_walfile_name(lsn) FROM pg_create_physical_replication_slot('s2',true)
16:52:22.036 019_replslot_limit.pl LOG: statement: SELECT pg_ls_dir AS f FROM pg_ls_dir('pg_wal') WHERE pg_ls_dir ~ '^[0-9A-F]{24}$'ORDER BY 1 LIMIT
16:52:22.077 019_replslot_limit.pl LOG: statement: SELECT pg_drop_replication_slot('s2')
16:52:22.144 LOG: checkpoint complete: wrote 18 buffers (14.1%); 0 WAL file(s) added, 4 removed, 3 recycled; write=1.806s, sync=0.001 s, total=4.817 s; sync files=0, longest=0.000 s, average=0.000 s; distance=3072 kB, estimate=3072kB

The reason is that the previous checkpoint completed after the test started to advance segments:

> my $logstart = get_log_size($node_primary);
> advance_wal($node_primary, 7);
!!!! another checkpoint runs/ends ... 16:52:19.140
# check for "invalidate slots" in log.
# check for the "lost" state in pg_replication_slots.
!! check for "checkpoint complete" in log. 16:52:22.077
# read redo segment and oldest wal. 16:52:22.144
!! The target checkpoint ends.

So it seems to me we need to explicitly prevent unexpected checkpoints from happening, maybe by enlarging max_wal_size temporarily.

I'll go that way.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Kyotaro Horiguchi
Date:
At Thu, 29 Jul 2021 16:20:38 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > So it seems to me we need to explicitly prevent unexpected checkpoints > from happening maybe by enlarging max_wal_size temporily. > > I'll going that way. I ended up with the attached. It causes a checkpoint reliably exactly at the aimed timing without. regards. -- Kyotaro Horiguchi NTT Open Source Software Center From faca25782953d208bcf7b1e4ea0d62a40be3c7f1 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Thu, 29 Jul 2021 16:36:32 +0900 Subject: [PATCH] Make slot test more stable The slot test still instable in the case where test script runs relatively slower than checkpointing. Remove the instability by causing a checkpoint exactly at the aimed timing. --- src/test/recovery/t/019_replslot_limit.pl | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl index 54ddcef2bb..ee4801b49c 100644 --- a/src/test/recovery/t/019_replslot_limit.pl +++ b/src/test/recovery/t/019_replslot_limit.pl @@ -173,10 +173,20 @@ ok( !find_in_log( "requested WAL segment [0-9A-F]+ has already been removed"), 'check that required WAL segments are still available'); -# Advance WAL again, the slot loses the oldest segment. +# prevent checkpoints from occurring while advancing WAL segments +$node_primary->safe_psql('postgres', "CHECKPOINT;"); +$node_primary->safe_psql('postgres', + "ALTER SYSTEM SET max_wal_size='40MB'; SELECT pg_reload_conf()"); + +# Advance WAL again, the slot loses the oldest segment by the coming checkpoint. my $logstart = get_log_size($node_primary); advance_wal($node_primary, 7); +# Cause a checkpoint +$node_primary->safe_psql('postgres', + 'ALTER SYSTEM RESET max_wal_size; SELECT pg_reload_conf()'); +$node_primary->safe_psql('postgres', "CHECKPOINT;"); + # wait until the WARNING is issued my $invalidated = 0; for (my $i = 0; $i < 10000; $i++) -- 2.27.0
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Kyotaro Horiguchi
Date:
(Somehow the previous mail was broken in many ways. I re-send it.) At Thu, 29 Jul 2021 16:20:38 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > So it seems to me we need to explicitly prevent unexpected checkpoints > from happening maybe by enlarging max_wal_size temporily. > > I'll going that way. I ended up with the attached. It causes a checkpoint reliably exactly at the aimed timing without. regards. -- Kyotaro Horiguchi NTT Open Source Software Center From 5c6fa9cd8a6449931de339c628a3a81367f7b22a Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Thu, 29 Jul 2021 16:36:32 +0900 Subject: [PATCH] Make slot test more stable The slot test is still instable in the case where the test script runs at relatively slower speed than checkpointing. Remove the instability by causing a checkpoint exactly at the aimed timing. --- src/test/recovery/t/019_replslot_limit.pl | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl index 54ddcef2bb..ee4801b49c 100644 --- a/src/test/recovery/t/019_replslot_limit.pl +++ b/src/test/recovery/t/019_replslot_limit.pl @@ -173,10 +173,20 @@ ok( !find_in_log( "requested WAL segment [0-9A-F]+ has already been removed"), 'check that required WAL segments are still available'); -# Advance WAL again, the slot loses the oldest segment. +# prevent checkpoints from occurring while advancing WAL segments +$node_primary->safe_psql('postgres', "CHECKPOINT;"); +$node_primary->safe_psql('postgres', + "ALTER SYSTEM SET max_wal_size='40MB'; SELECT pg_reload_conf()"); + +# Advance WAL again, the slot loses the oldest segment by the coming checkpoint. my $logstart = get_log_size($node_primary); advance_wal($node_primary, 7); +# Cause a checkpoint +$node_primary->safe_psql('postgres', + 'ALTER SYSTEM RESET max_wal_size; SELECT pg_reload_conf()'); +$node_primary->safe_psql('postgres', "CHECKPOINT;"); + # wait until the WARNING is issued my $invalidated = 0; for (my $i = 0; $i < 10000; $i++) -- 2.27.0
Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From
Kyotaro Horiguchi
Date:
At Thu, 29 Jul 2021 17:19:24 -0400, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in > On 2021-Jul-29, Kyotaro Horiguchi wrote: > > > At Thu, 29 Jul 2021 16:20:38 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > > > So it seems to me we need to explicitly prevent unexpected checkpoints > > > from happening maybe by enlarging max_wal_size temporily. > > > > I ended up with the attached. It causes a checkpoint reliably exactly > > at the aimed timing without. > > Thanks for looking into it. The explanation makes sense, so I pushed > your patch and also fixed one outdated comment I noticed while reading > the related code. I hope this commit will solve the problem ... it is > quite low probability, so we'll have to wait at least two weeks to make > any conclusion. Agreed. I hope this is the last failure.. Thanks. -- Kyotaro Horiguchi NTT Open Source Software Center