Re: [HACKERS] Restricting maximum keep segments by repslots - Mailing list pgsql-hackers
| From | Kyotaro Horiguchi |
|---|---|
| Subject | Re: [HACKERS] Restricting maximum keep segments by repslots |
| Date | |
| Msg-id | 20200408.141956.891237856186513376.horikyota.ntt@gmail.com Whole thread Raw |
| In response to | Re: [HACKERS] Restricting maximum keep segments by repslots (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
| Responses |
Re: [HACKERS] Restricting maximum keep segments by repslots
Re: [HACKERS] Restricting maximum keep segments by repslots Re: [HACKERS] Restricting maximum keep segments by repslots |
| List | pgsql-hackers |
At Wed, 08 Apr 2020 09:37:10 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> > I pushed version 26, with a few further adjustments.
> >
> > I think what we have now is sufficient, but if you want to attempt this
> > "invalidated" flag on top of what I pushed, be my guest.
>
> I don't think the invalidation flag is essential but it can prevent
> unanticipated behavior, in other words, it makes us feel at ease:p
>
> After the current master/HEAD, the following steps causes assertion
> failure in xlogreader.c.
..
> I will look at it.
Just avoiding starting replication when restart_lsn is invalid is
sufficient (the attached, which is equivalent to a part of what the
invalidated flag did). I thing that the error message needs a Hint but
it looks on the subscriber side as:
[22086] 2020-04-08 10:35:04.188 JST ERROR: could not receive data from WAL stream: ERROR: replication slot "s1" is
invalidated
HINT: The slot exceeds the limit by max_slot_wal_keep_size.
I don't think it is not clean.. Perhaps the subscriber should remove
the trailing line of the message from the publisher?
> On the other hand, physical replication doesn't break by invlidation.
>
> Primary: postgres.conf
> max_slot_wal_keep_size=0
> Standby: postgres.conf
> primary_conninfo='connect to master'
> primary_slot_name='x1'
>
> (start the primary)
> P=> select pg_create_physical_replication_slot('x1');
> (start the standby)
> S=> create table tt(); drop table tt; select pg_switch_wal(); checkpoint;
If we don't mind that standby can reconnect after a walsender
termination due to the invalidation, we don't need to do something for
this. Restricting max_slot_wal_keep_size to be larger than a certain
threshold would reduce the chance we see that behavior.
I saw another issue, the following sequence on the primary freezes
when invalidation happens.
=# create table tt(); drop table tt; select pg_switch_wal();create table tt(); drop table tt; select
pg_switch_wal();createtable tt(); drop table tt; select pg_switch_wal(); checkpoint;
The last checkpoint command is waiting for CV on
CheckpointerShmem->start_cv in RequestCheckpoint(), while Checkpointer
is waiting for the next latch at the end of
CheckpointerMain. new_started doesn't move but it is the same value
with old_started.
That freeze didn't happen when I removed
ConditionVariableSleep(&s->active_cv) in
InvalidateObsoleteReplicationSlots.
I continue investigating it.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
From b3f7e2d94b8ea9b5f3819fcf47c0e1ba57355b87 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Wed, 8 Apr 2020 14:03:01 +0900
Subject: [PATCH] walsender crash fix
---
src/backend/replication/walsender.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 06e8b79036..707de65f4b 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1170,6 +1170,13 @@ StartLogicalReplication(StartReplicationCmd *cmd)
pq_flush();
/* Start reading WAL from the oldest required WAL. */
+ if (MyReplicationSlot->data.restart_lsn == InvalidXLogRecPtr)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("replication slot \"%s\" is invalidated",
+ cmd->slotname),
+ errhint("The slot exceeds the limit by max_slot_wal_keep_size.")));
+
XLogBeginRead(logical_decoding_ctx->reader,
MyReplicationSlot->data.restart_lsn);
--
2.18.2
pgsql-hackers by date: