Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Introduce XID age and inactive timeout based replication slot invalidation
Date
Msg-id CAA4eK1JyGE=zi58zpt_xq_vkobpu8n1TOVKVJ6XJ+0+gisvU9g@mail.gmail.com
Whole thread Raw
In response to Re: Introduce XID age and inactive timeout based replication slot invalidation  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Introduce XID age and inactive timeout based replication slot invalidation
List pgsql-hackers
On Mon, Sep 16, 2024 at 3:31 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Please find the attached v46 patch having changes for the above review
> comments and your test review comments and Shveta's review comments.
>

-ReplicationSlotAcquire(const char *name, bool nowait)
+ReplicationSlotAcquire(const char *name, bool nowait, bool error_if_invalid)
 {
  ReplicationSlot *s;
  int active_pid;
@@ -615,6 +620,22 @@ retry:
  /* We made this slot active, so it's ours now. */
  MyReplicationSlot = s;

+ /*
+ * An error is raised if error_if_invalid is true and the slot has been
+ * previously invalidated due to inactive timeout.
+ */
+ if (error_if_invalid &&
+ s->data.invalidated == RS_INVAL_INACTIVE_TIMEOUT)
+ {
+ Assert(s->inactive_since > 0);
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("can no longer get changes from replication slot \"%s\"",
+ NameStr(s->data.name)),
+ errdetail("This slot has been invalidated because it was inactive
for longer than the amount of time specified by \"%s\".",
+    "replication_slot_inactive_timeout")));
+ }

Why raise the ERROR just for timeout invalidation here and why not if
the slot is invalidated for other reasons? This raises the question of
what happens before this patch if the invalid slot is used from places
where we call ReplicationSlotAcquire(). I did a brief code analysis
and found that for StartLogicalReplication(), even if the error won't
occur in ReplicationSlotAcquire(), it would have been caught in
CreateDecodingContext(). I think that is where we should also add this
new error. Similarly, pg_logical_slot_get_changes_guts() and other
logical replication functions should be calling
CreateDecodingContext() which can raise the new ERROR. I am not sure
about how the invalid slots are handled during physical replication,
please check the behavior of that before this patch.

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: Add memory/disk usage for WindowAgg nodes in EXPLAIN
Next
From: Tatsuo Ishii
Date:
Subject: Re: Add memory/disk usage for WindowAgg nodes in EXPLAIN