Re: Disallow changing slot's failover option in transaction block - Mailing list pgsql-hackers

From shveta malik
Subject Re: Disallow changing slot's failover option in transaction block
Date
Msg-id CAJpy0uCCadKHEOT5cRT+EKwcmGuY2xxktaRgiNcS_RnFUcGYXQ@mail.gmail.com
Whole thread Raw
In response to RE: Disallow changing slot's failover option in transaction block  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Responses Re: Disallow changing slot's failover option in transaction block
RE: Disallow changing slot's failover option in transaction block
List pgsql-hackers
On Tue, Apr 16, 2024 at 1:45 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Hou,
>
> > Kuroda-San reported an issue off-list that:
> >
> > If user execute ALTER SUBSCRIPTION SET (failover) command inside a txn block
> > and rollback, only the subscription option change can be rolled back, while the
> > replication slot's failover change is preserved.
> >
> > This is because ALTER SUBSCRIPTION SET (failover) command internally
> > executes
> > the replication command ALTER_REPLICATION_SLOT to change the replication
> > slot's
> > failover property, but this replication command execution cannot be
> > rollback.
> >
> > To fix it, I think we can prevent user from executing ALTER SUBSCRIPTION set
> > (failover) inside a txn block, which is also consistent to the ALTER
> > SUBSCRIPTION REFRESH/DROP SUBSCRIPTION command. Attach a small
> > patch to address this.
>
> Thanks for posting the patch, the fix is same as my expectation.
> Also, should we add the restriction to the doc? I feel [1] can be updated.

+1.

Similar to ALTER SUB, CREATE SUB also needs to be fixed. This is
because we call alter_replication_slot in CREATE SUB as well, for the
case when slot_name is provided and create_slot=false. But the tricky
part is we call alter_replication_slot() when creating subscription
for both failover=true and false. That means if we want to fix it on
the similar line of ALTER SUB, we have to disallow user from executing
the CREATE SUBSCRIPTION (slot_name = xx) in a txn block, which seems
to break some existing use cases. (previously, user can execute such a
command inside a txn block).

So, we need to think if there are better ways to fix it.  After
discussion with Hou-San offlist, here are some ideas:

1. do not alter replication slot's failover option when CREATE
SUBSCRIPTION   WITH failover=false. This means we alter replication
slot only when failover is set to true. And thus we can disallow
CREATE SUB WITH (slot_name =xx, failover=true, create_slot=false)
inside a txn block.

This option allows user to run CREATE-SUB(create_slot=false) with
failover=false in txn block like earlier. But on the downside, it
makes the behavior inconsistent for otherwise simpler option like
failover,  i.e. with failover=true, CREATE SUB is not allowed in txn
block while with failover=false, it is allowed. It makes it slightly
complex to be understood by user.

2. let's not disallow CREATE SUB in txn block as earlier, just don't
perform internal alter-failover during CREATE SUB for existing
slots(create_slot=false, slot_name=xx)  i.e. when create_slot is
false, we will ignore failover parameter of CREATE SUB and it is
user's responsibility to set it appropriately using ALTER SUB cmd. For
create_slot=true, the behaviour of CREATE-SUB remains same as earlier.

This option does not add new restriction for CREATE SUB wrt txn block.
In context of failover with create_slot=false, we already have a
similar restriction (documented one) for ALTER SUB, i.e. with 'ALTER
SUBSCRIPTION SET(slot_name = new)', user needs to alter the new slot's
failover by himself. CREAT SUB can also be documented in similar way.
This seems simpler to be understood considering existing ALTER SUB's
behavior as well. Plus, this will make CREATE-SUB code slightly
simpler and thus easily manageable.

3. add a alter_slot option for CREATE SUBSCRIPTION, we can only alter
the  slot's failover if alter_slot=true. And so we can disallow CREATE
SUB WITH (slot_name =xx, alter_slot=true) inside a txn block.

This seems a clean way, as everything will be as per user's consent
based on alter_slot parameter. But on the downside, this will need
introducing additional parameter and also adding new restriction of
running CREATE-sub in txn  block for a specific case.

4. Don't alter replication in subscription DDLs. Instead, try to alter
replication slot's failover in the apply worker. This means we need to
execute alter_replication_slot each time before starting streaming in
the apply worker.

This does not seem appealing to execute alter_replication_slot
everytime the apply worker starts. But if others think it as a better
option, it can be further analyzed.


Currently, our preference is option 2, as that looks a clean solution
and also aligns with ALTER-SUB behavior which is already documented.
Thoughts?

--------------------------------
Note that we could not refer to the design of two_phase here, because
two_phase can be considered as a streaming option, so it's fine to
change the two_phase along with START_REPLICATION command. (the
two_phase is not changed in subscription DDLs, but get changed in
START_REPLICATION command). But the failover is closely related to a
replication slot itself.
--------------------------------


Thanks
Shveta



pgsql-hackers by date:

Previous
From: Nazir Bilal Yavuz
Date:
Subject: Use read streams in CREATE DATABASE command when the strategy is wal_log
Next
From: Pavel Borisov
Date:
Subject: Re: Table AM Interface Enhancements