Re: Exit walsender before confirming remote flush in logical replication - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: Exit walsender before confirming remote flush in logical replication
Date
Msg-id CAExHW5v6Q4SFsqku2V3UHyv_pbdaX0Lt-uzb_L72CJaY4r6wvg@mail.gmail.com
Whole thread Raw
In response to Exit walsender before confirming remote flush in logical replication  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Responses Re: Exit walsender before confirming remote flush in logical replication
List pgsql-hackers
On Thu, Dec 22, 2022 at 11:16 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear hackers,
> (I added Amit as CC because we discussed in another thread)
>
> This is a fork thread from time-delayed logical replication [1].
> While discussing, we thought that we could extend the condition of walsender shutdown[2][3].
>
> Currently, walsenders delay the shutdown request until confirming all sent data
> are flushed on remote side. This condition was added in 985bd7[4], which is for
> supporting clean switchover. Supposing that there is a primary-secondary
> physical replication system, and do following steps. If any changes are come
> while step 2 but the walsender does not confirm the remote flush, the reboot in
> step 3 may be failed.
>
> 1. Stops primary server.
> 2. Promotes secondary to new primary.
> 3. Reboot (old)primary as new secondary.
>
> In case of logical replication, however, we cannot support the use-case that
> switches the role publisher <-> subscriber. Suppose same case as above, additional
> transactions are committed while doing step2. To catch up such changes subscriber
> must receive WALs related with trans, but it cannot be done because subscriber
> cannot request WALs from the specific position. In the case, we must truncate all
> data in new subscriber once, and then create new subscription with copy_data
> = true.
>
> Therefore, I think that we can ignore the condition for shutting down the
> walsender in logical replication.
>
> This change may be useful for time-delayed logical replication. The walsender
> waits the shutdown until all changes are applied on subscriber, even if it is
> delayed. This causes that publisher cannot be stopped if large delay-time is
> specified.

I think the current behaviour is an artifact of using the same WAL
sender code for both logical and physical replication.

I agree with you that the logical WAL sender need not wait for all the
WAL to be replayed downstream.

I have not reviewed the patch though.

-- 
Best Wishes,
Ashutosh Bapat



pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures
Next
From: Andrey Lepikhov
Date:
Subject: Re: Optimization issue of branching UNION ALL