Re: Walsender may fail to send wal to the end. - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Walsender may fail to send wal to the end.
Msg-id 20210330.154205.1619318594309963027.horikyota.ntt@gmail.com
In response to Walsender may fail to send wal to the end.  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
At Mon, 29 Mar 2021 11:41:32 -0400, Stephen Frost <sfrost@snowman.net> wrote in 
> Greetings,
> 
> * Kyotaro Horiguchi (horikyota.ntt@gmail.com) wrote:
> > At Mon, 29 Mar 2021 14:47:33 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> > > On Fri, Mar 26, 2021 at 10:16:40AM -0700, Andres Freund wrote:
> > > > On 2021-03-26 18:20:14 +0900, Kyotaro Horiguchi wrote:
> > > > > This is because XLogSendPhysical detects removal of the wal segment
> > > > > currently reading by shutdown checkpoint.  However, there's no fear of
> > > > > overwriting of WAL segments at the time.
> > > > >
> > > > > So I think we can omit the call to CheckXLogRemoved() while
> > > > > MyWalSnd->state is WALSNDSTATE_STOPPING because the state comes after
> > > > > the shutdown checkpoint completes.
> > > > >
> > > > > Of course that doesn't help if walsender was running two segments
> > > > > behind. There still could be a small window for the failure.  But it's
> > > > > a great help to save the case of just 1 segment behind.
> > > > 
> > > > -1. This seems like a bandaid to make a broken configuration work a tiny
> > > > bit better, without actually being meaningfully better.
> > > 
> > > Agreed.  Still, wouldn't it be better to avoid such configurations and
> > > protect a bit things with a check on the new value?
> 
> I have a hard time agreeing that this is somehow a 'broken'
> configuration, instead it looks like a race condition that wasn't
> considered and should be addressed.  If there's zero lag then we really
> should allow the final WAL to get sent to the replica.

My unstated point was about switching primary/secondary roles in a
replication set where both hosts have separate archives, by the steps
"fast shutdown primary" -> "promote standby" -> "attach the old primary
as new standby", without needing to synchronize the old primary's
archive to that of the new standby before starting the new standby. I
thought that should work even if wal_keep_size = 0.
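
For concreteness, the three steps could be sketched roughly as below.
This is only an illustration, not a tested procedure: the function names
and data directory paths are hypothetical, and it assumes PostgreSQL 12
or later (standby.signal) with primary_conninfo and restore_command
already configured on the old primary.

```python
from pathlib import Path
from typing import List


def switchover_commands(old_primary: str, standby: str) -> List[List[str]]:
    """Return the pg_ctl invocations for the role switch, in order.

    Both arguments are data directory paths (hypothetical examples).
    """
    return [
        # 1. Fast shutdown of the primary; the walsender is expected to
        #    ship the shutdown checkpoint record before exiting.
        ["pg_ctl", "stop", "-D", old_primary, "-m", "fast"],
        # 2. Promote the standby to become the new primary.
        ["pg_ctl", "promote", "-D", standby],
        # 3. Start the old primary again; it comes up as a standby only
        #    if standby.signal exists (see mark_as_standby below).
        ["pg_ctl", "start", "-D", old_primary],
    ]


def mark_as_standby(data_dir: str) -> Path:
    """Create standby.signal so the next start enters standby mode."""
    signal = Path(data_dir) / "standby.signal"
    signal.touch()
    return signal
```

mark_as_standby() would be called on the old primary's data directory
between steps 2 and 3. If the old primary never sent the shutdown
record in step 1, step 3 is where the trouble would show up.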

> > The repro was a bit artificial but the symptom happened without
> > pg_switch_wal() and no load.  It caused just by shutting down of
> > primary.  If it is normal behavior for walsenders to fail to send the
> > last shutdown record to standby while fast shutdown, we should refuse
> > to startup at least wal sender if wal_keep_size = 0.
> > 
> > I can guess two ways to do that.
> 
> Both of which will break things for people, so this certainly isn't a
> great approach, and besides, if archiving is happening with
> archive_command and the replica has a restore command then it should be

Right. 

> able to follow that just fine, no?  So we'd have to also check if
> archive_command has been set up and hope the admin has a restore

Yeah, that sounds stupid (or kind of impossible).

> command.  Having to go through that dance instead of just making sure to
> push out the last WAL to the replica seems a bit silly though.

Sounds reasonable to me.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
