Re: 001_rep_changes.pl fails due to publisher stuck on shutdown - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: 001_rep_changes.pl fails due to publisher stuck on shutdown
Msg-id 20240612.101327.1997110414413074864.horikyota.ntt@gmail.com
In response to Re: 001_rep_changes.pl fails due to publisher stuck on shutdown  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: 001_rep_changes.pl fails due to publisher stuck on shutdown
List pgsql-hackers
At Tue, 11 Jun 2024 14:27:28 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in 
> On Tue, Jun 11, 2024 at 12:34 PM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> >
> > At Tue, 11 Jun 2024 11:32:12 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
> > > Sorry, it is not clear to me why we failed to flush the last
> > > continuation record in logical walsender? I see that we try to flush
> > > the WAL after receiving got_STOPPING in WalSndWaitForWal(), why is
> > > that not sufficient?
> >
> > It seems that it uses XLogBackgroundFlush(), which does not guarantee
> > flushing WAL up to the end.
> >
> 
> What would it take to ensure the same? I am trying to explore this
> path because currently logical WALSender sends any outstanding logs up
> to the shutdown checkpoint record (i.e., the latest record) and waits
> for them to be replicated to the standby before exit. Please take a
> look at the comments where we call WalSndDone(). The fix you are
> proposing will break that guarantee.

Since 086221cf6b, the shutdown checkpoint is performed after the
walsender has completed termination, in order to prevent walsenders
from generating WAL records (for example, via CREATE_REPLICATION_SLOT)
that compete with the shutdown checkpoint. Thus, it seems that the
walsender cannot see the shutdown checkpoint record, nor a certain
amount of WAL preceding it, because the walsender appears to have
relied on the checkpoint to flush its records rather than on
XLogBackgroundFlush().

If we accept the walsender being terminated before the shutdown
checkpoint, we need to "fix" the comment and then provide a function
that ensures WAL records are flushed up to the end before the
walsender exits.

I'll consider this direction for a while.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
