Re: 001_rep_changes.pl fails due to publisher stuck on shutdown - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: 001_rep_changes.pl fails due to publisher stuck on shutdown
Msg-id CAA4eK1K9RVRLin6aca2D1wvoobsyJt2xCzoidgMs=pGzSG_WxA@mail.gmail.com
In response to Re: 001_rep_changes.pl fails due to publisher stuck on shutdown  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: 001_rep_changes.pl fails due to publisher stuck on shutdown
List pgsql-hackers
On Wed, Jun 12, 2024 at 6:43 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Tue, 11 Jun 2024 14:27:28 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
> > On Tue, Jun 11, 2024 at 12:34 PM Kyotaro Horiguchi
> > <horikyota.ntt@gmail.com> wrote:
> > >
> > > At Tue, 11 Jun 2024 11:32:12 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
> > > > Sorry, it is not clear to me why we failed to flush the last
> > > > continuation record in logical walsender? I see that we try to flush
> > > > the WAL after receiving got_STOPPING in WalSndWaitForWal(), why is
> > > > that not sufficient?
> > >
> > > It seems that, it uses XLogBackgroundFlush(), which does not guarantee
> > > flushing WAL until the end.
> > >
> >
> > What would it take to ensure the same? I am trying to explore this
> > path because currently logical WALSender sends any outstanding logs up
> > to the shutdown checkpoint record (i.e., the latest record) and waits
> > for them to be replicated to the standby before exit. Please take a
> > look at the comments where we call WalSndDone(). The fix you are
> > proposing will break that guarantee.
>
> Shutdown checkpoint is performed after the walsender completed
> termination since 086221cf6b,
>

Yeah, but the commit you quoted was later reverted by commit 703f148e98
and committed again as c6c3334364.

> aiming to prevent walsenders from
> generating competing WAL (by, for example, CREATE_REPLICATION_SLOT)
> records with the shutdown checkpoint. Thus, it seems that the
> walsender cannot see the shutdown record,
>

This is true of the logical walsender. The physical walsender does send
the shutdown checkpoint record before getting terminated.

> and a certain amount of
> bytes before it, as the walsender appears to have relied on the
> checkpoint flushing its record, rather than on XLogBackgroundFlush().
>
> If we approve of the walsender being terminated before the shutdown
> checkpoint, we need to "fix" the comment, then provide a function to
> ensure the synchronization of WAL records.
>

Which comment do you want to fix?

> I'll consider this direction for a while.
>

Okay, thanks.

--
With Regards,
Amit Kapila.
