Thread: Infinite loop on master shutdown

Infinite loop on master shutdown

From

Kyotaro HORIGUCHI

Date:

17 May 2018, 11:19:00

Hello, as in pgsql-bug ML.

https://www.postgresql.org/message-id/20180517.170021.24356216.horiguchi.kyotaro@lab.ntt.co.jp

Master can go into infinite loop on shutdown. But it is caused by
a broken database like storage rolled-back one. (The steps to
replay this is shown in the above mail.)

I think this can be avoided by rejecting a standby if it reports
that write LSN is smaller than flush LSN after catching up.

Is it worth fixing?

# The patch is slightly different from that I posted to -bugs.

It is enough to chek for the invalid state just once but the
patch continues the check.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index e47ddca6bc..1916acf629 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1809,6 +1809,19 @@ ProcessStandbyReplyMessage(void)
     if (replyRequested)
         WalSndKeepalive(false);
 
+    /*
+     * Once this stream catches up to WAL, writePtr cannot be smaller than
+     * flushPtr. Otherwise we haven't reached the standby's starting LSN thus
+     * this database cannot be a proper master of the standby. The state
+     * causes infinite loop on shutdown.
+     */
+    if (MyWalSnd->state >= WALSNDSTATE_CATCHUP &&
+        writePtr != InvalidXLogRecPtr && writePtr < flushPtr)
+        ereport(ERROR,
+                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                 errmsg("Standby started from the future LSN for this server"),
+                 errhint("This means that the standby is not created from this database.")));
+
     /*
      * Update shared state for this WalSender process based on reply data from
      * standby.

Re: Infinite loop on master shutdown

From

Andres Freund

Date:

17 May 2018, 19:20:01

Hi,

On 2018-05-17 17:19:00 +0900, Kyotaro HORIGUCHI wrote:
> Hello, as in pgsql-bug ML.
> 
> https://www.postgresql.org/message-id/20180517.170021.24356216.horiguchi.kyotaro@lab.ntt.co.jp
> 
> Master can go into infinite loop on shutdown. But it is caused by
> a broken database like storage rolled-back one. (The steps to
> replay this is shown in the above mail.)
> 
> I think this can be avoided by rejecting a standby if it reports
> that write LSN is smaller than flush LSN after catching up.
> 
> Is it worth fixing?

I'm very doubtful. If you do bad stuff to a standby, bad things can
happen...

Greetings,

Andres Freund

Re: Infinite loop on master shutdown

From

Kyotaro HORIGUCHI

Date:

18 May 2018, 03:35:03

At Thu, 17 May 2018 09:20:01 -0700, Andres Freund <andres@anarazel.de> wrote in
<20180517162001.rzd7l6g2h66hvzvd@alap3.anarazel.de>
> Hi,
> 
> On 2018-05-17 17:19:00 +0900, Kyotaro HORIGUCHI wrote:
> > Hello, as in pgsql-bug ML.
> > 
> > https://www.postgresql.org/message-id/20180517.170021.24356216.horiguchi.kyotaro@lab.ntt.co.jp
> > 
> > Master can go into infinite loop on shutdown. But it is caused by
> > a broken database like storage rolled-back one. (The steps to
> > replay this is shown in the above mail.)
> > 
> > I think this can be avoided by rejecting a standby if it reports
> > that write LSN is smaller than flush LSN after catching up.
> > 
> > Is it worth fixing?
> 
> I'm very doubtful. If you do bad stuff to a standby, bad things can
> happen...

Yes, I doubted its worthiness since I didn't find more natural
way to cause that.

Thanks for the opinion.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center