Re: Inconsistent DB data in Streaming Replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Inconsistent DB data in Streaming Replication
Date
Msg-id 004d01ce3b55$8ab72380$a0256a80$@kapila@huawei.com
In response to Re: Inconsistent DB data in Streaming Replication  (Florian Pflug <fgp@phlo.org>)
Responses Re: Inconsistent DB data in Streaming Replication
List pgsql-hackers
On Monday, April 15, 2013 1:02 PM Florian Pflug wrote:
> On Apr14, 2013, at 17:56 , Fujii Masao <masao.fujii@gmail.com> wrote:
> > At fast shutdown, after walsender sends the checkpoint record and
> > closes the replication connection, walreceiver can detect the close
> > of connection before receiving all WAL records. This means that,
> > even if walsender sends all WAL records, walreceiver cannot always
> > receive all of them.
> 
> That sounds like a bug in walreceiver to me.
> 
> The following code in walreceiver's main loop looks suspicious:
> 
>   /*
>    * Process the received data, and any subsequent data we
>    * can read without blocking.
>    */
>   for (;;)
>   {
>     if (len > 0)
>     {
>       /* Something was received from master, so reset timeout */
>       ...
>       XLogWalRcvProcessMsg(buf[0], &buf[1], len - 1);
>     }
>     else if (len == 0)
>       break;
>     else if (len < 0)
>     {
>       ereport(LOG,
>           (errmsg("replication terminated by primary server"),
>            errdetail("End of WAL reached on timeline %u at %X/%X",
>                  startpointTLI,
>                  (uint32) (LogstreamResult.Write >> 32),
>                  (uint32) LogstreamResult.Write)));
>       ...
>     }
>     len = walrcv_receive(0, &buf);
>   }
> 
>   /* Let the master know that we received some data. */
>   XLogWalRcvSendReply(false, false);
> 
>   /*
>    * If we've written some records, flush them to disk and
>    * let the startup process and primary server know about
>    * them.
>    */
>   XLogWalRcvFlush(false);
> 
> The loop at the top looks fine - it specifically avoids throwing
> an error on EOF. But the code then proceeds to XLogWalRcvSendReply()
> which doesn't seem to have the same smarts - it simply does
> 
>   if (PQputCopyData(streamConn, buffer, nbytes) <= 0 ||
>       PQflush(streamConn))
>       ereport(ERROR,
>               (errmsg("could not send data to WAL stream: %s",
>                       PQerrorMessage(streamConn))));
> 
> Unless I'm missing something, that certainly seems to explain
> how a standby can lag behind even after a controlled shutdown of
> the master.

Do you mean that because the error occurs, walreceiver would not be able to
flush the WAL it has already received, which could result in loss of WAL?
I think that even if the error occurs, flush will still be called in
WalRcvDie() before walreceiver terminates.
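
To make the argument concrete, here is a minimal, self-contained sketch in C.
It is not PostgreSQL code: ReceiverModel and the model_* functions are
hypothetical stand-ins for the walreceiver state, XLogWalRcvSendReply(),
XLogWalRcvFlush() and WalRcvDie(), and it assumes that the exit path flushes
as described above. It only illustrates that WAL received before the failed
reply should still reach disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the receiver state; not PostgreSQL's real structs. */
typedef struct
{
    size_t written;    /* bytes received and written, possibly not flushed */
    size_t flushed;    /* bytes known to be durable */
    bool   conn_open;  /* false once the sender has closed the connection */
} ReceiverModel;

/* Stand-in for XLogWalRcvFlush(): make everything written so far durable. */
static void
model_flush(ReceiverModel *r)
{
    r->flushed = r->written;
    assert(r->flushed <= r->written);
}

/* Stand-in for XLogWalRcvSendReply(): fails once the connection is closed. */
static bool
model_send_reply(const ReceiverModel *r)
{
    return r->conn_open;
}

/* Stand-in for the exit callback (WalRcvDie() in this thread). */
static void
model_die(ReceiverModel *r)
{
    model_flush(r);
}

/*
 * One pass of the main loop: consume the received chunks, note EOF if the
 * sender closed the connection, then send a reply and flush.
 * Returns false if the receiver terminated on the error path.
 */
static bool
model_iteration(ReceiverModel *r, const size_t *chunks, size_t nchunks,
                bool eof_after)
{
    for (size_t i = 0; i < nchunks; i++)
        r->written += chunks[i];

    if (eof_after)
        r->conn_open = false;

    if (!model_send_reply(r))
    {
        /* Error path: the regular flush is skipped, the exit callback runs. */
        model_die(r);
        return false;
    }

    model_flush(r);
    return true;
}
```

In this model, a pass that ends with EOF returns false, but flushed still
equals written afterwards, which is the behaviour I would expect from the
real exit path.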

With Regards,
Amit Kapila.