Re: Inconsistent DB data in Streaming Replication - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Inconsistent DB data in Streaming Replication
Msg-id CAHGQGwGSMWBaz0JL40mH7EJ3xtB6a5S3RmQ7YVSYA9cGBkEe1g@mail.gmail.com
In response to Re: Inconsistent DB data in Streaming Replication  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, Apr 17, 2013 at 10:11 PM, Amit Kapila <amit.kapila@huawei.com> wrote:
> On Wednesday, April 17, 2013 4:19 PM Florian Pflug wrote:
>> On Apr 17, 2013, at 12:22, Amit Kapila <amit.kapila@huawei.com> wrote:
>> > Do you mean that because an error has occurred, it would not be able to
>> > flush the received WAL, which could result in loss of WAL?
>> > I think even if an error occurs, it will call flush in WalRcvDie() before
>> > terminating the WALReceiver.
>>
>> Hm, true, but for that to prevent the problem, the inner processing
>> loop needs to always read up to EOF before it exits and we attempt
>> to send a reply, which I don't think it necessarily does. Assume
>> that the master sends a chunk of data, waits a bit, and finally
>> sends the shutdown record and exits. The slave might then receive
>> the first chunk, and that might trigger sending a reply. At the time
>> the reply is sent, the master has already sent the shutdown record
>> and closed the connection, and we'll thus fail to reply and abort.
>> Since the shutdown record has never been read from the socket,
>> XLogWalRcvFlush won't flush it, and the slave ends up behind the
>> master.
>>
>> Also, since XLogWalRcvProcessMsg responds to keep-alive messages,
>> we might also error out of the inner processing loop if the server
>> closes the socket after sending a keepalive but before we attempt
>> to respond.
>>
>> Fixing this on the receive side alone seems quite messy and fragile.
>> So instead, I think we should let the master send a shutdown message
>> after it has sent everything it wants to send, and wait for the client
>> to acknowledge it before shutting down the socket.
>>
>> If the client fails to respond, we could log a fat WARNING.
>
> Your explanation seems to be okay, but I think it would be better to
> reproduce the actual problem before discussing the exact solution.

I have hit this problem several times when shutting down the master with
WAL archiving enabled.
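A rough toy model of the race described in the quoted analysis, and of the proposed shutdown handshake, may make the sequence easier to follow. Everything in it (`Link`, `standby_receive`, `master_send`, `master_finish`, the "feedback" and "ack" replies) is hypothetical and is not PostgreSQL's walsender/walreceiver code. It assumes the standby replies after every WAL message, which is the worst case for the race, and it replaces the master's timed wait for the acknowledgement with a simple check.

```python
import logging
from collections import deque


class ConnectionClosed(Exception):
    """Raised when the standby tries to reply after the master has closed."""


class Link:
    """Hypothetical in-memory stand-in for one replication connection."""

    def __init__(self):
        self.to_standby = deque()  # messages sent by the master, not yet read
        self.to_master = deque()   # replies sent by the standby
        self.master_closed = False

    def reply(self, msg):
        if self.master_closed:
            raise ConnectionClosed("could not send reply to master")
        self.to_master.append(msg)


def standby_receive(link, flushed):
    """Model of the standby's inner loop: read a message, reply, repeat.

    Only records actually read from the link are appended to `flushed`,
    like the final flush done when the WAL receiver dies.
    """
    received = []
    try:
        while link.to_standby:
            kind, payload = link.to_standby.popleft()
            if kind == "wal":
                received.append(payload)
                # Assumption: a reply is sent after every WAL message.
                link.reply("feedback")
            elif kind == "shutdown":
                link.reply("ack")
    except ConnectionClosed:
        pass  # error out of the loop; unread messages are never flushed
    finally:
        flushed.extend(received)


def master_send(link, wal, handshake):
    """Send all remaining WAL (the last record being the shutdown record)."""
    for rec in wal:
        link.to_standby.append(("wal", rec))
    if handshake:
        # Proposed: announce the shutdown and keep the connection open.
        link.to_standby.append(("shutdown", None))
    else:
        # Current behaviour: close right after sending.
        link.master_closed = True


def master_finish(link):
    """Proposed: close only once the standby has acknowledged the shutdown."""
    acked = "ack" in link.to_master
    if not acked:
        logging.warning("standby did not acknowledge shutdown; it may be behind")
    link.master_closed = True
    return acked


# Current behaviour: the reply to the first chunk fails, so the shutdown
# record is never read and never flushed.
link, flushed = Link(), []
master_send(link, ["chunk 1", "shutdown record"], handshake=False)
standby_receive(link, flushed)
print(flushed)  # ['chunk 1']

# Proposed handshake: everything is read and flushed before the ack is sent.
link, flushed = Link(), []
master_send(link, ["chunk 1", "shutdown record"], handshake=True)
standby_receive(link, flushed)
print(flushed, master_finish(link))  # ['chunk 1', 'shutdown record'] True
```

The only difference between the two runs is whether the master closes the connection before the standby has read everything it was sent; if no acknowledgement arrives, the sketch logs a WARNING, as suggested in the quoted proposal.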

Regards,

-- 
Fujii Masao
