Re: Bug in walreceiver - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Bug in walreceiver
Date
Msg-id 4D2EBEE3.2010709@enterprisedb.com
In response to Bug in walreceiver  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Bug in walreceiver  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
On 13.01.2011 10:28, Fujii Masao wrote:
> When the master shuts down or crashes, there seems to be
> the case where walreceiver exits without flushing WAL which
> has already been written. This might lead startup process to
> replay un-flushed WAL and break a Write-Ahead-Logging rule.

Hmm, that can happen at a crash even with no replication involved. If 
you "kill -9 postmaster", and some WAL had been written but not fsync'd, 
on crash recovery we will happily recover the unsynced WAL. We could 
prevent that by fsyncing all WAL before applying it - presumably 
fsyncing a file that has already been flushed is quick. But is it worth 
the trouble?
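
As a rough illustration of the "fsync all WAL before applying it" idea, here is a minimal sketch in plain POSIX C. It is not PostgreSQL code: the helper name sync_segment_before_replay and the idea of passing a segment path are assumptions made for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of PostgreSQL): force a WAL segment file
 * to stable storage before anything reads and applies its contents.
 * Returns 0 on success, -1 on failure with errno set by the failing call.
 */
static int
sync_segment_before_replay(const char *path)
{
    int fd;

    /* Open read-write; some platforms refuse fsync() on a read-only fd. */
    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    /* If the data is already on disk, this should have little to do. */
    if (fsync(fd) != 0)
    {
        int saved_errno = errno;

        close(fd);
        errno = saved_errno;
        return -1;
    }

    return close(fd);
}
```

A recovery loop would call this once per segment before reading the first record from it, so the extra cost is one open/fsync/close per segment rather than per record.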

> walreceiver.c
>>         /* Wait a while for data to arrive */
>>         if (walrcv_receive(NAPTIME_PER_CYCLE, &type, &buf, &len))
>>         {
>>             /* Accept the received data, and process it */
>>             XLogWalRcvProcessMsg(type, buf, len);
>>
>>             /* Receive any more data we can without sleeping */
>>             while (walrcv_receive(0, &type, &buf, &len))
>>                 XLogWalRcvProcessMsg(type, buf, len);
>>
>>             /*
>>              * If we've written some records, flush them to disk and let the
>>              * startup process know about them.
>>              */
>>             XLogWalRcvFlush();
>>         }
>
> The problematic case happens when the latter walrcv_receive
> emits ERROR. In this case, the WAL received by the former
> walrcv_receive is not guaranteed to have been flushed yet.
>
> The attached patch ensures that all WAL received is flushed to
> disk before walreceiver exits. This patch should be backported
> to 9.0, I think.

Yeah, we probably should do that, even though it doesn't completely 
close the window in which unsynced WAL can be replayed.
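
To make the failure mode concrete, here is a toy model in C of a receive cycle that flushes on the error path as well as on success. It is only a sketch of the general idea, not the attached patch: WalState, wal_write, wal_flush and receive_cycle are hypothetical names, and a negative message length stands in for walrcv_receive raising an ERROR.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping: bytes written vs. bytes known to be flushed. */
typedef struct
{
    size_t written;
    size_t flushed;
} WalState;

/* Stand-in for writing received WAL without syncing it. */
static void
wal_write(WalState *s, size_t len)
{
    s->written += len;
}

/* Stand-in for the flush step: everything written becomes durable. */
static void
wal_flush(WalState *s)
{
    if (s->flushed < s->written)
        s->flushed = s->written;
}

/*
 * Process a batch of messages. A negative length models a receive error.
 * Returns false if the batch was cut short by an error. Either way, the
 * data written so far is flushed before control returns to the caller.
 */
static bool
receive_cycle(WalState *s, const long *msgs, size_t nmsgs)
{
    bool ok = true;

    for (size_t i = 0; i < nmsgs; i++)
    {
        if (msgs[i] < 0)
        {
            ok = false;     /* error: stop receiving */
            break;
        }
        wal_write(s, (size_t) msgs[i]);
    }

    wal_flush(s);           /* reached on both success and error */
    return ok;
}
```

In this model, a batch of {100, 50, -1, 70} ends with written and flushed both at 150: the data received before the error is durable, and the message after the error is never written.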

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

