On Mon, Jun 14, 2010 at 4:14 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Jun 11, 2010 at 11:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I think the failover case might be OK. But if the master crashes and
>> restarts, the slave might be left thinking its xlog position is ahead
>> of the xlog position on the master.
>
> Right. Unless we perform a failover in this case, the standby might fail
> because its WAL is inconsistent with the master's after the master restarts.
> To avoid this problem, walsender must wait for WAL to be not only written but
> also *fsynced* on the master before sending it, as 9.0 does. Though this would
> degrade performance, it might be useful in some cases. Should we provide a knob
> to specify whether the standby is allowed to get ahead of the master?

Maybe. That sounds like a pretty enormous foot-gun to me, considering
that we have no way of recovering from the situation where the standby
gets ahead of the master. Right now, I believe we're still in the
situation where the standby goes into an infinite CPU-chewing,
log-spewing loop, but even after we fix that it's not going to be good
enough to really handle that case sensibly, which we probably need to
do if we want to make this change.

Come to think of it, can this happen already? Can the master stream
WAL to the standby after it's written but before it's fsync'd?

We should get the open item fixed for 9.0 here before we start
worrying about 9.1.
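
A toy model of the crash scenario discussed in this thread may make the
invariant concrete. This is a minimal sketch under stated assumptions, not
walsender code: MasterWal, send_limit(), crash_and_restart(), and the byte
offsets are all hypothetical, and it assumes a master crash discards exactly
the WAL that was written but not yet fsync'd.

```python
from dataclasses import dataclass


@dataclass
class MasterWal:
    """Hypothetical model of the master's WAL positions (byte offsets)."""
    write_ptr: int  # WAL handed to the kernel with write()
    flush_ptr: int  # WAL known to be durable after fsync()


def send_limit(m: MasterWal, wait_for_fsync: bool) -> int:
    """Upper bound on what a hypothetical walsender may ship to the standby."""
    return m.flush_ptr if wait_for_fsync else m.write_ptr


def crash_and_restart(m: MasterWal) -> None:
    """Assumption: a crash loses everything written but not yet fsync'd."""
    m.write_ptr = m.flush_ptr


def standby_ahead_after_crash(wait_for_fsync: bool) -> bool:
    m = MasterWal(write_ptr=8192, flush_ptr=4096)
    standby_received = send_limit(m, wait_for_fsync)
    crash_and_restart(m)
    # After restart the master resumes generating WAL from flush_ptr, so any
    # WAL the standby already holds beyond that point may diverge from it.
    return standby_received > m.flush_ptr


print(standby_ahead_after_crash(False))  # True: standby is ahead of the master
print(standby_ahead_after_crash(True))   # False: standby never passes flush_ptr
```

Capping what is sent at the fsync'd position is what keeps the standby from
getting ahead; sending up to the write position is faster but opens the window
described above.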
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company