Re: Proposal for 9.1: WAL streaming from WAL buffers - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Proposal for 9.1: WAL streaming from WAL buffers
Msg-id AANLkTikXyvTatNV4-6gTP7VDsaH1Icege-SsYr051FNl@mail.gmail.com
In response to Re: Proposal for 9.1: WAL streaming from WAL buffers  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Proposal for 9.1: WAL streaming from WAL buffers  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Re: Proposal for 9.1: WAL streaming from WAL buffers  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Wed, Jun 16, 2010 at 5:06 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Jun 15, 2010 at 3:57 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> I wonder if it would be possible to jigger things so that we send the
>>> WAL to the standby as soon as it is generated, but somehow arrange
>>> things so that the standby knows the last location that the master has
>>> fsync'd and never applies beyond that point.
>>
>> I can't think of any way which would not require major engineering.  And
>> you'd be slowing down replication *in general* to deal with a fairly
>> unlikely corner case.
>>
>> I think the panic is the way to go.
>
> I have yet to convince myself of how likely this is to occur.  I tried
> to reproduce this issue by crashing the database, but I think in 9.0
> you need an actual operating system crash to cause this problem, and I
> haven't yet set up an environment in which I can repeatedly crash the
> OS.  I believe, though, that in 9.1, we're going to want to stream
> from WAL buffers as proposed in the patch that started out this
> thread, and then I think this issue can be triggered with just a
> database crash.
>
> In 9.0, I think we can fix this problem by (1) only streaming WAL that
> has been fsync'd and (2) PANIC-ing if the problem occurs anyway.  But
> in 9.1, with sync rep and the performance demands that entails, I
> think that we're going to need to rethink it.

The problem is not that the master streams non-fsync'd WAL, but that the
standby can replay it. So I think we can safely send non-fsync'd WAL if
the standby makes recovery wait until the master has fsync'd that WAL.
That is, walsender sends walreceiver not only the non-fsync'd WAL but also
the master's WAL flush location, and the standby applies only the WAL that
the master has already fsync'd. Thoughts?
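
To make the idea concrete, here is a minimal sketch of the standby-side rule, written as a toy Python model rather than actual PostgreSQL code. All names (Standby, on_message, master_flush_lsn, and so on) are hypothetical, WAL locations are plain integers, and records are assumed to arrive in location order. Each walsender message carries WAL records plus the master's flush location, and the standby replays a record only once it ends at or before that location.

```python
from dataclasses import dataclass, field


@dataclass
class Standby:
    """Toy model of a standby; names are illustrative, not PostgreSQL's."""
    received: list = field(default_factory=list)  # (end_lsn, record) not yet applied
    master_flush_lsn: int = 0  # last WAL location the master reported as fsync'd
    applied_lsn: int = 0       # end of the last record replayed

    def on_message(self, records, master_flush_lsn):
        """Handle one message: WAL records plus the master's flush location."""
        self.received.extend(records)
        # The flush location never moves backwards.
        self.master_flush_lsn = max(self.master_flush_lsn, master_flush_lsn)
        return self.replay()

    def replay(self):
        """Apply only records ending at or before the master's flush location."""
        applied = []
        while self.received and self.received[0][0] <= self.master_flush_lsn:
            end_lsn, record = self.received.pop(0)
            self.applied_lsn = end_lsn
            applied.append(record)
        return applied


standby = Standby()
# The master has generated records A and B but has fsync'd only up to 100.
first = standby.on_message([(100, "A"), (200, "B")], master_flush_lsn=100)
# Later the master reports that it has fsync'd up to 200; B becomes replayable.
second = standby.on_message([], master_flush_lsn=200)
```

In this model record B is already on the standby after the first message, but recovery holds it back until the master confirms the flush. If the master crashed before fsync'ing B, the standby would not have applied anything the master lost.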

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

