Re: Proposal for 9.1: WAL streaming from WAL buffers - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Proposal for 9.1: WAL streaming from WAL buffers
Msg-id 4C1F3372.2090202@enterprisedb.com
In response to Re: Proposal for 9.1: WAL streaming from WAL buffers  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Proposal for 9.1: WAL streaming from WAL buffers  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
On 21/06/10 12:08, Fujii Masao wrote:
> On Wed, Jun 16, 2010 at 5:06 AM, Robert Haas<robertmhaas@gmail.com>  wrote:
>> In 9.0, I think we can fix this problem by (1) only streaming WAL that
>> has been fsync'd and (2) PANIC-ing if the problem occurs anyway.  But
>> in 9.1, with sync rep and the performance demands that entails, I
>> think that we're going to need to rethink it.
>
> The problem is not that the master streams non-fsync'd WAL, but that the
> standby can replay that. So I'm thinking that we can send non-fsync'd WAL
> safely if the standby makes the recovery wait until the master has fsync'd
> WAL. That is, walsender sends not only non-fsync'd WAL but also WAL flush
> location to walreceiver, and the standby applies only the WAL which the
> master has already fsync'd. Thought?

I guess so, but you have to be very careful that the standby really
does refrain from applying the WAL. For example, a naive implementation
might write the WAL to disk in walreceiver immediately, but refrain from
telling the startup process about it. If walreceiver is then killed
because the connection is broken (and it will be, because the master
just crashed), the startup process will read the streamed WAL from the
file in pg_xlog, and go ahead and apply it anyway.
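
To make the failure mode concrete, here is a toy Python model of the naive design described above. It is only a sketch: the Standby class, its fields, and the integer "LSNs" are all hypothetical and do not correspond to actual PostgreSQL code. It assumes the master's flush location is kept only in walreceiver's memory, and that the startup process falls back to reading everything in pg_xlog once walreceiver is gone.

```python
from dataclasses import dataclass, field


@dataclass
class Standby:
    """Toy model of a standby server; every name here is hypothetical."""
    wal_on_disk: list = field(default_factory=list)  # records written to pg_xlog
    master_flush_lsn: int = 0      # last flush location reported by the master
    applied_lsn: int = 0           # how far the startup process has replayed
    walreceiver_alive: bool = True

    def receive(self, lsn: int, master_flush_lsn: int) -> None:
        # Naive walreceiver: write streamed WAL to disk immediately and keep
        # the master's flush location only in (volatile) memory.
        self.wal_on_disk.append(lsn)
        self.master_flush_lsn = master_flush_lsn

    def replay(self) -> None:
        if self.walreceiver_alive:
            # Startup process is held back to what the master has fsync'd.
            limit = self.master_flush_lsn
        else:
            # walreceiver is gone, so nothing holds the startup process back:
            # it reads whatever is present in pg_xlog.
            limit = max(self.wal_on_disk, default=0)
        for lsn in self.wal_on_disk:
            if self.applied_lsn < lsn <= limit:
                self.applied_lsn = lsn


standby = Standby()
standby.receive(lsn=1, master_flush_lsn=1)
standby.receive(lsn=2, master_flush_lsn=1)  # record 2 not yet fsync'd on master
standby.replay()
print("applied while connected:", standby.applied_lsn)

standby.walreceiver_alive = False           # master crashed, connection broken
standby.replay()
print("applied after walreceiver died:", standby.applied_lsn)
```

In this model the second replay() advances past the master's last reported flush location, which is exactly the case the standby was supposed to avoid. A safer design would need the replay limit to survive walreceiver's death, or would keep unflushed WAL out of pg_xlog until the master confirms the flush.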

So maybe there's some room for optimization there, but given the 
round-trip required for the acknowledgment anyway it might not buy you 
much, and the implementation is not very straightforward. This is 
clearly 9.1 material, if worth optimizing at all.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

