Re: write ahead logging in standby (streaming replication) - Mailing list pgsql-hackers

From: Simon Riggs
Subject: Re: write ahead logging in standby (streaming replication)
Msg-id: 1258018042.14054.103.camel@ebony
In response to: Re: write ahead logging in standby (streaming replication) (Fujii Masao <masao.fujii@gmail.com>)
Responses: Re: write ahead logging in standby (streaming replication)
List: pgsql-hackers
On Thu, 2009-11-12 at 17:03 +0900, Fujii Masao wrote:

> On Thu, Nov 12, 2009 at 4:32 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
> > Fujii Masao wrote:
> >> The problem is that fsync needs to be issued too frequently, which would
> >> be harmless in asynchronous replication, but not in a synchronous one.
> >> A transaction would have to wait for the primary's and standby's fsync
> >> before returning a "success" to a client.
> >>
> >> So I'm inclined to change the startup process and bgwriter, instead of
> >> walreceiver, so as to fsync the WAL for the WAL rule.
> >
> > Let's keep it simple for now. Just make the walreceiver do the fsync. We
> > can optimize later. For now, we're only going to have async mode anyway.
> 
> Okay, I'll do that; the walreceiver issues the fsync for each arrival of
> the WAL records, and the startup process replays only the records already
> fsynced.

I agree with you, though it took me some time to understand what you
said, and at first my reaction was to disagree. I think you got the
responses you did because you dived straight in with a question before
explaining the context around it.

We already have a number of options for how to handle incoming WAL. We
can choose to fsync or not when WAL arrives. Choosing *not* to fsync
would be the typical choice because it provides reasonable performance;
fsyncing after each transaction commit would be worse. In any case, if
the WAL receiver does the fsyncs then we will get worse performance. If
we reduce the number of fsyncs it does, we just get spiky behaviour
around the fsyncs.

If recovery starts reading WAL records that have not been fsynced, then
we may need to flush a shared buffer to disk that depends upon a WAL
record that has not yet been fsynced. Fsyncing WAL after *every* WAL
record would make performance even worse and is completely out of the
question. So implementing the fsync-WAL-before-buffer-flush rule during
recovery makes much more sense. It's also only a small change in
XLogFlush().
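To make the fsync-WAL-before-buffer-flush rule concrete, here is a minimal toy model, assuming LSNs are plain integers and that fsync and page writes are simulated. The names `StandbyWal`, `xlog_flush`, `Buffer` and `flush_buffer` are hypothetical illustrations, not PostgreSQL's actual code; `xlog_flush` only mimics the idea of the check done in XLogFlush(). The point it shows is that WAL gets fsynced only when a dirty page's LSN is ahead of what is already durable, not once per WAL record.

```python
from dataclasses import dataclass


@dataclass
class StandbyWal:
    """Toy model of WAL on a standby; LSNs are plain integers (hypothetical)."""
    write_lsn: int = 0    # end of WAL written by the WAL receiver (write() only)
    flush_lsn: int = 0    # end of WAL known to be durable (fsync'd)
    fsync_calls: int = 0  # how many WAL fsyncs have been issued

    def receive(self, end_lsn: int) -> None:
        # WAL receiver: append incoming WAL with write(), no fsync.
        self.write_lsn = max(self.write_lsn, end_lsn)

    def xlog_flush(self, lsn: int) -> None:
        # Hypothetical analogue of the WAL rule check: make WAL durable up to lsn.
        if lsn <= self.flush_lsn:
            return  # already durable; no fsync needed
        if lsn > self.write_lsn:
            raise ValueError("page LSN is beyond the WAL received so far")
        self.fsync_calls += 1
        self.flush_lsn = self.write_lsn  # one fsync covers everything written so far


@dataclass
class Buffer:
    page_lsn: int  # LSN of the last WAL record replayed into this page
    dirty: bool = True


def flush_buffer(wal: StandbyWal, buf: Buffer, disk: list) -> None:
    # WAL before data: make WAL durable up to page_lsn before writing the page.
    wal.xlog_flush(buf.page_lsn)
    disk.append(buf.page_lsn)  # stand-in for writing the page to disk
    buf.dirty = False


wal = StandbyWal()
wal.receive(100)
wal.receive(250)
disk = []
flush_buffer(wal, Buffer(page_lsn=120), disk)  # needs WAL to 120: one fsync
flush_buffer(wal, Buffer(page_lsn=200), disk)  # 200 is already durable: no fsync
```

The second page write costs no extra fsync because the first one already made everything written so far durable; that batching is what keeps this cheaper than fsyncing per record.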

Another way of doing this would be to only allow recovery to progress as
far as WAL has been fsynced. That seems a more plausible approach, but it
would lead to delays if we had a small number of long write transactions.
The benefit of streaming is that it potentially allows us to keep
recovery as close to real time as possible.
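The alternative of letting recovery progress only as far as WAL has been fsynced can be sketched as a simple gate on replay. This is a hypothetical illustration with integer LSNs; `WalRecord` and `replayable` are invented names, not PostgreSQL functions. Any record that has arrived but ends past the last fsync must wait, so the replay lag depends on how often the receiver fsyncs.

```python
from typing import List, NamedTuple


class WalRecord(NamedTuple):
    start_lsn: int  # where the record begins (toy integer LSN)
    end_lsn: int    # where the record ends


def replayable(records: List[WalRecord], replayed_upto: int,
               flush_lsn: int) -> List[WalRecord]:
    """Records the startup process may replay now (hypothetical helper).

    Only records that end at or before flush_lsn qualify, so replay never
    gets ahead of WAL that is already durable on the standby."""
    return [r for r in records
            if r.start_lsn >= replayed_upto and r.end_lsn <= flush_lsn]


records = [WalRecord(0, 100), WalRecord(100, 180), WalRecord(180, 400)]
# The last fsync reached 180: the third record has arrived but must wait.
now = replayable(records, replayed_upto=0, flush_lsn=180)
# Once a later fsync reaches 400, replay can continue.
later = replayable(records, replayed_upto=180, flush_lsn=400)
```

If the receiver fsyncs rarely, for example only around the commits of a few long write transactions, `flush_lsn` advances in large jumps and replay stalls in between.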

So overall, yes, we need to do as you suggested: implement the WAL rule
in recovery. The WAL receiver smoothly does write()s, the startup
process replays, and we leave the WAL file fsyncs to the bgwriter.
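The proposed division of labour can be modelled in a few lines, assuming integer LSNs, a dict of dirty pages, and simulated fsyncs. `Standby`, `walreceiver_write`, `startup_replay` and `bgwriter_cycle` are hypothetical names for this sketch only and do not correspond to real PostgreSQL functions. It shows replay running ahead of the durable WAL position, with a single WAL fsync issued by the bgwriter before it writes out the whole batch of dirty pages.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Standby:
    """Toy model of the proposed division of labour (all names hypothetical)."""
    write_lsn: int = 0   # advanced by the WAL receiver's write()
    flush_lsn: int = 0   # advanced only when WAL is fsynced
    replay_lsn: int = 0  # advanced by the startup process
    dirty: Dict[int, int] = field(default_factory=dict)  # block -> page LSN
    fsyncs: int = 0

    def walreceiver_write(self, end_lsn: int) -> None:
        # Write incoming WAL without fsyncing it.
        self.write_lsn = max(self.write_lsn, end_lsn)

    def startup_replay(self, block: int, end_lsn: int) -> None:
        # Replay may run ahead of flush_lsn, but never ahead of received WAL.
        assert end_lsn <= self.write_lsn, "record not yet received"
        self.replay_lsn = end_lsn
        self.dirty[block] = end_lsn

    def bgwriter_cycle(self) -> int:
        # Write out all dirty pages, honouring the WAL rule first.
        if not self.dirty:
            return 0
        needed = max(self.dirty.values())
        if needed > self.flush_lsn:
            self.fsyncs += 1  # one WAL fsync for the whole batch
            self.flush_lsn = self.write_lsn
        written = len(self.dirty)
        self.dirty.clear()
        return written


s = Standby()
s.walreceiver_write(300)
s.startup_replay(block=1, end_lsn=100)
s.startup_replay(block=2, end_lsn=200)
s.startup_replay(block=1, end_lsn=300)
ahead_of_flush = s.replay_lsn > s.flush_lsn  # replay is not gated by fsync
pages_written = s.bgwriter_cycle()           # one WAL fsync, then blocks 1 and 2
```

The design choice this illustrates is that fsync cost moves off the receive path: the WAL receiver and startup process never wait on fsync, and the bgwriter pays for it once per batch of page writes.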

But I also agree with Heikki. Let's plan to do this later in this
release.

-- Simon Riggs           www.2ndQuadrant.com
