Re: Synchronization levels in SR - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Synchronization levels in SR
Date
Msg-id 1274952655.6203.4051.camel@ebony
In response to Re: Synchronization levels in SR  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Synchronization levels in SR
List pgsql-hackers
On Thu, 2010-05-27 at 16:35 +0900, Fujii Masao wrote:
> On Thu, May 27, 2010 at 3:21 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > On Thu, 2010-05-27 at 11:28 +0900, Fujii Masao wrote:
> >> On Wed, May 26, 2010 at 10:20 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> > On Wed, 2010-05-26 at 18:52 +0900, Fujii Masao wrote:
> >> >
> >> >> I guess that dropping the support of #3 doesn't reduce complexity
> >> >> since the code of #3 is almost the same as that of #2. Like
> >> >> walreceiver sends the ACK after receiving the WAL in #2 case, it has
> >> >> only to do the same thing after the WAL flush.
> >> >
> >> > Hmm, well the code for #3 is similar also to the code for #4. So if you
> >> > do #2, its easy to do #2, #3 and #4 together.
> >>
> >> No. #4 requires the way of prompt communication between walreceiver and
> >> startup process, but #2 and #3 not. That is, in #4, walreceiver has to
> >> wake the startup process up as soon as it has flushed WAL. OTOH, the
> >> startup process has to wake walreceiver up as soon as it has replayed
> >> WAL, to request it to send the ACK to the master. In #2 and #3, the
> >> prompt communication from walreceiver to startup process, i.e., changing
> >> the poll loop in the startup process would also be useful for the data
> >> to be visible immediately on the standby. But it's not required.
> >
> > You need to pass WAL promptly on primary from backend to WALSender.
> > Whatever mechanism you use can also be reused symmetrically on standby
> > to provide #4. So not a problem.
> 
> I cannot be so optimistic since the situation differs from one process
> to another.

This spurs some architectural thinking:

I think we need to decouple the idea of waiting from the components
themselves. Any time we ask WALSender or WALReceiver to wait for an
acknowledgement we will be reducing throughput. So we should assume that
they will continue to work as quickly as possible.

The acknowledgement from standby can contain the latest xlog location of
WAL received, WAL written to disk and WAL applied, all by reading values
from shared memory. It's all the same, whether we send back 2 or 3 xlog
locations in the ack message.
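
As a rough illustration of the ack message described above, here is a
minimal sketch. The name StandbyAck, the field names and the wire layout
(three unsigned 64-bit positions in network byte order) are assumptions
made for the example and are not taken from any patch; a real xlog
location may be represented differently.

```python
import struct
from dataclasses import dataclass

# Hypothetical wire layout: three unsigned 64-bit positions, network byte order.
_ACK_FORMAT = "!QQQ"


@dataclass(frozen=True)
class StandbyAck:
    """One acknowledgement carrying all three standby positions at once."""

    received: int  # latest WAL position received by the standby
    written: int   # latest WAL position written to disk
    applied: int   # latest WAL position applied (replayed)

    def encode(self) -> bytes:
        """Serialize the three positions into a fixed-size payload."""
        return struct.pack(_ACK_FORMAT, self.received, self.written, self.applied)

    @classmethod
    def decode(cls, payload: bytes) -> "StandbyAck":
        """Rebuild an ack from a payload produced by encode()."""
        return cls(*struct.unpack(_ACK_FORMAT, payload))
```

Under this layout the third position adds a fixed 8 bytes to each ack,
and the sender's code path is the same for every level.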

Who sends the ack message? Who receives it? Would it be easier to have
this happen in a second pair of processes: WALSynchroniser (on primary)
and WALAcknowledger (on standby)? WALAcknowledger would send back a
stream of ack messages with latest xlog positions. WALSynchroniser would
receive these messages and wake up sleeping backends. If we did that
then there'd be almost no change at all to existing code, just
additional code and processes for the sync case. Code would be separate
and there would be no performance concerns either.

Backends can then choose to wait until the xlog location they want has
been reached, which might be in the next acknowledgement message or in a
subsequent one. That also ensures that the logic for this is completely
on the master and the standby doesn't act differently, apart from
needing to start a WALAcknowledger process if sync rep is requested.
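
The primary-side half of this idea can be sketched roughly as follows.
This is only an illustration using threads and a condition variable in
place of real processes and shared memory; SyncState, on_ack and
wait_for are hypothetical names, and the three level names are
assumptions chosen to match the positions an ack would carry.

```python
import threading


class SyncState:
    """Primary-side state: the latest acknowledged position for each level."""

    LEVELS = ("received", "written", "applied")

    def __init__(self):
        self._cond = threading.Condition()
        self._acked = {level: 0 for level in self.LEVELS}

    def on_ack(self, received, written, applied):
        """Called by the (hypothetical) WALSynchroniser for every ack message."""
        with self._cond:
            for level, pos in zip(self.LEVELS, (received, written, applied)):
                # Acknowledged positions only ever move forward.
                self._acked[level] = max(self._acked[level], pos)
            # Wake every sleeping backend; each one rechecks its own target.
            self._cond.notify_all()

    def wait_for(self, level, target, timeout=None):
        """Called by a backend. Returns True once acked[level] >= target,
        or False if the timeout expires first."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._acked[level] >= target, timeout
            )
```

A backend that only needs the WAL to have been received would call
wait_for("received", lsn), while one that needs it applied would call
wait_for("applied", lsn). The choice of level is made entirely on the
primary, and the standby sends the same stream of acks either way.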

If you do choose to make #3 important, then I'd say you need to work out
how to make WALWriter active as well, so it can perform regular fsyncs,
rather than having WALReceiver wait across that I/O.
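
To make the division of labour concrete, here is a small sketch in which
the receive path only writes and a separate flush step performs the
fsync. StandbyWal, receive and flush_once are hypothetical names, byte
counts stand in for xlog locations, and in a real standby flush_once
would be driven by a separate process rather than called inline.

```python
import os
import threading


class StandbyWal:
    """Tracks how much WAL has been written versus made durable."""

    def __init__(self, path):
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self._lock = threading.Lock()
        self.written = 0  # bytes handed to the OS by the receive path
        self.flushed = 0  # bytes known to be covered by a completed fsync

    def receive(self, data: bytes):
        """Receive path: write and return at once, never fsync here."""
        view = memoryview(data)
        while view:
            n = os.write(self._fd, view)  # may write fewer bytes than asked
            view = view[n:]
        with self._lock:
            self.written += len(data)

    def flush_once(self):
        """Writer path: fsync, then publish the position that fsync covers."""
        with self._lock:
            target = self.written  # everything up to here was already written
        os.fsync(self._fd)
        with self._lock:
            self.flushed = max(self.flushed, target)
        return target

    def close(self):
        os.close(self._fd)
```

The receive path never blocks on the fsync itself; it only takes a short
lock to advance the written position, and the flushed position is what
an ack would report for #3.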

-- Simon Riggs           www.2ndQuadrant.com
