Re: Synchronization levels in SR - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Synchronization levels in SR |
Date | |
Msg-id | 1274952655.6203.4051.camel@ebony |
In response to | Re: Synchronization levels in SR (Fujii Masao <masao.fujii@gmail.com>) |
Responses | Re: Synchronization levels in SR |
List | pgsql-hackers |
On Thu, 2010-05-27 at 16:35 +0900, Fujii Masao wrote:
> On Thu, May 27, 2010 at 3:21 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > On Thu, 2010-05-27 at 11:28 +0900, Fujii Masao wrote:
> >> On Wed, May 26, 2010 at 10:20 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> > On Wed, 2010-05-26 at 18:52 +0900, Fujii Masao wrote:
> >> >
> >> >> I guess that dropping the support of #3 doesn't reduce complexity
> >> >> since the code of #3 is almost the same as that of #2. Like
> >> >> walreceiver sends the ACK after receiving the WAL in #2 case, it has
> >> >> only to do the same thing after the WAL flush.
> >> >
> >> > Hmm, well the code for #3 is similar also to the code for #4. So if you
> >> > do #2, its easy to do #2, #3 and #4 together.
> >>
> >> No. #4 requires the way of prompt communication between walreceiver and
> >> startup process, but #2 and #3 not. That is, in #4, walreceiver has to
> >> wake the startup process up as soon as it has flushed WAL. OTOH, the
> >> startup process has to wake walreceiver up as soon as it has replayed
> >> WAL, to request it to send the ACK to the master. In #2 and #3, the
> >> prompt communication from walreceiver to startup process, i.e., changing
> >> the poll loop in the startup process would also be useful for the data
> >> to be visible immediately on the standby. But it's not required.
> >
> > You need to pass WAL promptly on primary from backend to WALSender.
> > Whatever mechanism you use can also be reused symmetrically on standby
> > to provide #4. So not a problem.
>
> I cannot be so optimistic since the situation differs from one process
> to another.

This spurs some architectural thinking:

I think we need to disconnect the idea of waiting in any of the
components. Anytime we ask WALSender or WALReceiver to wait for
acknowledgement we will be reducing throughput. So we should assume that
they will continue to work as quickly as possible.
The acknowledgement from standby can contain the latest xlog location of
WAL received, WAL written to disk and WAL applied, all by reading values
from shared memory. It's all the same, whether we send back 2 or 3 xlog
locations in the ack message.

Who sends the ack message? Who receives it? Would it be easier to have
this happen in a second pair of processes: WALSynchroniser (on primary)
and WALAcknowledger (on standby)? WALAcknowledger would send back a
stream of ack messages with latest xlog positions. WALSynchroniser would
receive these messages and wake up sleeping backends. If we did that
then there'd be almost no change at all to existing code, just
additional code and processes for the sync case. Code would be separate
and there would be no performance concerns either.

Backends can then choose to wait until the xlog location they want has
been reached, which might be in the next acknowledgement message or in a
subsequent one. That also ensures that the logic for this is completely
on the master and the standby doesn't act differently, apart from
needing to start a WALAcknowledger process if sync rep is requested.

If you do choose to make #3 important, then I'd say you need to work out
how to make WALWriter active as well, so it can perform regular fsyncs,
rather than having WALReceiver wait across that I/O.

-- 
 Simon Riggs           www.2ndQuadrant.com
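
To make the proposed split concrete, here is a minimal sketch of the primary-side half in Python rather than PostgreSQL's C. It is an illustration under stated assumptions, not code from any patch: the `Ack` layout, the method names `handle_ack` and `wait_for`, and the use of integers for xlog locations are all hypothetical. Only the roles come from the message above: the standby streams acks carrying the received, written and applied positions, and WALSynchroniser wakes backends that are sleeping on a target position.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Ack:
    """One ack message from the standby (hypothetical layout)."""
    received: int  # latest xlog location received by the standby
    written: int   # latest xlog location written to disk
    applied: int   # latest xlog location applied (replayed)


class WALSynchroniser:
    """Primary-side sketch: records acks and wakes sleeping backends."""

    def __init__(self):
        self._cond = threading.Condition()
        self._latest = Ack(0, 0, 0)

    def handle_ack(self, ack):
        # Called once per message in the ack stream. Each position is
        # kept monotonic, so a stale ack cannot move it backwards.
        with self._cond:
            self._latest = Ack(
                max(self._latest.received, ack.received),
                max(self._latest.written, ack.written),
                max(self._latest.applied, ack.applied),
            )
            self._cond.notify_all()  # wake every waiting backend

    def wait_for(self, level, target, timeout=None):
        # A backend sleeps here until the standby has reported `target`
        # at the chosen level ("received", "written" or "applied").
        # Returns False if the timeout expires first.
        with self._cond:
            return self._cond.wait_for(
                lambda: getattr(self._latest, level) >= target, timeout
            )
```

A backend that only needs the WAL to have arrived would call `sync.wait_for("received", commit_location)`, while one that needs it replayed would pass `"applied"`. The sender and receiver never block in this arrangement; the only waiting happens in the backend that asked for it, and the choice of level is made entirely on the primary.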