Re: synchronous_commit = apply - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: synchronous_commit = apply |
Date | |
Msg-id | CAEepm=2voGzoXFHszzvraXGxVxKMPu5g=ybasWyp_i+GikiX=Q@mail.gmail.com |
In response to | Re: synchronous_commit = apply (Simon Riggs <simon@2ndQuadrant.com>) |
List | pgsql-hackers |
On Thu, Sep 17, 2015 at 12:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 1 September 2015 at 20:25, Thomas Munro <thomas.munro@enterprisedb.com>
> wrote:
>> The next problem is that the master can be waiting quite a long time for a
>> reply from the remote walreceiver containing the desired apply LSN: in the
>> best case it learns of apply progress from replies to subsequent unrelated
>> records (which might be very soon on a busy system but still involves
>> waiting for the next transaction's WAL flush), and in the worst case it
>> needs to wait for wal_receiver_status_interval (10 seconds by default),
>> which makes for a long COMMIT delay. I was thinking that the solution to
>> that may be to teach StartupXLOG to signal the walreceiver after it updates
>> XLogCtl->lastReplayedEndRecPtr, which should cause walrcv_receive to be
>> interrupted and return early, and then walreceiver could send a reply if it
>> sees that lastReplayedEndRecPtr has moved. Maybe that would generate an
>> unacceptably high frequency of signals, and maybe there is a better form of
>> IPC for this. Without introducing any new IPC, the walreceiver could
>> instead simply report apply progress to the master whenever it sees that the
>> apply LSN has moved after its regular NAPTIME_PER_CYCLE wait (100ms), but
>> that would still introduce bogus latency. A quick and dirty way to see
>> that on top of the attached patch is to set requestReply = true in
>> WalReceiverMain to force a send after every nap.
>
> This problem is exactly why I wrote my recent patch to make WALWriter work
> in recovery.
>
> Currently, the WALReceiver issues regular fsyncs that prevent it from
> replying in time. Also, the WALReceiver waits on incoming data only, so we
> can't (yet) set a latch when the Startup process has applied some records.
>
> I've solved the first problem and know how to solve the second, just haven't
> coded it yet. I was expecting to do that for CF3 or CF4.
>
> I don't think we should be using signals, nor would I expect them to work
> effectively while in an fsync.

That sounds much better. I had noticed that with my patch the walreceiver
loop was basically trying to do far too much. I was contemplating
investigating a pipe for IPC, so that it could select/poll on both the
socket connected to master + the new apply feedback pipe, rather than using
raw signals (directly or via latches) and interrupting syscalls.

>> I can see that using synchronous_commit = apply in practice might
>> prove difficult: how does a client know which node is the synchronous
>> standby? Perhaps those sorts of practical problems are the reason no one
>> has done or wanted this.
>
> It means we need quorum sync rep as well, to make this useful in practice
> without sacrificing HA.
>
> Bringing my patch and Beena's patch together will solve this for us in 9.6

I've been looking at that patch. It makes sense for adding redundancy in
synchronous_commit = on mode (waiting for WAL flush but not apply). But it
strikes me that to make multi-server synchronous_commit = apply really
useful, it is not enough to wait for a quorum of any N servers in a group to
reply, because a client connected to a given standby doesn't know whether
that standby was one of the N and therefore whether it is guaranteed to see
the effects of a committed transaction that it has heard about. Do you have
a plan that could address that?

I have been working on a proposal that adds support for reliable "causal"
and "read-your-writes" consistency, while still allowing for some number of
standbys to fail/fall behind without blocking all transactions forever.
After a COMMIT with synchronous_commit = apply returns successfully, you can
run a query on any standby node, or tell another process to run a query on
any standby node, and it is guaranteed to either see the committed
transaction or receive a new error "standby not synchronized". This
behaviour is activated by also setting synchronous_commit = apply on the
standby, and works by adding some two-way timeout logic. I will have more
to say about this soon (I have some other work to get out of the way first).
I will not be at all surprised to hear that you already have this covered
and are 18 steps ahead of me!

> So yes, 1) we have thought of it and want it, 2) the basic patch is trivial,
> 3) but it isn't the main problem.

Agreed. I had a go at this because I needed the trivial plumbing in so I
could work on the more difficult problem above, and I didn't know you had it
in the pipeline already. I'm glad to hear that you do, and that you have
solved the problem of the interleaving of operations in walreceiver, and I
will be following along with interest.

-- 
Thomas Munro
http://www.enterprisedb.com
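
Illustration of the select/poll idea mentioned in the message above (not part of the original message). This is a minimal sketch, assuming a Unix-like system, of a single loop that waits on both the connection to the master and a separate apply-feedback channel, instead of relying on signals that interrupt syscalls. It is written in Python for brevity; the real walreceiver is C, and nothing here is PostgreSQL code. The socketpair, the pipe, the labels, and the function names `wait_for_events` and `demo` are all hypothetical stand-ins.

```python
import os
import selectors
import socket

# Hypothetical labels for the two event sources; not PostgreSQL identifiers.
MASTER = "master-socket"
APPLY = "apply-feedback-pipe"


def wait_for_events(sel, timeout):
    """Block until a registered descriptor is readable or the timeout expires.

    Returns the labels of the readable descriptors as a sorted list,
    or an empty list if the timeout expired first.
    """
    return sorted(key.data for key, _events in sel.select(timeout))


def demo():
    # Stand-ins: a socketpair plays the connection to the master, and a
    # pipe plays the channel on which apply progress is announced.
    master_end, receiver_end = socket.socketpair()
    pipe_read, pipe_write = os.pipe()

    sel = selectors.DefaultSelector()
    sel.register(receiver_end, selectors.EVENT_READ, MASTER)
    sel.register(pipe_read, selectors.EVENT_READ, APPLY)

    results = []
    try:
        # Nothing pending on either source: the wait times out.
        results.append(wait_for_events(sel, 0.05))

        # Apply progress is announced: the wait returns without any signal.
        os.write(pipe_write, b"x")
        results.append(wait_for_events(sel, 1.0))
        os.read(pipe_read, 1)  # drain the notification byte

        # Data arrives from the master: the same wait picks it up.
        master_end.sendall(b"wal")
        results.append(wait_for_events(sel, 1.0))
        receiver_end.recv(3)  # drain the payload
    finally:
        sel.close()
        master_end.close()
        receiver_end.close()
        os.close(pipe_read)
        os.close(pipe_write)
    return results
```

Calling `demo()` returns one list of ready labels per wait. The point of the design is that one blocking call covers both event sources, so apply progress can be reported as soon as it is announced rather than after the next nap or status interval.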