Re: synchronous_commit = apply - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: synchronous_commit = apply
Date
Msg-id CAEepm=2voGzoXFHszzvraXGxVxKMPu5g=ybasWyp_i+GikiX=Q@mail.gmail.com
In response to Re: synchronous_commit = apply  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Thu, Sep 17, 2015 at 12:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 1 September 2015 at 20:25, Thomas Munro <thomas.munro@enterprisedb.com>
> wrote:
>> The next problem is that the master can be waiting quite a long time for a
>> reply from the remote walreceiver containing the desired apply LSN: in the
>> best case it learns of apply progress from replies to subsequent unrelated
>> records (which might be very soon on a busy system but still involves
>> waiting for the next transaction's WAL flush), and in the worst case it
>> needs to wait for wal_receiver_status_interval (10 seconds by default),
>> which makes for a long COMMIT delay.  I was thinking that the solution to
>> that may be to teach StartupXLOG to signal the walreceiver after it updates
>> XLogCtl->lastReplayedEndRecPtr, which should cause walrcv_receive to be
>> interrupted and return early, and then walreceiver could send a reply if it
>> sees that lastReplayedEndRecPtr has moved.  Maybe that would generate an
>> unacceptably high frequency of signals, and maybe there is a better form of
>> IPC for this.  Without introducing any new IPC, the walreceiver could
>> instead simply report apply progress to the master whenever it sees that the
>> apply LSN has moved after its regular NAPTIME_PER_CYCLE wait (100ms), but
>> that would still introduce bogus latency.  A quick and dirty way to see
>> that on top of the attached patch is to set requestReply = true in
>> WalReceiverMain to force a send after every nap.
>
>
> This problem is exactly why I wrote my recent patch to make WALWriter work
> in recovery.
>
> Currently, the WALReceiver issues regular fsyncs that prevent it from
> replying in time. Also, the WALReceiver waits on incoming data only, so we
> can't (yet) set a latch when the Startup process has applied some records.
>
> I've solved the first problem and know how to solve the second, just haven't
> coded it yet. I was expecting to do that for CF3 or CF4.
>
> I don't think we should be using signals, nor would I expect them to work
> effectively while in an fsync.

That sounds much better.  I had noticed that with my patch the
walreceiver loop was basically trying to do far too much.  I was
contemplating investigating a pipe for IPC, so that it could
select/poll on both the socket connected to the master and the new apply
feedback pipe, rather than using raw signals (directly or via latches)
and interrupting syscalls.
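
To illustrate the pipe idea, here is a minimal sketch in plain POSIX C.
It is not walreceiver code and not taken from any patch: the function
names and result bits are hypothetical, and ordinary file descriptors
stand in for the replication socket and the apply feedback pipe.  It
only shows one poll() call waiting on both descriptors, so that apply
progress can wake the waiter without signals.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical result bits; these names do not exist in PostgreSQL. */
#define WAIT_WAL_DATA        0x01   /* replication socket is readable */
#define WAIT_APPLY_PROGRESS  0x02   /* apply feedback pipe is readable */
#define WAIT_ERROR           (-1)

/*
 * Wait until the WAL socket or the apply feedback pipe becomes readable,
 * or until timeout_ms elapses.  Returns a bitmask of WAIT_* bits, 0 on
 * timeout, or WAIT_ERROR if poll() fails.
 */
int
wait_for_wal_or_apply(int wal_sock_fd, int apply_pipe_fd, int timeout_ms)
{
    struct pollfd fds[2];
    int         rc;
    int         result = 0;

    fds[0].fd = wal_sock_fd;
    fds[0].events = POLLIN;
    fds[0].revents = 0;
    fds[1].fd = apply_pipe_fd;
    fds[1].events = POLLIN;
    fds[1].revents = 0;

    /* Retry if interrupted; a real implementation would adjust the timeout. */
    do
    {
        rc = poll(fds, 2, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return WAIT_ERROR;
    if (fds[0].revents & (POLLIN | POLLHUP))
        result |= WAIT_WAL_DATA;
    if (fds[1].revents & (POLLIN | POLLHUP))
        result |= WAIT_APPLY_PROGRESS;
    return result;
}

/* Writer side (the startup process in this sketch): wake the waiter. */
int
notify_apply_progress(int apply_pipe_write_fd)
{
    char        byte = 0;

    return write(apply_pipe_write_fd, &byte, 1) == 1 ? 0 : -1;
}

/*
 * Reader side: consume pending wakeup bytes.  Call this only after poll()
 * reported the pipe readable, since read() on an empty blocking pipe blocks.
 * Returns the number of bytes consumed, or -1 on error.
 */
int
drain_apply_pipe(int apply_pipe_read_fd)
{
    char        buf[64];
    ssize_t     n = read(apply_pipe_read_fd, buf, sizeof(buf));

    return (int) n;
}
```

In this sketch the waiter, on seeing WAIT_APPLY_PROGRESS, would drain the
pipe, check whether the replay position has moved, and send a reply to the
master if so.  The pipe carries only a wakeup byte, not the LSN itself.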

>> I can see that using synchronous_commit = apply in practice might
>> prove difficult:  how does a client know which node is the synchronous
>> standby?  Perhaps those sorts of practical problems are the reason no one
>> has done or wanted this.
>
> It means we need quorum sync rep as well, to make this useful in practice
> without sacrificing HA.
>
> Bringing my patch and Beena's patch together will solve this for us in 9.6

I've been looking at that patch.  It makes sense for adding redundancy
in synchronous_commit = on mode (waiting for WAL flush but not apply).
But it strikes me that to make multi-server synchronous_commit = apply
really useful, it is not enough to wait for a quorum of any N servers
in a group to reply, because a client connected to a given standby
doesn't know whether that standby was one of the N and therefore
whether it is guaranteed to see the effects of a committed transaction
that it has heard about.  Do you have a plan that could address that?

I have been working on a proposal that adds support for reliable
"causal" and "read-your-writes" consistency, while still allowing for
some number of standbys to fail/fall behind without blocking all
transactions forever.  After a COMMIT with synchronous_commit = apply
returns successfully, you can run a query on any standby node, or tell
another process to run a query on any standby node, and it is
guaranteed to either see the committed transaction or receive a new
error "standby not synchronized".  This behaviour is activated by also
setting synchronous_commit = apply on the standby, and works by adding
some two-way timeout logic.  I will have more to say about this soon
(I have some other work to get out of the way first).

I will not be at all surprised to hear that you already have this
covered and are 18 steps ahead of me!

> So yes, 1) we have thought of it and want it, 2) the basic patch is trivial,
> 3) but it isn't the main problem.

Agreed.  I had a go at this because I needed the trivial plumbing in
so I could work on the more difficult problem above, and I didn't know
you had it in the pipeline already.  I'm glad to hear that you do, and
that you have solved the problem of the interleaving of operations in
walreceiver, and I will be following along with interest.

-- 
Thomas Munro
http://www.enterprisedb.com


